Sentiment classification revisited

Recurrent Neural Networks (RNNs) for Language Modeling with Keras

David Cecchini

Data Scientist

Previous results

Our initial model performed poorly: test accuracy was 49.5%, no better than random guessing.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, Dense

model = Sequential()
model.add(SimpleRNN(units=16, input_shape=(None, 1)))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='sgd', metrics=['accuracy'])

model.evaluate(x_test, y_test)
$[0.6991182165145874, 0.495]

Improving the model

To improve the model's performance, we can (see the sketch after this list):

  • Add an Embedding layer
  • Increase the number of layers
  • Tune the hyperparameters
  • Increase the vocabulary size
  • Accept longer sentences by using more memory cells
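A minimal sketch of such an improved architecture is shown below. The sizes (vocabulary_size, wordvec_dim, max_text_len, 64 LSTM units) are illustrative assumptions, not values from the course.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

vocabulary_size = 10000   # assumed vocabulary size
wordvec_dim = 100         # assumed embedding dimension
max_text_len = 200        # assumed maximum sentence length

model = Sequential()
model.add(Embedding(vocabulary_size, wordvec_dim, input_length=max_text_len))
model.add(LSTM(64))       # more memory cells than SimpleRNN(units=16)
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])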

Avoiding overfitting

RNN models can overfit the training data. To mitigate this:

  • Test different batch sizes.
  • Add Dropout layers.
  • Use the dropout and recurrent_dropout parameters on RNN layers.
# Dropout layer: randomly sets 20% of its inputs to zero to add noise
model.add(Dropout(rate=0.2))

# Drops 10% of the inputs and 10% of the recurrent (memory) connections
model.add(LSTM(128, dropout=0.1, recurrent_dropout=0.1))
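Putting these pieces together, a regularized version of the sentiment model might look like the sketch below. The layer sizes, dropout rates, and optimizer are illustrative assumptions, not the course's final values.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Dropout, LSTM, Dense

model = Sequential()
model.add(Embedding(10000, 100, input_length=200))   # assumed sizes
model.add(Dropout(rate=0.2))                         # drop 20% of the embedding outputs
model.add(LSTM(128, dropout=0.1, recurrent_dropout=0.1))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])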

Extra: Convolution Layer

Out of scope for this course:

model.add(Embedding(vocabulary_size, wordvec_dim, ...))
model.add(Conv1D(filters=32, kernel_size=3, padding='same'))
model.add(MaxPooling1D(pool_size=2))
  • Convolution layers perform feature selection on the embedding vectors
  • They achieve state-of-the-art results in many NLP problems
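For context, a minimal sketch of a full 1D-CNN text classifier is shown below. The GlobalMaxPooling1D layer, the activation, and all sizes are assumptions added for illustration and are not part of the slide's snippet.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Embedding, Conv1D, MaxPooling1D,
                                     GlobalMaxPooling1D, Dense)

vocabulary_size = 10000  # assumed
wordvec_dim = 100        # assumed
max_text_len = 200       # assumed

model = Sequential()
model.add(Embedding(vocabulary_size, wordvec_dim, input_length=max_text_len))
model.add(Conv1D(filters=32, kernel_size=3, padding='same', activation='relu'))
model.add(MaxPooling1D(pool_size=2))
model.add(GlobalMaxPooling1D())      # collapse the sequence dimension
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])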

One example model

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Dense, Dropout, LSTM, GRU
from tensorflow.keras.initializers import Constant

model = Sequential()
model.add(Embedding(vocabulary_size, wordvec_dim, trainable=True,
                    embeddings_initializer=Constant(glove_matrix),
                    input_length=max_text_len, name="Embedding"))
model.add(Dense(wordvec_dim, activation='relu', name="Dense1"))
model.add(Dropout(rate=0.25))
model.add(LSTM(64, return_sequences=True, dropout=0.15, name="LSTM"))
model.add(GRU(64, return_sequences=False, dropout=0.15, name="GRU"))
model.add(Dense(64, name="Dense2"))
model.add(Dropout(rate=0.25))
model.add(Dense(32, name="Dense3"))
model.add(Dense(1, activation='sigmoid', name="Output"))
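The model can then be compiled and trained as usual. A brief sketch follows; the optimizer, batch size, epoch count, and the x_train/y_train names are assumptions for illustration.

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(x_train, y_train, epochs=5, batch_size=64, validation_split=0.2)
model.evaluate(x_test, y_test)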

Let's practice!
