Data pre-processing

Recurrent Neural Networks (RNNs) for Language Modeling with Keras

David Cecchini

Data Scientist

Text classification

Applications of text classification:

  • Automatic news classification
  • Document classification for businesses
  • Queue segmentation for customer support
  • Many more!

Changes from binary classification

What changes when moving from binary to multi-class classification:

  • Shape of the output variable y
  • Number of units on the output layer
  • Activation function on the output layer
  • Loss function

Changes from binary classification

Shape of the output variable y:

  • One-hot encoding of the classes
# Example: num_classes = 3
# y[0]    -> [0, 1, 0]
# y.shape -> (N, num_classes), where N is the number of samples
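To make the shape of y concrete, here is a minimal sketch of one-hot encoding in plain Python. The helper name `one_hot` and the sample labels are hypothetical, chosen only for illustration; in practice a library utility would normally do this step.

```python
def one_hot(labels, num_classes):
    """Return one row per label, with a 1 at the label's index and 0 elsewhere."""
    rows = []
    for label in labels:
        row = [0] * num_classes
        row[label] = 1
        rows.append(row)
    return rows


labels = [1, 0, 2, 1]  # hypothetical integer class labels
y = one_hot(labels, num_classes=3)

print(y[0])                 # [0, 1, 0]
print((len(y), len(y[0])))  # (4, 3), i.e. (N, num_classes)
```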

Number of units on the output layer:

# Output layer
model.add(Dense(num_classes))

Changes from binary classification

Figure: class labels represented as numbers on a line versus as one-hot encoded points in space.


Changes from binary classification

Activation function on the output layer:

  • softmax outputs a probability for each class, and the probabilities sum to 1
# Output layer
model.add(Dense(num_classes, activation="softmax"))
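As a rough illustration of what the softmax activation computes, the sketch below applies it to a hypothetical vector of raw output scores in plain Python. The function name and the score values are assumptions for this example, not part of the course code.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    # Subtracting the maximum keeps exp() numerically stable
    # without changing the result.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


scores = [2.0, 1.0, 0.1]  # hypothetical raw outputs for 3 classes
probs = softmax(scores)

print([round(p, 3) for p in probs])  # [0.659, 0.242, 0.099]
```

The largest score receives the largest probability, so the predicted class is the index of the maximum value.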

Loss function:

  • Instead of binary cross-entropy, we use categorical cross-entropy
# Compile the model
model.compile(loss='categorical_crossentropy')
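The following sketch shows how categorical cross-entropy could be computed by hand for a single sample, assuming a one-hot target and a predicted probability vector. The function and the example values are hypothetical; Keras additionally averages the loss over the batch.

```python
import math


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Loss for one sample: -sum(y_true * log(y_pred))."""
    # eps guards against log(0) when a predicted probability is exactly zero.
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))


y_true = [0, 1, 0]                 # one-hot target: the true class is 1
confident = [0.1, 0.8, 0.1]        # most probability on the true class
mistaken = [0.7, 0.2, 0.1]         # most probability on a wrong class

loss_confident = categorical_crossentropy(y_true, confident)
loss_mistaken = categorical_crossentropy(y_true, mistaken)

print(round(loss_confident, 4))  # 0.2231
print(round(loss_mistaken, 4))   # 1.6094
```

Because the target is one-hot, only the probability assigned to the true class contributes, so the loss shrinks as that probability approaches 1.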

Preparing text categories for Keras

import pandas as pd

y = ["sports", "economy", "data_science", "sports", "finance"]

# Transform to a pandas Series with categorical dtype
y_series = pd.Series(y, dtype="category")

# Print the category codes
print(y_series.cat.codes)
0    3
1    1
2    0
3    3
4    2
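The codes above follow the alphabetical order of the category names. The plain-Python sketch below reproduces that mapping and also builds the reverse lookup, which is handy for turning predicted codes back into readable labels. The dictionary names are hypothetical.

```python
y = ["sports", "economy", "data_science", "sports", "finance"]

# Sorted unique labels: the position in this list is the integer code.
categories = sorted(set(y))

label_to_code = {label: code for code, label in enumerate(categories)}
code_to_label = dict(enumerate(categories))

codes = [label_to_code[label] for label in y]

print(codes)             # [3, 1, 0, 3, 2]
print(code_to_label[3])  # sports
```

With pandas, the same ordered list of names is available as `y_series.cat.categories`.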

Pre-processing y

import numpy as np
from tensorflow.keras.utils import to_categorical

y = np.array([0, 1, 2])

# Change to categorical
y_prep = to_categorical(y)
print(y_prep)
[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]
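Going the other way, a one-hot row (or a row of predicted probabilities) can be turned back into a class index by taking the position of its largest value. This is a minimal sketch in plain Python; the helper name `decode` and the prediction values are hypothetical.

```python
def decode(rows):
    """Return the index of the largest value in each row."""
    return [max(range(len(row)), key=row.__getitem__) for row in rows]


y_prep = [[1.0, 0.0, 0.0],
          [0.0, 1.0, 0.0],
          [0.0, 0.0, 1.0]]
print(decode(y_prep))       # [0, 1, 2]

predictions = [[0.2, 0.7, 0.1]]  # hypothetical softmax output for one sample
print(decode(predictions))  # [1]
```

With NumPy arrays, `np.argmax(y_prep, axis=1)` gives the same result.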

Let's practice!
