The Embedding layer

Recurrent Neural Networks (RNNs) for Language Modeling with Keras

David Cecchini

Data Scientist

Why embeddings

Advantages:

  • Reduce the dimension
    one_hot = np.zeros((N, 100000))  # one-hot: N words x 100,000-word vocabulary
    embedd = np.zeros((N, 300))      # embedding: N words x 300 dimensions
    
  • Dense representation
    • king - man + woman = queen
  • Transfer learning
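
The "king - man + woman = queen" relation can be sketched with vector arithmetic. The block below is only an illustration: the 2-dimensional vectors are hand-made toy values (not real embeddings), and the names `toy_vectors`, `cosine`, and `closest_word` are hypothetical.

```python
import numpy as np

# Hand-made toy vectors (NOT real embeddings); axes are [royalty, gender]
toy_vectors = {
    "king":  np.array([1.0, 1.0]),
    "queen": np.array([1.0, -1.0]),
    "man":   np.array([0.0, 1.0]),
    "woman": np.array([0.0, -1.0]),
    "apple": np.array([-1.0, 0.0]),
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Vector arithmetic on the dense representations
target = toy_vectors["king"] - toy_vectors["man"] + toy_vectors["woman"]

# Rank the remaining words by similarity, excluding the three query words
query_words = {"king", "man", "woman"}
candidates = {w: v for w, v in toy_vectors.items() if w not in query_words}
closest_word = max(candidates, key=lambda w: cosine(target, candidates[w]))
print(closest_word)
```

With real pre-trained vectors the match is approximate rather than exact, which is why a similarity ranking is used instead of an equality check.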

Disadvantages:

  • Lots of parameters to train: training takes longer
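
To make the trade-off concrete, the sizes from the slide (a 100,000-word vocabulary and 300 dimensions) can be plugged into a quick calculation. This is a rough sketch that assumes float32 weights; the variable names are illustrative.

```python
import numpy as np

vocab_size, embedding_dim = 100_000, 300

# The embedding layer stores one trainable weight per (word, dimension) pair
n_parameters = vocab_size * embedding_dim

# Approximate size of the weight matrix in megabytes, assuming float32
weights_mb = n_parameters * np.dtype("float32").itemsize / 1e6

# Width of a single word's representation: one-hot vs. dense embedding
one_hot_width = vocab_size
dense_width = embedding_dim

print(n_parameters, weights_mb, one_hot_width // dense_width)
```

Each word shrinks from 100,000 values to 300, but the layer itself adds 30 million weights that have to be trained.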

How to use in Keras

In Keras:

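from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding

model = Sequential()
# Use as the first layer
model.add(Embedding(input_dim=100000,            # vocabulary size
                    output_dim=300,              # dimension of each word vector
                    trainable=True,              # update the vectors during training
                    embeddings_initializer=None, # no custom initializer
                    input_length=120))           # length of each input sequence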
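
Conceptually, an embedding layer behaves like a lookup table: each integer token id selects one row of a weight matrix. The NumPy sketch below illustrates that idea under tiny, made-up sizes; it is not the Keras implementation, and the names `weights` and `token_ids` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embedding_dim = 6, 3            # tiny, illustrative sizes
weights = rng.normal(size=(vocab_size, embedding_dim))

token_ids = np.array([[1, 4, 4, 0]])        # batch of 1 sequence, length 4

# Embedding lookup: pick one row of the weight matrix per token id
embedded = weights[token_ids]               # shape (1, 4, 3)

# Equivalent (but wasteful) route: one-hot vectors times the weight matrix
one_hot = np.eye(vocab_size)[token_ids]     # shape (1, 4, 6)
via_matmul = one_hot @ weights

print(embedded.shape, np.allclose(embedded, via_matmul))
```

The output has shape (batch size, sequence length, embedding dimension), which is the shape a following recurrent layer would receive.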

Transfer learning

Transfer learning for language models

  • GloVe
  • word2vec
  • BERT

In Keras:

from tensorflow.keras.initializers import Constant

# Initialize the layer's weights with the pre-trained vectors
model.add(Embedding(input_dim=vocabulary_size,
                    output_dim=embedding_dim,
                    embeddings_initializer=Constant(pre_trained_vectors)))
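
A small end-to-end check of this pattern might look like the sketch below. The matrix `pre_trained_vectors` holds made-up values standing in for real pre-trained vectors, and `trainable=False` is an optional choice (an assumption here, not something the slide prescribes) that keeps the loaded vectors fixed during training.

```python
import numpy as np
from tensorflow.keras.initializers import Constant
from tensorflow.keras.layers import Embedding

# Stand-in for a real pre-trained matrix: 4 words, 2 dimensions (made-up values)
pre_trained_vectors = np.array([[0.0, 0.0],
                                [0.1, 0.2],
                                [0.3, 0.4],
                                [0.5, 0.6]], dtype="float32")
vocabulary_size, embedding_dim = pre_trained_vectors.shape

layer = Embedding(input_dim=vocabulary_size,
                  output_dim=embedding_dim,
                  embeddings_initializer=Constant(pre_trained_vectors),
                  trainable=False)  # assumption: freeze the loaded vectors

# Calling the layer builds it and looks up rows 1 and 3 of the matrix
output = np.array(layer(np.array([[1, 3]])))
print(output.shape)
```

Leaving `trainable=True` instead would let the model fine-tune the vectors for the task, at the cost of training all of those weights.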

Using GloVe pre-trained vectors

Official site: https://nlp.stanford.edu/projects/glove/

import numpy as np

# Get the GloVe vectors
def get_glove_vectors(filename="glove.6B.300d.txt"):
    # Get all word vectors from pre-trained model
    glove_vector_dict = {}
    with open(filename, encoding="utf-8") as f:
        for line in f:
            # Each line holds a word followed by its coefficients
            values = line.split()
            word = values[0]
            coefs = values[1:]
            glove_vector_dict[word] = np.asarray(coefs, dtype='float32')
    return glove_vector_dict
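
The loader can be tried without downloading the full GloVe file. The sketch below restates the function compactly and runs it on a temporary file containing two made-up 3-dimensional entries laid out the way the parser expects (a word followed by space-separated numbers on each line).

```python
import os
import tempfile
import numpy as np

def get_glove_vectors(filename):
    # Map each word to its vector of float32 coefficients
    glove_vector_dict = {}
    with open(filename, encoding="utf-8") as f:
        for line in f:
            values = line.split()
            glove_vector_dict[values[0]] = np.asarray(values[1:], dtype="float32")
    return glove_vector_dict

# Two made-up entries (NOT real GloVe values), one word per line
sample = "the 0.1 0.2 0.3\ncat -0.5 0.0 0.5\n"
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False,
                                 encoding="utf-8") as tmp:
    tmp.write(sample)
    path = tmp.name

vectors = get_glove_vectors(path)
os.remove(path)  # clean up the temporary file
print(sorted(vectors), vectors["cat"].shape)
```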

Using GloVe on a specific task

# Filter GloVe vectors to specific task
def filter_glove(vocabulary_dict, glove_dict, wordvec_dim=300):
    # Create a matrix to store the vectors
    embedding_matrix = np.zeros((len(vocabulary_dict) + 1, wordvec_dim))
    for word, i in vocabulary_dict.items():
        embedding_vector = glove_dict.get(word)
        if embedding_vector is not None:
            # words not found in the glove_dict will be all-zeros.
            embedding_matrix[i] = embedding_vector
    return embedding_matrix
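
A short usage sketch follows, with a made-up three-word vocabulary and 2-dimensional vectors. It assumes word indices start at 1, so that row 0 stays free (for example for padding), which is one plausible reason for the `+ 1` in the matrix size. The word "zyxq" is a deliberately unknown token.

```python
import numpy as np

def filter_glove(vocabulary_dict, glove_dict, wordvec_dim=300):
    # One row per vocabulary index, plus row 0 (assumed unused by any word)
    embedding_matrix = np.zeros((len(vocabulary_dict) + 1, wordvec_dim))
    for word, i in vocabulary_dict.items():
        embedding_vector = glove_dict.get(word)
        if embedding_vector is not None:
            embedding_matrix[i] = embedding_vector
    return embedding_matrix

# Made-up task vocabulary (word -> index, starting at 1)
vocabulary_dict = {"cat": 1, "sat": 2, "zyxq": 3}

# Made-up vectors standing in for the full pre-trained dictionary
glove_dict = {
    "cat": np.array([1.0, 2.0], dtype="float32"),
    "sat": np.array([3.0, 4.0], dtype="float32"),
    "dog": np.array([5.0, 6.0], dtype="float32"),  # not in the task vocabulary
}

embedding_matrix = filter_glove(vocabulary_dict, glove_dict, wordvec_dim=2)
print(embedding_matrix)
```

Only words in the task vocabulary get a row, so "dog" is dropped, while "zyxq" keeps an all-zeros row because it has no pre-trained vector.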

Let's practice!
