Recurrent Neural Networks (RNNs) for Language Modeling with Keras
David Cecchini
Data Scientist
Advantages:
- Reduced dimensionality: a one-hot representation needs one column per vocabulary word, while an embedding is a small dense vector:
one_hot = np.zeros((N, 100000))
embedding = np.zeros((N, 300))
- Dense vectors capture semantic relationships between words, e.g. king - man + woman = queen (see the sketch after this list)
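As a quick sanity check of the analogy, here is a minimal sketch (assuming glove_vector_dict maps words to NumPy vectors, e.g. as returned by get_glove_vectors later in this section):

import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity between two vectors
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Vector arithmetic: king - man + woman
target = (glove_vector_dict["king"]
          - glove_vector_dict["man"]
          + glove_vector_dict["woman"])

# Closest word to the target vector, excluding the input words
candidates = (w for w in glove_vector_dict if w not in ("king", "man", "woman"))
print(max(candidates,
          key=lambda w: cosine_similarity(glove_vector_dict[w], target)))
# expected: "queen"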
Disadvantages:
- Many parameters to train: training takes longer
In Keras:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding

model = Sequential()
# Use as the first layer of the model
model.add(Embedding(input_dim=100000,
                    output_dim=300,
                    trainable=True,
                    embeddings_initializer=None,
                    input_length=120))
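A quick way to verify the layer's behavior is to pass a batch of integer-encoded sequences through it (a sketch assuming the model defined above):

import numpy as np

# A batch of 2 sequences, each 120 word indices long
fake_batch = np.random.randint(0, 100000, size=(2, 120))

# The Embedding layer maps each index to a 300-dimensional vector
output = model.predict(fake_batch)
print(output.shape)  # (2, 120, 300)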
Transfer learning for language models
In Keras:
from tensorflow.keras.initializers import Constant

model.add(Embedding(input_dim=vocabulary_size,
                    output_dim=embedding_dim,
                    embeddings_initializer=Constant(pre_trained_vectors)))
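Set trainable=False on the Embedding layer to keep the pre-trained vectors fixed during training, or leave it as True to fine-tune them on the task data.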
Official site: https://nlp.stanford.edu/projects/glove/
import numpy as np

# Get the GloVe vectors
def get_glove_vectors(filename="glove.6B.300d.txt"):
    # Get all word vectors from the pre-trained model
    glove_vector_dict = {}
    with open(filename) as f:
        for line in f:
            values = line.split()
            word = values[0]
            coefs = values[1:]
            glove_vector_dict[word] = np.asarray(coefs, dtype='float32')
    return glove_vector_dict
# Filter GloVe vectors to the specific task
def filter_glove(vocabulary_dict, glove_dict, wordvec_dim=300):
    # Create a matrix to store the vectors
    embedding_matrix = np.zeros((len(vocabulary_dict) + 1, wordvec_dim))
    for word, i in vocabulary_dict.items():
        embedding_vector = glove_dict.get(word)
        # Words not found in glove_dict will remain all-zeros
        if embedding_vector is not None:
            embedding_matrix[i] = embedding_vector
    return embedding_matrix
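Putting the pieces together, a minimal sketch (assuming vocabulary_dict maps words to integer indices, e.g. a Keras Tokenizer's word_index):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding
from tensorflow.keras.initializers import Constant

glove_dict = get_glove_vectors("glove.6B.300d.txt")
embedding_matrix = filter_glove(vocabulary_dict, glove_dict, wordvec_dim=300)

model = Sequential()
model.add(Embedding(input_dim=len(vocabulary_dict) + 1,
                    output_dim=300,
                    embeddings_initializer=Constant(embedding_matrix),
                    trainable=False,  # keep the GloVe vectors fixed
                    input_length=120))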