Transfer learning for language models

Recurrent Neural Networks (RNNs) for Language Modeling with Keras

David Cecchini

Data Scientist

The idea behind transfer learning

Transfer learning:

  • Start with better-than-random initial weights
  • Reuse models trained on very large datasets
  • "Open-source" data science models

Available architectures

Base example: "I really loved this movie"

  • Word2Vec
    • Continuous Bag of Words (CBOW): X = [I, really, this, movie], y = loved
    • Skip-gram: X = loved, y = [I, really, this, movie]
  • FastText: X = [I, rea, eal, all, lly, really, ...], y = loved
    • Uses whole words and character n-grams
  • ELMo: X = [I, really, loved, this], y = movie
    • Produces contextual embeddings: the same word gets a different vector in each context
    • Built on deep bidirectional language models (biLM)
  • Word2Vec and FastText are available in the gensim package; ELMo is available on tensorflow_hub (see the sketch below)
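
As a minimal sketch of the tensorflow_hub route (assuming TensorFlow 2.x and the public google/elmo/3 module on tfhub.dev; the signature and output key follow that module's listing):

import tensorflow as tf
import tensorflow_hub as hub

# Load the pre-trained ELMo module from TensorFlow Hub
elmo = hub.load("https://tfhub.dev/google/elmo/3")

# Embed a batch of raw sentences; the "elmo" key holds the
# contextual embeddings, one 1024-dimensional vector per token
outputs = elmo.signatures["default"](tf.constant(["I really loved this movie"]))
embeddings = outputs["elmo"]  # shape: (batch_size, num_tokens, 1024)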

Example using Word2Vec

from gensim.models import word2vec

# Train the model (parameter names follow gensim 3.x;
# in gensim >= 4.0 use vector_size= and epochs= instead of size= and iter=)
w2v_model = word2vec.Word2Vec(tokenized_corpus,
                              size=embedding_dim,
                              window=neighbor_words_num,
                              iter=100)

# Get the top 3 most similar words to "captain"
w2v_model.wv.most_similar(["captain"], topn=3)
[('sweatpants', 0.7249663472175598),
('kirk', 0.7083336114883423),
('larry', 0.6495886445045471)]
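
To actually transfer these weights into a Keras model, a common pattern is to copy the trained vectors into an Embedding layer's weight matrix. A minimal sketch, assuming a word_index mapping built by a tokenizer on the same corpus (the variable names here are illustrative):

import numpy as np
from tensorflow.keras.layers import Embedding

# Build the embedding matrix from the trained Word2Vec vectors
# (word_index is assumed to map each token to an integer id)
vocab_size = len(word_index) + 1
embedding_matrix = np.zeros((vocab_size, embedding_dim))
for word, idx in word_index.items():
    if word in w2v_model.wv:
        embedding_matrix[idx] = w2v_model.wv[word]

# Initialize a Keras Embedding layer with the pre-trained weights;
# trainable=False freezes them (set trainable=True to fine-tune)
embedding_layer = Embedding(vocab_size, embedding_dim,
                            weights=[embedding_matrix],
                            trainable=False)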

Example using FastText

from gensim.models import fasttext

# Instantiate the model (as above, gensim >= 4.0 renames size= to vector_size=)
ft_model = fasttext.FastText(size=embedding_dim, window=neighbor_words_num)

# Build the vocabulary
ft_model.build_vocab(sentences=tokenized_corpus)

# Train the model
ft_model.train(sentences=tokenized_corpus,
               total_examples=len(tokenized_corpus),
               epochs=100)
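
Because FastText composes word vectors from character n-grams, it can embed words it never saw during training. A quick check (the misspelled token below is just an illustrative out-of-vocabulary example):

# "captian" (misspelled) is out-of-vocabulary, but FastText still
# assembles a vector for it from its character n-grams
oov_vector = ft_model.wv["captian"]
print(oov_vector.shape)  # (embedding_dim,)

# Similarity queries work for OOV tokens as well
ft_model.wv.most_similar(["captian"], topn=3)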

Let's practice!
