Word vectors and spaCy

Natural Language Processing with spaCy

Azadeh Mobasher

Principal Data Scientist

Word vectors visualization

  • Visualizing word vectors shows how semantically related words group together

 

Example word vector

  • Principal Component Analysis projects word vectors into a two-dimensional space

 

Visualizing word vectors


Word vectors visualization

  • Import required libraries and a spaCy model.
import matplotlib.pyplot as plt
import numpy as np
import spacy
from sklearn.decomposition import PCA

nlp = spacy.load("en_core_web_md")
  • Extract word vectors for a given list of words and stack them vertically.
words = ["wonderful", "horrible", 
           "apple", "banana", "orange", "watermelon", 
           "dog", "cat"]
word_vectors = np.vstack([nlp.vocab.vectors[nlp.vocab.strings[w]] for w in words])

Word vectors visualization

  • Extract two principal components using PCA.
pca = PCA(n_components=2)
word_vectors_transformed = pca.fit_transform(word_vectors)

 

  • Visualize the scatter plot of transformed vectors.
plt.figure(figsize=(10, 8))
plt.scatter(word_vectors_transformed[:, 0], word_vectors_transformed[:, 1])

for word, coord in zip(words, word_vectors_transformed):
    x, y = coord
    plt.text(x, y, word, size=10)
plt.show()
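
To make the projection step less of a black box, here is a minimal sketch of what a two-component PCA does, written with NumPy only. The toy data and the helper name `pca_project` are assumptions for illustration and are not part of the course code. The signs of the resulting components may differ from what scikit-learn returns.

```python
import numpy as np

# Hypothetical toy data: 6 "word vectors" with 5 dimensions each
rng = np.random.default_rng(0)
toy_vectors = rng.normal(size=(6, 5))

def pca_project(vectors, n_components=2):
    """Project the rows of `vectors` onto their top principal components."""
    # PCA works on mean-centered data
    centered = vectors - vectors.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

projected = pca_project(toy_vectors)
print(projected.shape)
```

Each row of `projected` holds the two coordinates that would be passed to `plt.scatter` for one word.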

Analogies and vector operations

  • An analogy is a semantic relationship between a pair of words.
  • Word embeddings capture analogies along dimensions such as gender and tense, which can be expressed as vector arithmetic:
    • queen - woman + man = king
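
The arithmetic above can be sketched with a handful of made-up vectors. The two-dimensional embeddings below are hypothetical (one axis loosely standing for "royalty", the other for "gender") and the helper `solve_analogy` is an illustrative name, not a spaCy function. With real embeddings the result is only approximately equal to the target word's vector.

```python
import numpy as np

# Hypothetical 2-D embeddings: axis 0 ~ "royalty", axis 1 ~ "gender"
toy_embeddings = {
    "king":  np.array([0.9, 0.9]),
    "queen": np.array([0.9, -0.9]),
    "man":   np.array([0.1, 0.9]),
    "woman": np.array([0.1, -0.9]),
    "apple": np.array([-0.8, 0.0]),
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def solve_analogy(a, b, c, embeddings):
    """Return the word closest to a - b + c, excluding the three inputs."""
    target = embeddings[a] - embeddings[b] + embeddings[c]
    candidates = {w: v for w, v in embeddings.items() if w not in (a, b, c)}
    return max(candidates, key=lambda w: cosine(candidates[w], target))

result = solve_analogy("queen", "woman", "man", toy_embeddings)
print(result)  # king
```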


Similar words in a vocabulary

  • spaCy can find the terms in its vocabulary that are semantically most similar to a given term
import numpy as np
import spacy
nlp = spacy.load("en_core_web_md")

word = "covid"

most_similar_words = nlp.vocab.vectors.most_similar(
    np.asarray([nlp.vocab.vectors[nlp.vocab.strings[word]]]), n=5)
words = [nlp.vocab.strings[w] for w in most_similar_words[0][0]]
print(words)
>>> ['Covi', 'CoVid', 'Covici', 'COVID-19', 'corona']
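
As a rough illustration of the idea behind a nearest-neighbor lookup like the one above, the sketch below ranks a tiny made-up vocabulary by cosine similarity to a query word. The vocabulary, the vectors, and the helper `top_n_similar` are all hypothetical. This is not spaCy's implementation, only one common way to rank terms by similarity.

```python
import numpy as np

# Hypothetical vocabulary and 3-D vectors, one row per word
toy_vocab = ["cat", "dog", "kitten", "car", "truck"]
toy_matrix = np.array([
    [0.90, 0.10, 0.0],
    [0.80, 0.30, 0.0],
    [0.85, 0.05, 0.1],
    [0.00, 0.20, 0.9],
    [0.10, 0.30, 0.8],
])

def top_n_similar(query, vocab, matrix, n=2):
    """Return the n words most similar to `query` by cosine similarity."""
    # Normalizing rows turns dot products into cosine similarities
    unit = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    scores = unit @ unit[vocab.index(query)]
    order = np.argsort(-scores)  # highest similarity first
    return [vocab[i] for i in order if vocab[i] != query][:n]

neighbors = top_n_similar("cat", toy_vocab, toy_matrix)
print(neighbors)  # ['kitten', 'dog']
```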

Let's practice!
