Introduction to word vectors

Natural Language Processing with spaCy

Azadeh Mobasher

Principal Data Scientist

Word vectors (embeddings)

 

  • Numerical representations of words
  • Bag of words method: {"I": 1, "got": 2, ...}

 

  • Older methods do not capture word meaning: "covid" and "coronavirus" get unrelated IDs.

Sentences            I    got    covid    coronavirus
I got covid          1    2      3
I got coronavirus    1    2               4
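
The word-to-ID encoding in the table above can be reproduced with a few lines of plain Python. This is only a minimal sketch: the helper names `build_vocab` and `encode` are hypothetical, and IDs are assumed to be handed out in order of first appearance, starting at 1.

```python
def build_vocab(sentences):
    """Give each new word the next integer ID, in order of first appearance."""
    vocab = {}
    for sentence in sentences:
        for word in sentence.split():
            if word not in vocab:
                vocab[word] = len(vocab) + 1  # IDs start at 1, as in the table
    return vocab


def encode(sentence, vocab):
    """Replace every word in the sentence with its integer ID."""
    return [vocab[word] for word in sentence.split()]


sentences = ["I got covid", "I got coronavirus"]
vocab = build_vocab(sentences)
encoded = [encode(s, vocab) for s in sentences]

print(vocab)
print(encoded)
```

The IDs 3 and 4 say nothing about how close "covid" and "coronavirus" are in meaning; they only record the order in which the words were first seen.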

Word vectors

  • Each vector has a pre-defined number of dimensions
  • Training considers word frequencies and which other words appear in similar contexts
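
One common way to compare two word vectors is cosine similarity. The sketch below uses made-up 3-dimensional vectors purely for illustration; the values in `toy_vectors` are invented and do not come from any trained model, and real vectors have many more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy vectors (assumption: invented values, not model output)
toy_vectors = {
    "covid": [0.9, 0.8, 0.1],
    "coronavirus": [0.8, 0.9, 0.2],
    "got": [0.1, 0.2, 0.9],
}

sim_related = cosine_similarity(toy_vectors["covid"], toy_vectors["coronavirus"])
sim_unrelated = cosine_similarity(toy_vectors["covid"], toy_vectors["got"])

# Words used in similar contexts should end up with a higher similarity
print(sim_related > sim_unrelated)
```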

[Figure: word vector examples]

Word vectors

  • Multiple approaches to produce word vectors:

    • word2vec, GloVe, fastText, and transformer-based architectures
  • An example of a word vector:

[Figure: example word vector]

spaCy vocabulary

 

  • Word vectors are part of many spaCy models.
  • en_core_web_md has 20,000 unique 300-dimensional vectors, shared across 514,157 vocabulary keys.

 

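import spacy
# Load the medium English pipeline, which ships with word vectors
nlp = spacy.load("en_core_web_md")
# Show the vector metadata: width, number of unique vectors, number of keys
print(nlp.meta["vectors"])
>>> {'width': 300, 'vectors': 20000, 'keys': 514157,
'name': 'en_vectors', 'mode': 'default'}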

Word vectors in spaCy

  • nlp.vocab: to access the vocabulary (Vocab class)
  • nlp.vocab.strings: to look up the ID of a word in the vocabulary
import spacy
nlp = spacy.load("en_core_web_md")
# Look up the 64-bit hash ID that spaCy assigns to the string "like"
like_id = nlp.vocab.strings["like"]
print(like_id)
>>> 18194338103975822726
  • nlp.vocab.vectors: to access the word vectors of a model, or the vector of a single word given its ID
# Index the vectors table by word ID to get that word's vector
print(nlp.vocab.vectors[like_id])
>>> array([-2.3334e+00, -1.3695e+00, -1.1330e+00, -6.8461e-01, ...])
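
The two-step lookup above (string to ID, then ID to vector) can be pictured with plain dictionaries. This is a conceptual sketch only, not spaCy's implementation: `ToyVocab` and `toy_hash` are hypothetical names, `toy_hash` is a stand-in that is not the hash function spaCy uses, and the vector is truncated to the four values shown in the output above.

```python
import hashlib


def toy_hash(word):
    """Stand-in 64-bit hash (assumption: NOT the hash function spaCy uses)."""
    digest = hashlib.sha256(word.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class ToyVocab:
    """Hypothetical miniature vocabulary with a string table and a vector table."""

    def __init__(self):
        self.strings = {}  # word -> integer ID
        self.vectors = {}  # integer ID -> vector

    def add(self, word, vector):
        word_id = toy_hash(word)
        self.strings[word] = word_id
        self.vectors[word_id] = vector
        return word_id


vocab = ToyVocab()
vocab.add("like", [-2.3334, -1.3695, -1.1330, -0.68461])

like_id = vocab.strings["like"]      # step 1: string -> ID
like_vector = vocab.vectors[like_id]  # step 2: ID -> vector
print(like_vector)
```

Keeping the strings and the vectors in separate tables means the rest of the pipeline can pass around compact integer IDs and only resolve them to text or vectors when needed.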

Let's practice!
