Advanced NLP with spaCy
Ines Montani
spaCy core developer
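spaCy can compare two objects and predict their similarity
Doc.similarity(), Span.similarity() and Token.similarity()
Each method takes another object and returns a similarity score (usually between 0 and 1)
Similarity needs a model that includes word vectors:
en_core_web_md (medium model): includes word vectors
en_core_web_lg (large model): includes word vectors
en_core_web_sm (small model): does not include word vectors
import spacy
# Load a larger model with vectors
nlp = spacy.load('en_core_web_md')
# Compare two documents
doc1 = nlp("I like fast food")
doc2 = nlp("I like pizza")
print(doc1.similarity(doc2))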
0.8627204117787385
# Compare two tokens
doc = nlp("I like pizza and pasta")
token1 = doc[2]
token2 = doc[4]
print(token1.similarity(token2))
0.7369546
# Compare a document with a token
doc = nlp("I like pizza")
token = nlp("soap")[0]
print(doc.similarity(token))
0.32531983166759537
# Compare a span with a document
span = nlp("I like pizza and pasta")[2:5]
doc = nlp("McDonalds sells burgers")
print(span.similarity(doc))
0.619909235817623
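By default, spaCy computes these scores as the cosine similarity of the objects' vectors. The sketch below shows that calculation in plain Python, without spaCy. The function name cosine_similarity and the three-dimensional vectors are invented for illustration, and returning 0.0 for an all-zero vector is a choice made in this sketch, not a statement about spaCy's internals.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # An all-zero vector has no direction; this sketch returns 0.0
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical 3-dimensional vectors, invented for illustration only
pizza = [3.0, 4.0, 0.0]
pasta = [6.0, 8.0, 0.0]  # points in the same direction as pizza
soap = [0.0, 0.0, 3.0]   # orthogonal to pizza

print(cosine_similarity(pizza, pasta))  # 1.0
print(cosine_similarity(pizza, soap))   # 0.0
```

Only the direction of the vectors matters here, not their length, which is why pizza and pasta score 1.0 even though one vector is twice as long as the other.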
Doc and Span vectors default to the average of their token vectors
import spacy
# Load a larger model with vectors
nlp = spacy.load('en_core_web_md')
doc = nlp("I have a banana")
# Access the vector via the token.vector attribute
print(doc[3].vector)
[2.02280000e-01, -7.66180009e-02, 3.70319992e-01,
3.28450017e-02, -4.19569999e-01, 7.20689967e-02,
-3.74760002e-01, 5.74599989e-02, -1.24009997e-02,
5.29489994e-01, -5.23800015e-01, -1.97710007e-01,
-3.41470003e-01, 5.33169985e-01, -2.53309999e-02,
1.73800007e-01, 1.67720005e-01, 8.39839995e-01,
5.51070012e-02, 1.05470002e-01, 3.78719985e-01,
2.42750004e-01, 1.47449998e-02, 5.59509993e-01,
1.25210002e-01, -6.75960004e-01, 3.58420014e-01,
-4.00279984e-02, 9.59490016e-02, -5.06900012e-01,
-8.53179991e-02, 1.79800004e-01, 3.38669986e-01,
...
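The averaging behind Doc and Span vectors can be sketched in plain Python. The helper mean_vector and the two-dimensional token vectors are hypothetical and exist only to make the arithmetic easy to follow. Real vectors in en_core_web_md have many more dimensions.

```python
def mean_vector(vectors):
    """Element-wise average of a non-empty list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]

# Hypothetical 2-dimensional token vectors, invented for illustration only
token_vectors = {
    "I": [0.0, 2.0],
    "have": [2.0, 0.0],
    "a": [0.0, 0.0],
    "banana": [2.0, 2.0],
}

words = ["I", "have", "a", "banana"]
doc_vector = mean_vector([token_vectors[word] for word in words])
print(doc_vector)  # [1.0, 1.0]
```

Every token contributes equally to the average, so frequent function words pull the result as much as content words do.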
doc1 = nlp("I like cats")
doc2 = nlp("I hate cats")
print(doc1.similarity(doc2))
0.9501447503553421
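One likely reason for such a high score is that the two sentences share most of their tokens, and averaged vectors are dominated by what the sentences have in common. The sketch below illustrates this with toy data in plain Python. All vectors and helper names are invented, and "like" and "hate" are deliberately made exact opposites, which real word vectors need not be.

```python
import math

def mean_vector(vectors):
    """Element-wise average of a non-empty list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0

# Hypothetical 3-dimensional token vectors, invented for illustration only
vectors = {
    "I": [0.0, 2.0, 0.0],
    "like": [1.0, 0.0, 0.0],
    "hate": [-1.0, 0.0, 0.0],  # exact opposite of "like" in this toy setup
    "cats": [0.0, 0.0, 2.0],
}

doc1 = mean_vector([vectors[w] for w in ["I", "like", "cats"]])
doc2 = mean_vector([vectors[w] for w in ["I", "hate", "cats"]])

token_score = cosine_similarity(vectors["like"], vectors["hate"])
doc_score = cosine_similarity(doc1, doc2)

print(token_score)  # -1.0
print(doc_score)    # close to 7/9, about 0.78
```

Even with one pair of opposite tokens, the document-level score stays high, so a similarity score should be read in light of the application: two texts can be similar in topic while expressing opposite sentiment.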