Natural Language Processing with spaCy
Azadeh Mobasher
Principal Data Scientist
What is the cheapest flight from Boston to Seattle?
Which airline serves Denver, Pittsburgh and Atlanta?
What kinds of planes are used by American Airlines?
spaCy calculates similarity scores between Token objects
import spacy
nlp = spacy.load("en_core_web_md")
doc1 = nlp("We eat pizza")
doc2 = nlp("We like to eat pasta")
token1 = doc1[2]
token2 = doc2[4]
print(f"Similarity between {token1} and {token2} = ", round(token1.similarity(token2), 3))
>>> Similarity between pizza and pasta = 0.685
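By default, spaCy's similarity() is the cosine similarity of the two objects' vectors. The calculation can be sketched in plain Python as below. The three-dimensional vectors are hypothetical stand-ins, not values from en_core_web_md (whose vectors have 300 dimensions), and cosine_similarity is an illustrative helper, not a spaCy function.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Convention chosen for this sketch: a zero vector has no direction.
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy "word vectors", made up for illustration only.
pizza = [0.9, 0.1, 0.3]
pasta = [0.8, 0.2, 0.4]
car = [0.1, 0.9, 0.0]

# Vectors pointing in similar directions score closer to 1.
print("pizza vs pasta:", round(cosine_similarity(pizza, pasta), 3))
print("pizza vs car:  ", round(cosine_similarity(pizza, car), 3))
```

With these toy values, pizza scores higher against pasta than against car. The score depends only on the direction of the vectors, not their length.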
spaCy calculates semantic similarity of two given Span objects
doc1 = nlp("We eat pizza")
doc2 = nlp("We like to eat pasta")
span1 = doc1[1:]
span2 = doc2[1:]
print(f"Similarity between \"{span1}\" and \"{span2}\" = ", round(span1.similarity(span2), 3))
>>> Similarity between "eat pizza" and "like to eat pasta" = 0.588
print(f"Similarity between \"{doc1[1:]}\" and \"{doc2[3:]}\" = ",
round(doc1[1:].similarity(doc2[3:]), 3))
>>> Similarity between "eat pizza" and "eat pasta" = 0.936
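One likely reason the shorter span scores higher: Span and Doc vectors default to the average of their token vectors, so extra words such as "like to" pull the average away from the shared "eat" plus food meaning. The sketch below illustrates the effect with hypothetical toy vectors. The numbers and the helper names (mean_vector, text_vector, cosine_similarity) are assumptions for illustration and do not reproduce spaCy's actual scores.

```python
import math


def mean_vector(vectors):
    """Component-wise average of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero, equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical 3-dimensional word vectors, made up for this sketch.
toy_vectors = {
    "eat": [0.2, 0.9, 0.1],
    "pizza": [0.9, 0.1, 0.3],
    "pasta": [0.8, 0.2, 0.4],
    "like": [0.1, 0.3, 0.9],
    "to": [0.0, 0.1, 0.2],
}


def text_vector(words):
    """Represent a span of words by the average of its word vectors."""
    return mean_vector([toy_vectors[word] for word in words])


short_score = cosine_similarity(
    text_vector(["eat", "pizza"]), text_vector(["eat", "pasta"])
)
long_score = cosine_similarity(
    text_vector(["eat", "pizza"]), text_vector(["like", "to", "eat", "pasta"])
)

print('"eat pizza" vs "eat pasta":        ', round(short_score, 3))
print('"eat pizza" vs "like to eat pasta":', round(long_score, 3))
```

In this toy setup the trimmed span scores higher than the longer one, the same direction as the 0.936 versus 0.588 result above.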
spaCy calculates the similarity scores between two documents
nlp = spacy.load("en_core_web_md")
doc1 = nlp("I like to play basketball")
doc2 = nlp("I love to play basketball")
print("Similarity score :", round(doc1.similarity(doc2), 3))
>>> Similarity score : 0.975
Doc vectors default to an average of word vectors

spaCy finds content relevant to a given keyword
sentences = nlp(
    "What is the cheapest flight from Boston to Seattle? "
    "Which airline serves Denver, Pittsburgh and Atlanta? "
    "What kinds of planes are used by American Airlines?"
)
keyword = nlp("price")
for i, sentence in enumerate(sentences.sents):
    print(f"Similarity score with sentence {i+1}: ", round(sentence.similarity(keyword), 5))
>>> Similarity score with sentence 1: 0.26136
Similarity score with sentence 2: 0.14021
Similarity score with sentence 3: 0.13885
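To go from printing scores to picking the most relevant sentence, the scores can be sorted. The helper below is a minimal sketch: rank_by_similarity is a hypothetical name, not part of spaCy, and it only assumes that each candidate has a similarity(other) method, as spaCy's Doc, Span and Token objects do.

```python
from typing import Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def rank_by_similarity(candidates: Iterable[T], keyword) -> List[Tuple[float, T]]:
    """Return (score, candidate) pairs sorted from most to least similar.

    Hypothetical helper: each candidate only needs a .similarity(other)
    method, which spaCy's Doc, Span and Token objects provide.
    """
    scored = [(candidate.similarity(keyword), candidate) for candidate in candidates]
    # Sort on the score alone so the candidates themselves are never compared.
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

With the objects defined above, rank_by_similarity(sentences.sents, keyword)[0] would select the first sentence (the cheapest-flight question), given the scores shown: 0.26136 against 0.14021 and 0.13885.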