Linguistic features in spaCy

Natural Language Processing with spaCy

Azadeh Mobasher

Principal Data Scientist

POS tagging

  • Categorizing words grammatically, based on function and context within a sentence
POS     Description                Example
VERB    Verb                       run, eat, ate, take
NOUN    Noun                       man, airplane, tree, flower
ADJ     Adjective                  big, old, incompatible, conflicting
ADV     Adverb                     very, down, there, tomorrow
CCONJ   Coordinating conjunction   and, or, but

POS tagging with spaCy

 

  • POS tagging helps disambiguate the meaning of a word
  • Some words, such as watch, can be either a noun or a verb
  • spaCy stores the POS tag of each token in its pos_ attribute
  • spacy.explain() returns a short description of a given POS tag

POS Tagger component
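Why context matters for a word like watch can be illustrated with a deliberately naive, hand-written rule: a determiner or possessive just before it suggests a noun, while a subject pronoun suggests a verb. This is a hypothetical sketch (`guess_watch_pos` and both word lists are assumptions), not how spaCy works; spaCy's tagger is a trained statistical model that weighs much wider context.

```python
# Hypothetical, hand-written rule for ONE ambiguous word ("watch").
# spaCy does NOT tag words this way; its tagger is a trained model.
# This only illustrates that the neighboring words decide the tag.
DETERMINER_LIKE = {"a", "an", "the", "my", "your", "his", "her", "our", "their"}
SUBJECT_PRONOUNS = {"i", "you", "we", "they"}


def guess_watch_pos(sentence):
    """Guess NOUN or VERB for the first 'watch' in a sentence, else None."""
    # Naive tokenization: drop periods and commas, split on whitespace
    words = sentence.replace(".", " ").replace(",", " ").lower().split()
    if "watch" not in words:
        return None
    i = words.index("watch")
    prev = words[i - 1] if i > 0 else ""
    if prev in DETERMINER_LIKE:
        return "NOUN"  # e.g. "my watch", "the watch"
    if prev in SUBJECT_PRONOUNS:
        return "VERB"  # e.g. "I watch", "they watch"
    return None  # the toy rule gives up here


print(guess_watch_pos("I watch TV."))               # VERB
print(guess_watch_pos("I left without my watch."))  # NOUN
```

Such rules break quickly (for example "I watch my watch" or "they will watch"), which is why spaCy learns tags from annotated data instead.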

POS tagging with spaCy

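import spacy

# Load a small English pipeline (it must be installed beforehand)
nlp = spacy.load("en_core_web_sm")

# "watch" used as a verb
verb_sent = "I watch TV."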

print([(token.text, token.pos_, spacy.explain(token.pos_)) for token in nlp(verb_sent)])
[('I', 'PRON', 'pronoun'), 
('watch', 'VERB', 'verb'), 
('TV', 'NOUN', 'noun'), 
('.', 'PUNCT', 'punctuation')]

# "watch" used as a noun
noun_sent = "I left without my watch."

print([(token.text, token.pos_, spacy.explain(token.pos_)) for token in nlp(noun_sent)])
[('I', 'PRON', 'pronoun'), 
('left', 'VERB', 'verb'), 
('without', 'ADP', 'adposition'), 
('my', 'PRON', 'pronoun'),
('watch', 'NOUN', 'noun'),
('.', 'PUNCT', 'punctuation')]
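The two outputs above can be compared programmatically to spot words that received different tags in different sentences. The sketch below copies the printed tuples by hand; `ambiguous_words` is a hypothetical helper, and with a loaded pipeline the lists could instead come straight from the list comprehensions above.

```python
# The (text, pos_, explanation) tuples printed above, copied by hand
verb_tokens = [("I", "PRON", "pronoun"), ("watch", "VERB", "verb"),
               ("TV", "NOUN", "noun"), (".", "PUNCT", "punctuation")]
noun_tokens = [("I", "PRON", "pronoun"), ("left", "VERB", "verb"),
               ("without", "ADP", "adposition"), ("my", "PRON", "pronoun"),
               ("watch", "NOUN", "noun"), (".", "PUNCT", "punctuation")]


def ambiguous_words(*tagged_sentences):
    """Return {word: sorted POS tags} for words seen with more than one tag."""
    seen = {}
    for sentence in tagged_sentences:
        for text, pos, _ in sentence:
            # Lowercase so "Watch" and "watch" count as the same word
            seen.setdefault(text.lower(), set()).add(pos)
    return {word: sorted(tags) for word, tags in seen.items() if len(tags) > 1}


print(ambiguous_words(verb_tokens, noun_tokens))  # {'watch': ['NOUN', 'VERB']}
```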

Named entity recognition

  • A named entity is a word or phrase that refers to a specific real-world object denoted by a proper name, such as a person, organization, or place
  • Named-entity recognition (NER) classifies named entities into pre-defined categories
Entity type   Description
PERSON        Named person or family
ORG           Companies, institutions, etc.
GPE           Geopolitical entities: countries, cities, etc.
LOC           Non-GPE locations: mountain ranges, etc.
DATE          Absolute or relative dates or periods
TIME          Times smaller than a day

NER and spaCy

 

  • spaCy models extract named entities using the NER pipeline component
  • Named entities are available via the doc.ents property
  • spaCy also assigns each entity a label, available via ent.label_

NER component
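The entity categories above can be made concrete by grouping entities under their labels. The sketch below uses hand-written (text, label) pairs in the shape of (ent.text, ent.label_); the pairs and the helper name `group_by_label` are assumptions for illustration, not model output. With a loaded pipeline, the same function could be called on `((ent.text, ent.label_) for ent in doc.ents)`.

```python
from collections import defaultdict

# Hand-written illustrative annotations, NOT real model output
entities = [("Albert Einstein", "PERSON"), ("ETH Zurich", "ORG"),
            ("Germany", "GPE"), ("1905", "DATE"), ("Ulm", "GPE")]


def group_by_label(ents):
    """Map each entity label to the list of entity texts carrying it."""
    groups = defaultdict(list)
    for text, label in ents:
        groups[label].append(text)
    # Plain dict keeps first-seen label order (Python 3.7+)
    return dict(groups)


print(group_by_label(entities))
# {'PERSON': ['Albert Einstein'], 'ORG': ['ETH Zurich'], 'GPE': ['Germany', 'Ulm'], 'DATE': ['1905']}
```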

NER and spaCy

 

import spacy
nlp = spacy.load("en_core_web_sm")
text = "Albert Einstein was a genius."
doc = nlp(text)

print([(ent.text, ent.start_char, ent.end_char, ent.label_) for ent in doc.ents])
>>> [('Albert Einstein', 0, 15, 'PERSON')]
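The start_char and end_char values are character offsets into the original text, with the end offset exclusive, so they behave exactly like a Python slice. A minimal check, assuming the entity tuple printed above and a hypothetical helper `check_offsets`:

```python
def check_offsets(text, ents):
    """Return True if every (ent_text, start, end, label) tuple matches text."""
    # start/end are character offsets; end is exclusive, like a slice
    return all(text[start:end] == ent_text for ent_text, start, end, _ in ents)


text = "Albert Einstein was a genius."
# Tuple copied from the doc.ents output shown above
ents = [("Albert Einstein", 0, 15, "PERSON")]

print(check_offsets(text, ents))  # True
print(text[0:15])                 # Albert Einstein
```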

NER and spaCy

  • We can also access the entity type of each token in a Doc container via token.ent_type_

 

import spacy
nlp = spacy.load("en_core_web_sm")
text = "Albert Einstein was a genius."
doc = nlp(text)

# Tokens outside any entity have an empty entity type
print([(token.text, token.ent_type_) for token in doc])
>>> [('Albert', 'PERSON'), ('Einstein', 'PERSON'),
('was', ''), ('a', ''), ('genius', ''), ('.', '')]
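Token-level entity types alone cannot say where one entity ends and the next one of the same type begins. spaCy also exposes token.ent_iob_, which is "B" when a token begins an entity, "I" inside one, "O" outside one, and "" when no entity tag is set. The sketch below rebuilds entity spans from such triples; the input is hand-written in the shape of `[(t.text, t.ent_iob_, t.ent_type_) for t in doc]`, and `iob_to_spans` is a hypothetical helper, not a spaCy function (in practice doc.ents already gives the spans).

```python
def iob_to_spans(tokens):
    """Group (text, iob, ent_type) triples into (entity_text, label) pairs.

    Tokens are re-joined with single spaces, a simplification:
    a real spaCy Span keeps the original text and whitespace.
    """
    spans = []
    current, label = [], None
    for text, iob, ent_type in tokens:
        if iob == "B" or (iob == "I" and not current):
            # A new entity starts: close the previous one, if any
            if current:
                spans.append((" ".join(current), label))
            current, label = [text], ent_type
        elif iob == "I":
            # Continue the open entity
            current.append(text)
        else:
            # "O" or "": outside any entity, so close the open one
            if current:
                spans.append((" ".join(current), label))
            current, label = [], None
    if current:
        spans.append((" ".join(current), label))
    return spans


# Hand-written triples, NOT real model output
einstein = [("Albert", "B", "PERSON"), ("Einstein", "I", "PERSON"),
            ("was", "O", ""), ("a", "O", ""), ("genius", "O", ""),
            (".", "O", "")]
print(iob_to_spans(einstein))  # [('Albert Einstein', 'PERSON')]

# Two adjacent entities of the same type: only the "B" tag separates them
adjacent = [("Paris", "B", "GPE"), ("London", "B", "GPE")]
print(iob_to_spans(adjacent))  # [('Paris', 'GPE'), ('London', 'GPE')]
```

Grouping consecutive tokens by ent_type_ alone would merge Paris and London into one span; the IOB code is what keeps them apart.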

displaCy

 

  • spaCy ships with a built-in visualizer: displaCy
  • The displaCy entity visualizer highlights named entities and their labels
import spacy
from spacy import displacy

text = "Albert Einstein was a genius."
nlp = spacy.load("en_core_web_sm")
doc = nlp(text)

# Start a local web server (port 5000 by default) that shows the entity view
displacy.serve(doc, style="ent")

displaCy NER output
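Conceptually, the entity visualizer wraps each entity span in highlighted markup followed by its label. The sketch below is a toy illustration of that idea built from character offsets; `highlight_entities` is hypothetical and its markup is an assumption, since displaCy's real HTML and styling differ. In a Jupyter notebook, displacy.render(doc, style="ent") displays the visualization inline instead of starting a server.

```python
import html


def highlight_entities(text, ents):
    """Wrap each (start, end, label) span of text in a <mark> element.

    Toy illustration only; displaCy's actual markup differs.
    Assumes the spans are sorted by start offset and do not overlap.
    """
    parts, last = [], 0
    for start, end, label in ents:
        # Escape the plain text between the previous entity and this one
        parts.append(html.escape(text[last:start]))
        parts.append(
            f"<mark>{html.escape(text[start:end])} "
            f"<b>{html.escape(label)}</b></mark>"
        )
        last = end
    parts.append(html.escape(text[last:]))
    return "".join(parts)


print(highlight_entities("Albert Einstein was a genius.", [(0, 15, "PERSON")]))
# <mark>Albert Einstein <b>PERSON</b></mark> was a genius.
```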

Let's practice!
