Natural Language Processing with spaCy
Azadeh Mobasher
Principal Data Scientist
POS | Description | Example |
---|---|---|
VERB | Verb | run, eat, ate, take |
NOUN | Noun | man, airplane, tree, flower |
ADJ | Adjective | big, old, incompatible, conflicting |
ADV | Adverb | very, down, there, tomorrow |
CCONJ | Coordinating conjunction | and, or, but |
spaCy captures POS tags in the pos_ attribute of each token
spacy.explain() returns a short description of a given POS tag
import spacy
nlp = spacy.load("en_core_web_sm")
verb_sent = "I watch TV."
print([(token.text, token.pos_, spacy.explain(token.pos_)) for token in nlp(verb_sent)])
[('I', 'PRON', 'pronoun'),
('watch', 'VERB', 'verb'),
('TV', 'NOUN', 'noun'),
('.', 'PUNCT', 'punctuation')]
noun_sent = "I left without my watch."
print([(token.text, token.pos_, spacy.explain(token.pos_)) for token in nlp(noun_sent)])
[('I', 'PRON', 'pronoun'),
('left', 'VERB', 'verb'),
('without', 'ADP', 'adposition'),
('my', 'PRON', 'pronoun'),
('watch', 'NOUN', 'noun'),
('.', 'PUNCT', 'punctuation')]
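The two sentences above show "watch" tagged as VERB in one context and NOUN in the other. A minimal sketch of filtering tokens by POS tag follows. To stay runnable without a downloaded model, it builds a hand-annotated Doc (this assumes spaCy v3, where the Doc constructor accepts a pos argument); the tags are copied from the output above rather than predicted, and tokens_with_pos is a hypothetical helper name, not part of spaCy.

```python
import spacy
from spacy.tokens import Doc

# Blank English pipeline: supplies a vocabulary only, no trained tagger.
nlp = spacy.blank("en")

# Hand-annotated Doc mirroring the tags shown above (not model predictions).
doc = Doc(
    nlp.vocab,
    words=["I", "left", "without", "my", "watch", "."],
    pos=["PRON", "VERB", "ADP", "PRON", "NOUN", "PUNCT"],
)


def tokens_with_pos(doc, pos):
    """Return the text of every token whose coarse POS tag equals `pos`."""
    return [token.text for token in doc if token.pos_ == pos]


nouns = tokens_with_pos(doc, "NOUN")
verbs = tokens_with_pos(doc, "VERB")
print(nouns, verbs)
```

With a trained pipeline, the same helper should work on nlp(noun_sent) directly; the tags would then come from the model.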
Entity type | Description |
---|---|
PERSON | Named person or family |
ORG | Companies, institutions, etc. |
GPE | Geo-political entity, countries, cities, etc. |
LOC | Non-GPE locations, mountain ranges, etc. |
DATE | Absolute or relative dates or periods |
TIME | Time smaller than a day |
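spacy.explain() also accepts entity labels, so the table above can be checked against spaCy's own glossary. The sketch below looks up each label from the table; describe_labels is a hypothetical helper name, and the exact wording of the descriptions depends on the installed spaCy version.

```python
import spacy

# Entity labels taken from the table above.
ENTITY_LABELS = ["PERSON", "ORG", "GPE", "LOC", "DATE", "TIME"]


def describe_labels(labels):
    """Map each label to spaCy's glossary description (None if unknown)."""
    return {label: spacy.explain(label) for label in labels}


descriptions = describe_labels(ENTITY_LABELS)
for label, description in descriptions.items():
    print(f"{label:<7} {description}")
```

No trained model is needed for this lookup, since the glossary ships with the spaCy package itself.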
spaCy models extract named entities using the NER pipeline component
Named entities are available via the doc.ents property
spaCy also tags each entity with its entity label (.label_)
import spacy
nlp = spacy.load("en_core_web_sm")
text = "Albert Einstein was genius."
doc = nlp(text)
print([(ent.text, ent.start_char, ent.end_char, ent.label_) for ent in doc.ents])
>>> [('Albert Einstein', 0, 15, 'PERSON')]
The entity type of each token in a Doc container is available via token.ent_type_
import spacy
nlp = spacy.load("en_core_web_sm")
text = "Albert Einstein was genius."
doc = nlp(text)
print([(token.text, token.ent_type_) for token in doc])
>>> [('Albert', 'PERSON'), ('Einstein', 'PERSON'),
('was', ''), ('genius', ''), ('.', '')]
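Span-level access (doc.ents) and token-level access (token.ent_type_) are two views of the same annotation. The sketch below groups entity texts by label and then reads the per-token types. It sets the entity by hand on a blank pipeline so that it runs without a downloaded model; the PERSON span mirrors the output above rather than a fresh prediction, and entities_by_label is a hypothetical helper name.

```python
from collections import defaultdict

import spacy
from spacy.tokens import Doc, Span

# Blank English pipeline: supplies a vocabulary only, no trained NER component.
nlp = spacy.blank("en")

doc = Doc(
    nlp.vocab,
    words=["Albert", "Einstein", "was", "genius", "."],
    spaces=[True, True, True, False, False],
)
# Hand-set entity covering tokens 0-1, copied from the example output above.
doc.ents = [Span(doc, 0, 2, label="PERSON")]


def entities_by_label(doc):
    """Group the text of each entity in `doc.ents` under its label."""
    grouped = defaultdict(list)
    for ent in doc.ents:
        grouped[ent.label_].append(ent.text)
    return dict(grouped)


grouped = entities_by_label(doc)
token_types = [(token.text, token.ent_type_) for token in doc]
print(grouped)
print(token_types)
```

Tokens outside any entity report an empty string for ent_type_, which makes it easy to filter them out.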
spaCy is equipped with a modern visualizer: displaCy
displaCy entity visualizer highlights named entities and their labels
import spacy
from spacy import displacy
text = "Albert Einstein was genius."
nlp = spacy.load("en_core_web_sm")
doc = nlp(text)
displacy.serve(doc, style="ent")
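displacy.serve() starts a local web server, which is not always convenient in scripts or notebooks. As a hedged alternative, displacy.render() can return the same entity markup as an HTML string. The sketch below assumes spaCy v3 and sets the entity by hand on a blank pipeline so that it runs without a downloaded model; with a trained pipeline, doc = nlp(text) would take the place of the hand-built Doc.

```python
import spacy
from spacy import displacy
from spacy.tokens import Doc, Span

# Blank English pipeline: supplies a vocabulary only, no trained NER component.
nlp = spacy.blank("en")

doc = Doc(
    nlp.vocab,
    words=["Albert", "Einstein", "was", "genius", "."],
    spaces=[True, True, True, False, False],
)
# Hand-set entity, standing in for what a trained NER component would produce.
doc.ents = [Span(doc, 0, 2, label="PERSON")]

# jupyter=False forces render() to return the markup instead of displaying it;
# page=False returns only the fragment rather than a full HTML page.
html = displacy.render(doc, style="ent", page=False, jupyter=False)
print(html)
```

The returned string can be written to a file or embedded in a larger page.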