Natural Language Processing with spaCy
Azadeh Mobasher
Principal Data Scientist
| POS | Description | Example |
|---|---|---|
| VERB | Verb | run, eat, ate, take |
| NOUN | Noun | man, airplane, tree, flower |
| ADJ | Adjective | big, old, incompatible, conflicting |
| ADV | Adverb | very, down, there, tomorrow |
| CCONJ | Coordinating conjunction | and, or, but |
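The descriptions in the table can also be looked up programmatically. The snippet below is a minimal sketch that passes a few of the tags from the table to spacy.explain(), which reads from spaCy's built-in glossary and does not need a trained model. The names tags and descriptions are illustrative only.

```python
import spacy

# A few tags from the table above; spacy.explain() only consults
# spaCy's glossary, so no trained pipeline has to be loaded here.
tags = ["VERB", "NOUN", "ADJ", "ADV"]

# Map each tag to its human-readable description.
descriptions = {tag: spacy.explain(tag) for tag in tags}

for tag, description in descriptions.items():
    print(f"{tag:<6}{description}")
```

spacy.explain() returns None for a string it does not know, which makes it a quick way to check whether a tag name is spelled correctly.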
spaCy stores each token's POS tag in the token.pos_ attribute.
spacy.explain() returns a short description of a given POS tag.

import spacy
nlp = spacy.load("en_core_web_sm")
verb_sent = "I watch TV."
print([(token.text, token.pos_, spacy.explain(token.pos_)) for token in nlp(verb_sent)])
[('I', 'PRON', 'pronoun'),
('watch', 'VERB', 'verb'),
('TV', 'NOUN', 'noun'),
('.', 'PUNCT', 'punctuation')]
noun_sent = "I left without my watch."
print([(token.text, token.pos_, spacy.explain(token.pos_)) for token in nlp(noun_sent)])
[('I', 'PRON', 'pronoun'),
('left', 'VERB', 'verb'),
('without', 'ADP', 'adposition'),
('my', 'PRON', 'pronoun'),
('watch', 'NOUN', 'noun'),
('.', 'PUNCT', 'punctuation')]
| Entity type | Description |
|---|---|
| PERSON | Named person or family |
| ORG | Companies, institutions, etc. |
| GPE | Geo-political entity, countries, cities, etc. |
| LOC | Non-GPE locations, mountain ranges, etc. |
| DATE | Absolute or relative dates or periods |
| TIME | Time smaller than a day |
spaCy models extract named entities using the NER pipeline component.
Named entities are available through the doc.ents property.
spaCy also tags each entity with its entity label (.label_).
import spacy
nlp = spacy.load("en_core_web_sm")
text = "Albert Einstein was genius."
doc = nlp(text)
print([(ent.text, ent.start_char, ent.end_char, ent.label_) for ent in doc.ents])
>>> [('Albert Einstein', 0, 15, 'PERSON')]
Each token in a Doc container also exposes its entity type through the .ent_type_ attribute.
import spacy
nlp = spacy.load("en_core_web_sm")
text = "Albert Einstein was genius."
doc = nlp(text)
print([(token.text, token.ent_type_) for token in doc])
>>> [('Albert', 'PERSON'), ('Einstein', 'PERSON'),
('was', ''), ('genius', ''), ('.', '')]
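The token-level and span-level views describe the same annotation. To make that relationship visible without relying on a model's predictions, the sketch below builds the Doc by hand. It assumes spaCy v3, where the Doc constructor accepts an ents argument of IOB tags; the tags here are written by hand to mirror the output above, not produced by an NER component.

```python
import spacy
from spacy.tokens import Doc

# A blank English pipeline supplies the vocabulary; no trained model is loaded.
nlp = spacy.blank("en")

words = ["Albert", "Einstein", "was", "genius", "."]
# Whether each token is followed by a space in the original text.
spaces = [True, True, True, False, False]
# Hand-written IOB tags standing in for what an NER component would predict.
iob_tags = ["B-PERSON", "I-PERSON", "O", "O", "O"]

doc = Doc(nlp.vocab, words=words, spaces=spaces, ents=iob_tags)

# Token-level view: IOB position and entity type of each token.
token_view = [(token.text, token.ent_iob_, token.ent_type_) for token in doc]
# Span-level view: contiguous B/I tokens merged into one entity span.
span_view = [(ent.text, ent.start_char, ent.end_char, ent.label_) for ent in doc.ents]

print(token_view)
print(span_view)
```

The B and I markers in token.ent_iob_ show where an entity begins and continues, which is how the two PERSON tokens end up as a single span in doc.ents.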
spaCy is equipped with a modern visualizer: displaCy.
The displaCy entity visualizer highlights named entities and their labels.

import spacy
from spacy import displacy
text = "Albert Einstein was genius."
nlp = spacy.load("en_core_web_sm")
doc = nlp(text)
displacy.serve(doc, style="ent")
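displacy.serve() starts a local web server, which is not always convenient, for example in a script or a batch job. As an alternative sketch, displacy.render() returns the markup as a string instead. To keep the example independent of any model, the entity is annotated by hand on a blank pipeline; that manual annotation is an assumption made for illustration, not how the example above obtains its entities.

```python
import spacy
from spacy import displacy
from spacy.tokens import Span

# Blank pipeline: tokenizer only, no NER component.
nlp = spacy.blank("en")
doc = nlp("Albert Einstein was genius.")

# Manually mark tokens 0-1 as a PERSON entity (stand-in for a model prediction).
doc.ents = [Span(doc, 0, 2, label="PERSON")]

# jupyter=False forces render() to return the HTML string rather than display it;
# page=True wraps the markup in a complete HTML page.
html = displacy.render(doc, style="ent", page=True, jupyter=False)

print(len(html), "characters of HTML generated")
```

The returned string can be written to an .html file and opened in a browser, or embedded in a report.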