Natural Language Processing with spaCy
Azadeh Mobasher
Principal Data Scientist
import spacy

# Load a small English pipeline and process a text string
nlp = spacy.load("en_core_web_sm")
doc = nlp("Here's my spaCy pipeline.")
- spacy.load() returns nlp, an instance of the Language class
- The Language object is the text processing pipeline
- Call nlp() on any text to get a Doc container
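To verify which components a loaded pipeline contains, we can inspect nlp.pipe_names; a minimal sketch (the exact list depends on the model version):

import spacy

nlp = spacy.load("en_core_web_sm")

# List the processing components registered in this pipeline
print(nlp.pipe_names)
# e.g. ['tok2vec', 'tagger', 'parser', 'attribute_ruler', 'lemmatizer', 'ner']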
spaCy applies a series of processing steps to text using its Language class. Its core data structures are:
| Name | Description |
|---|---|
| Doc | A container for accessing linguistic annotations of text |
| Span | A slice from a Doc object |
| Token | An individual token, i.e. a word, punctuation mark, whitespace, etc. |
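A minimal sketch of how these containers relate: indexing a Doc yields a Token, while slicing it yields a Span.

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("We are learning NLP.")

token = doc[0]    # a single Token
span = doc[0:3]   # a Span: a slice of the Doc

print(type(doc).__name__, type(token).__name__, type(span).__name__)
# Doc Token Span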
The spaCy language processing pipeline always depends on the loaded model and its capabilities.
| Component | Name | Description |
|---|---|---|
| Tokenizer | Tokenizer | Segment text into tokens and create Doc object |
| Tagger | Tagger | Assign part-of-speech tags |
| Lemmatizer | Lemmatizer | Reduce words to their root forms |
| EntityRecognizer | NER | Detect and label named entities |
Each component provides unique functionality for processing text.
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Tokenization splits a sentence into its tokens.")

# Print the text of each token in the Doc
print([token.text for token in doc])
['Tokenization', 'splits', 'a', 'sentence', 'into', 'its', 'tokens', '.']
Sentence segmentation relies on the DependencyParser component:

import spacy

nlp = spacy.load("en_core_web_sm")
text = "We are learning NLP. This course introduces spaCy."
doc = nlp(text)

# Iterate over the detected sentence boundaries
for sent in doc.sents:
    print(sent.text)
We are learning NLP.
This course introduces spaCy.
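Since doc.sents is a generator, converting it to a list allows indexing individual sentences; a small usage sketch:

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("We are learning NLP. This course introduces spaCy.")

# doc.sents is a generator; materialize it to index sentences
sentences = list(doc.sents)
print(sentences[0].text)
# We are learning NLP.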
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("We are seeing her after one year.")

# Pair each token's text with its lemma (root form)
print([(token.text, token.lemma_) for token in doc])
[('We', 'we'), ('are', 'be'), ('seeing', 'see'), ('her', 'she'),
('after', 'after'), ('one', 'one'), ('year', 'year'), ('.', '.')]
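The Tagger and EntityRecognizer components from the table above can be exercised the same way; a minimal sketch (the example sentence and predicted labels are illustrative and depend on the model):

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple was founded in California.")

# Part-of-speech tags assigned by the Tagger component
print([(token.text, token.pos_) for token in doc])

# Named entities detected by the EntityRecognizer (NER) component
print([(ent.text, ent.label_) for ent in doc.ents])
# e.g. [('Apple', 'ORG'), ('California', 'GPE')]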