Natural Language Processing with spaCy
Azadeh Mobasher
Principal Data Scientist
import spacy
nlp = spacy.load("en_core_web_sm")
doc = nlp("Here's my spaCy pipeline.")
spacy.load() returns nlp, an instance of the Language class. The Language object is the text processing pipeline: calling nlp() on any text returns a Doc container. spaCy applies its processing steps through this Language class.

spaCy's core container data structures are listed below (a short usage sketch follows the table):
| Name | Description |
| --- | --- |
| Doc | A container for accessing linguistic annotations of text |
| Span | A slice from a Doc object |
| Token | An individual token, i.e. a word, punctuation, whitespace, etc. |
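As a minimal sketch of how these containers relate (the sample sentence here is illustrative, not from the course): slicing a Doc yields a Span, and indexing it yields a single Token.

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("spaCy containers are easy to use.")

# Slicing a Doc returns a Span; indexing returns a single Token
span = doc[0:2]
token = doc[0]
print(type(doc).__name__, type(span).__name__, type(token).__name__)
print(span.text, "|", token.text)

This prints Doc Span Token on the first line, followed by spaCy containers | spaCy on the second.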
A spaCy language processing pipeline always depends on the loaded model and its capabilities (a sketch for inspecting the loaded pipeline follows this table):
| Component | Name | Description |
| --- | --- | --- |
| Tokenizer | Tokenizer | Segment text into tokens and create the Doc object |
| Tagger | Tagger | Assign part-of-speech tags |
| Lemmatizer | Lemmatizer | Reduce words to their root forms |
| EntityRecognizer | NER | Detect and label named entities |
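To see which of these components a loaded model actually provides, you can inspect nlp.pipe_names (a quick check; the exact output varies with the model and spaCy version):

import spacy

nlp = spacy.load("en_core_web_sm")
# Component names registered in the loaded pipeline, in processing order
print(nlp.pipe_names)

For en_core_web_sm this typically prints something like ['tok2vec', 'tagger', 'parser', 'attribute_ruler', 'lemmatizer', 'ner'].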
Each component has unique features to process text:
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Tokenization splits a sentence into its tokens.")
print([token.text for token in doc])
['Tokenization', 'splits', 'a', 'sentence', 'into', 'its', 'tokens', '.']
Sentence segmentation uses the DependencyParser component:

import spacy

nlp = spacy.load("en_core_web_sm")
text = "We are learning NLP. This course introduces spaCy."
doc = nlp(text)

for sent in doc.sents:
    print(sent.text)
We are learning NLP.
This course introduces spaCy.
The Lemmatizer reduces each token to its root form (its lemma):

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("We are seeing her after one year.")
print([(token.text, token.lemma_) for token in doc])
[('We', 'we'), ('are', 'be'), ('seeing', 'see'), ('her', 'she'),
('after', 'after'), ('one', 'one'), ('year', 'year'), ('.', '.')]