Spoken Language Processing in Python
Daniel Bourke
Machine Learning Engineer/YouTube Creator
# Install spaCy
$ pip install spacy
# Download spaCy language model
$ python -m spacy download en_core_web_sm
import spacy
# Load spaCy language model
nlp = spacy.load("en_core_web_sm")
# Create a spaCy doc
doc = nlp("I'd like to talk about a smartphone I ordered on July 31st from your "
          "Sydney store, my order number is 4093829. I spoke to one of your "
          "customer service team, Georgia, yesterday.")
# Show different tokens and positions
for token in doc:
    print(token.text, token.idx)
I 0
'd 1
like 4
to 9
talk 12
about 17
a 23
smartphone 25...
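The second value printed for each token, token.idx, is the character offset of that token in the original text. A minimal sketch of that relationship, assuming a blank English pipeline (tokenizer only, created with spacy.blank) is sufficient for the illustration, so no trained model is required:

```python
import spacy

# Assumption: a blank English pipeline (tokenizer only) is enough here,
# so no trained model download is needed for this illustration.
nlp = spacy.blank("en")
doc = nlp("I'd like to talk about a smartphone")

# token.idx is the character offset of the token in the original text,
# so slicing doc.text from that offset recovers the token itself.
offsets = [(token.text, token.idx) for token in doc]
for text, start in offsets:
    assert doc.text[start:start + len(text)] == text

print(offsets)
```

The offsets count characters, not tokens, which is why "like" sits at 4 rather than 2.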
# Show sentences in doc
for sentence in doc.sents:
    print(sentence)
I'd like to talk about a smartphone I ordered on July 31st from your Sydney store,
my order number is 4093829.
I spoke to one of your customer service team, Georgia, yesterday.
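doc.sents only works when some pipeline component has set sentence boundaries (the parser does this in en_core_web_sm). As a rough sketch, assuming spaCy v3 or later and a blank pipeline, the rule-based "sentencizer" component can do the same job on punctuation alone:

```python
import spacy

# Assumption: spaCy v3+, where add_pipe accepts a component's string name.
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")  # rule-based splitter using sentence-final punctuation

doc = nlp("I'd like to talk about a smartphone I ordered on July 31st from your "
          "Sydney store, my order number is 4093829. I spoke to one of your "
          "customer service team, Georgia, yesterday.")

# Collect each sentence span as plain text
sentences = [sentence.text for sentence in doc.sents]
for sentence in sentences:
    print(sentence)
```

The rule-based splitter is faster than the parser but relies entirely on punctuation, which transcribed speech does not always have.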
Some of spaCy's built-in named entities:
# Find named entities in doc
for entity in doc.ents:
    print(entity.text, entity.label_)
July 31st DATE
Sydney GPE
4093829 CARDINAL
one CARDINAL
Georgia GPE
yesterday DATE
# Import EntityRuler class
from spacy.pipeline import EntityRuler
# Check spaCy pipeline
print(nlp.pipeline)
[('tagger', <spacy.pipeline.pipes.Tagger at 0x1c3aa8a470>),
('parser', <spacy.pipeline.pipes.DependencyParser at 0x1c3bb60588>),
('ner', <spacy.pipeline.pipes.EntityRecognizer at 0x1c3bb605e8>)]
# Create EntityRuler instance
ruler = EntityRuler(nlp)
# Add token pattern to ruler
ruler.add_patterns([{"label":"PRODUCT", "pattern": "smartphone"}])
# Add new rule to pipeline before ner
nlp.add_pipe(ruler, before="ner")
# Check updated pipeline
print(nlp.pipeline)
[('tagger', <spacy.pipeline.pipes.Tagger at 0x1c1f9c9b38>),
('parser', <spacy.pipeline.pipes.DependencyParser at 0x1c3c9cba08>),
('entity_ruler', <spacy.pipeline.entityruler.EntityRuler at 0x1c1d834b70>),
('ner', <spacy.pipeline.pipes.EntityRecognizer at 0x1c3c9cba68>)]
# Test new entity rule (re-process the text so the updated pipeline is applied)
doc = nlp(doc.text)
for entity in doc.ents:
    print(entity.text, entity.label_)
smartphone PRODUCT
July 31st DATE
Sydney GPE
4093829 CARDINAL
one CARDINAL
Georgia GPE
yesterday DATE
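The EntityRuler(nlp) and nlp.add_pipe(ruler, ...) calls above follow the spaCy v2 API, which matches the pipeline output shown. In spaCy v3 and later, add_pipe takes the component's string name and returns the component. A minimal sketch of the same rule under that assumption, using a blank pipeline so that only the rule-based entity appears:

```python
import spacy

# Assumption: spaCy v3 or later, where add_pipe takes the component's
# string name and returns the component instance.
nlp = spacy.blank("en")  # blank pipeline: tokenizer only, no statistical ner
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([{"label": "PRODUCT", "pattern": "smartphone"}])

# Process the text after the rule is added so the ruler runs on it
doc = nlp("I'd like to talk about a smartphone I ordered on July 31st.")
entities = [(entity.text, entity.label_) for entity in doc.ents]
print(entities)
```

With a trained model loaded through spacy.load, the equivalent call would be nlp.add_pipe("entity_ruler", before="ner"), which keeps the rule ahead of the statistical entity recognizer as in the slides.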