Spoken Language Processing in Python
Daniel Bourke
Machine Learning Engineer/YouTube Creator
# Install spaCy
$ pip install spacy
# Download a spaCy language model
$ python -m spacy download en_core_web_sm
import spacy
# Load a spaCy language model
nlp = spacy.load("en_core_web_sm")
# Create a spaCy doc
doc = nlp("I'd like to talk about a smartphone I ordered on July 31st from your "
          "Sydney store, my order number is 4093829. I spoke to one of your "
          "customer service team, Georgia, yesterday.")
# Show tokens and their character offsets
for token in doc:
    print(token.text, token.idx)
I 0
'd 1
like 4
to 9
talk 12
about 17
a 23
smartphone 25
...
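The `idx` values above are character offsets into the original string, so slicing the text at `token.idx` should recover each token. A minimal sketch, assuming spaCy is installed; it uses `spacy.blank("en")` (a tokenizer-only pipeline, chosen here so that no model download is needed) and a shortened version of the sentence, both of which are assumptions for illustration:

```python
import spacy

# Tokenizer-only English pipeline; token offsets do not need a trained model.
nlp = spacy.blank("en")
text = "I'd like to talk about a smartphone"
doc = nlp(text)

# Pair each token with the slice of the original text that starts at token.idx.
offsets = [
    (token.text, token.idx, text[token.idx:token.idx + len(token.text)])
    for token in doc
]

for token_text, start, sliced in offsets:
    print(token_text, start, sliced)
```

Because the offsets point into the raw text, they can be used to map tokens back to the original transcript, for example to highlight a span.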
# Show the sentences in the doc
for sentence in doc.sents:
    print(sentence)
I'd like to talk about a smartphone I ordered on July 31st from your Sydney store,
my order number is 4093829.
I spoke to one of your customer service team, Georgia, yesterday.
Some of spaCy's built-in entity types (for example DATE, GPE and CARDINAL) show up when you list the named entities in a doc:
# Find named entities in the doc
for entity in doc.ents:
    print(entity.text, entity.label_)
July 31st DATE
Sydney GPE
4093829 CARDINAL
one CARDINAL
Georgia GPE
yesterday DATE
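Once the entities are listed, it can help to group them by label. The sketch below is one possible way to do that with the standard library; the `entities` list is copied by hand from the output above so the example runs without a model, and `group_by_label` is a hypothetical helper name, not part of spaCy:

```python
from collections import defaultdict

# (text, label) pairs copied from the output above; with a loaded model these
# would come from [(ent.text, ent.label_) for ent in doc.ents].
entities = [
    ("July 31st", "DATE"),
    ("Sydney", "GPE"),
    ("4093829", "CARDINAL"),
    ("one", "CARDINAL"),
    ("Georgia", "GPE"),
    ("yesterday", "DATE"),
]


def group_by_label(pairs):
    """Collect entity texts under their label, keeping order of appearance."""
    grouped = defaultdict(list)
    for text, label in pairs:
        grouped[label].append(text)
    return dict(grouped)


grouped = group_by_label(entities)
for label, texts in grouped.items():
    print(label, texts)
```

Grouping this way makes questionable labels easier to spot: in this transcript Georgia appears to be a person, yet she lands in the GPE group next to Sydney.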
# Import the EntityRuler class
from spacy.pipeline import EntityRuler
# Check the spaCy pipeline
print(nlp.pipeline)
[('tagger', <spacy.pipeline.pipes.Tagger at 0x1c3aa8a470>),
('parser', <spacy.pipeline.pipes.DependencyParser at 0x1c3bb60588>),
('ner', <spacy.pipeline.pipes.EntityRecognizer at 0x1c3bb605e8>)]
# Create an EntityRuler instance
ruler = EntityRuler(nlp)
# Add a pattern to the ruler (a plain string is matched as an exact phrase)
ruler.add_patterns([{"label": "PRODUCT", "pattern": "smartphone"}])
# Add the new rule to the pipeline before ner (spaCy v2 call signature)
nlp.add_pipe(ruler, before="ner")
# Check the updated pipeline
print(nlp.pipeline)
[('tagger', <spacy.pipeline.pipes.Tagger at 0x1c1f9c9b38>),
('parser', <spacy.pipeline.pipes.DependencyParser at 0x1c3c9cba08>),
('entity_ruler', <spacy.pipeline.entityruler.EntityRuler at 0x1c1d834b70>),
('ner', <spacy.pipeline.pipes.EntityRecognizer at 0x1c3c9cba68>)]
# Re-process the text so the new rule is applied, then test it
doc = nlp(doc.text)
for entity in doc.ents:
    print(entity.text, entity.label_)
smartphone PRODUCT
July 31st DATE
Sydney GPE
4093829 CARDINAL
one CARDINAL
Georgia GPE
yesterday DATE
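The `EntityRuler(nlp)` and `nlp.add_pipe(ruler, ...)` calls above follow the spaCy v2 API, which matches the pipeline output shown. In spaCy v3, components are added by their registered string name and `add_pipe` returns the component. A minimal sketch, assuming spaCy v3 is installed; it uses `spacy.blank("en")` so no model download is needed (which also means there is no `ner` component to place the ruler before) and a shortened example sentence, both assumptions for illustration:

```python
import spacy

# Blank English pipeline: tokenizer only, no trained components.
nlp = spacy.blank("en")

# In spaCy v3, add the component by name; add_pipe returns the EntityRuler.
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([{"label": "PRODUCT", "pattern": "smartphone"}])

# Process the text after the rule has been added so it takes effect.
doc = nlp("I'd like to talk about a smartphone I ordered on July 31st.")
found = [(ent.text, ent.label_) for ent in doc.ents]

print(nlp.pipe_names)
print(found)
```

With the full `en_core_web_sm` model loaded under spaCy v3, the equivalent of the call above would be `nlp.add_pipe("entity_ruler", before="ner")`.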