Named entity recognition on transcribed text

Spoken Language Processing in Python

Daniel Bourke

Machine Learning Engineer/YouTube Creator

Installing spaCy

# Install spaCy
$ pip install spacy
# Download spaCy language model
$ python -m spacy download en_core_web_sm

Using spaCy

import spacy

# Load spaCy language model
nlp = spacy.load("en_core_web_sm")
# Create a spaCy doc
doc = nlp("I'd like to talk about a smartphone I ordered on July 31st from your "
          "Sydney store, my order number is 4093829. I spoke to one of your "
          "customer service team, Georgia, yesterday.")

spaCy tokens

# Show different tokens and positions
for token in doc:
  print(token.text, token.idx)
I 0
'd 1
like 4
to 9
talk 12
about 17
a 23
smartphone 25...
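
As a quick check on what token.idx means, the sketch below confirms that it is the character offset of the token's first character in the original text. It uses spacy.blank("en") (tokenizer only, no model download), which is an assumption made here for convenience and not part of the slides; the names nlp_blank, sample and offsets are illustrative.

```python
import spacy

# Assumption: a blank English pipeline (tokenizer only) is enough here,
# because character offsets come from the tokenizer, not the trained model.
nlp_blank = spacy.blank("en")
sample = nlp_blank("I'd like to talk about a smartphone")

# token.idx is the start offset of the token within sample.text
offsets = [(token.text, token.idx) for token in sample]

for text, start in offsets:
    # Slicing the original string at the offset recovers the token text
    assert sample.text[start:start + len(text)] == text
    print(text, start)
```

Because the offsets index into the raw string, they can be used to highlight or cut out a token from the original transcript.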

spaCy sentences

# Show sentences in doc
for sentence in doc.sents:
  print(sentence)
I'd like to talk about a smartphone I ordered on July 31st from your Sydney store, 
my order number is 4093829.
I spoke to one of your customer service team, Georgia, yesterday.
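
doc.sents above relies on the loaded model to find sentence boundaries. If only sentence splitting is needed, a rule-based alternative may be enough. The sketch below swaps in spaCy's built-in "sentencizer" component on a blank pipeline; it assumes spaCy v3 (components added by name), and the shortened sample text and the names nlp_rules, sample and sentences are illustrative.

```python
import spacy

# Assumption: spaCy v3, where built-in components are added by name.
nlp_rules = spacy.blank("en")
nlp_rules.add_pipe("sentencizer")  # rule-based splitter, no trained model

sample = nlp_rules("I ordered a smartphone from your Sydney store. "
                   "I spoke to Georgia yesterday.")

# Each item in sample.sents is a Span; .text gives the sentence string
sentences = [sentence.text for sentence in sample.sents]
for sentence in sentences:
    print(sentence)
```

The sentencizer splits on punctuation only, so on messy transcribed text it can behave differently from the model-based boundaries.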

spaCy named entities

Some of spaCy's built-in named entities:

  • PERSON People, including fictional.
  • ORG Companies, agencies, institutions, etc.
  • GPE Countries, cities, states.
  • PRODUCT Objects, vehicles, foods, etc. (Not services.)
  • DATE Absolute or relative dates or periods.
  • TIME Times smaller than a day.
  • MONEY Monetary values, including unit.
  • CARDINAL Numerals that do not fall under another type.
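
The label descriptions above can also be looked up from code. The sketch below uses spacy.explain, which returns a short description for a label; the labels list and the descriptions dict are illustrative names, and the exact wording returned may differ between spaCy versions.

```python
import spacy

labels = ["PERSON", "ORG", "GPE", "PRODUCT", "DATE", "TIME", "MONEY", "CARDINAL"]

# spacy.explain returns a short description string, or None for unknown terms
descriptions = {label: spacy.explain(label) for label in labels}

for label, description in descriptions.items():
    print(label, "-", description)
```

This is handy when an unfamiliar label shows up in entity.label_ and you want to know what it stands for.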

spaCy named entities

# Find named entities in doc
for entity in doc.ents:
  print(entity.text, entity.label_)
July 31st DATE
Sydney GPE
4093829 CARDINAL
one CARDINAL
Georgia GPE
yesterday DATE

Custom named entities

# Import EntityRuler class
from spacy.pipeline import EntityRuler
# Check spaCy pipeline
print(nlp.pipeline)
[('tagger', <spacy.pipeline.pipes.Tagger at 0x1c3aa8a470>),
 ('parser', <spacy.pipeline.pipes.DependencyParser at 0x1c3bb60588>),
 ('ner', <spacy.pipeline.pipes.EntityRecognizer at 0x1c3bb605e8>)]

Changing the pipeline

# Create EntityRuler instance
ruler = EntityRuler(nlp)
# Add token pattern to ruler
ruler.add_patterns([{"label":"PRODUCT", "pattern": "smartphone"}])
# Add new rule to pipeline before ner
# (spaCy v2 call; spaCy v3 expects nlp.add_pipe("entity_ruler", before="ner"))
nlp.add_pipe(ruler, before="ner")
# Check updated pipeline
nlp.pipeline
[('tagger', <spacy.pipeline.pipes.Tagger at 0x1c1f9c9b38>),
 ('parser', <spacy.pipeline.pipes.DependencyParser at 0x1c3c9cba08>),
 ('entity_ruler', <spacy.pipeline.entityruler.EntityRuler at 0x1c1d834b70>),
 ('ner', <spacy.pipeline.pipes.EntityRecognizer at 0x1c3c9cba68>)]

Testing the new pipeline

# Re-process the text so the updated pipeline is applied
doc = nlp(doc.text)
# Test new entity rule
for entity in doc.ents:
  print(entity.text, entity.label_)
smartphone PRODUCT
July 31st DATE
Sydney GPE
4093829 CARDINAL
one CARDINAL
Georgia GPE
yesterday DATE
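
The pipeline printouts above come from spaCy v2. As a hedged sketch of how the same rule might look in spaCy v3, where add_pipe takes the component's registered name and returns the component, the example below adds an entity ruler to a blank pipeline. The blank pipeline, the shortened sample text and the names nlp_v3, sample and found are assumptions made for illustration.

```python
import spacy

# Assumption: spaCy v3. A blank pipeline is used so no model download is
# needed; with en_core_web_sm you would pass before="ner" to add_pipe.
nlp_v3 = spacy.blank("en")

# In v3, add_pipe takes the registered component name and returns the component
ruler = nlp_v3.add_pipe("entity_ruler")
ruler.add_patterns([{"label": "PRODUCT", "pattern": "smartphone"}])

# Only the rule-based entity is found, since this pipeline has no trained ner
sample = nlp_v3("I'd like to talk about a smartphone I ordered on July 31st.")
found = [(entity.text, entity.label_) for entity in sample.ents]
print(found)
```

With a trained model loaded instead of the blank pipeline, the statistical entities such as dates would appear alongside the rule-based PRODUCT match.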

Let's rocket and practice spaCy!
