spaCy EntityRuler

Natural Language Processing with spaCy

Azadeh Mobasher

Principal Data Scientist

spaCy EntityRuler

 

  • EntityRuler adds named entities to a Doc container using rule-based patterns
  • It can be used on its own or combined with EntityRecognizer
  • Phrase entity patterns for exact string matches (string):
{"label": "ORG", "pattern": "Microsoft"}
  • Token entity patterns, with one dictionary describing each token (list):
{"label": "GPE", "pattern": [{"LOWER": "san"}, {"LOWER": "francisco"}]}
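The difference between the two pattern types is easiest to see with casing. The following is a minimal sketch, assuming a blank English pipeline (no trained model) and a made-up sentence that is not from the slides: the phrase pattern should match only the exact string, while the LOWER token pattern compares lowercased token text.

```python
import spacy

# Blank English pipeline: tokenizer only, no trained components.
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    # Phrase pattern: matches the exact string "Microsoft" only.
    {"label": "ORG", "pattern": "Microsoft"},
    # Token pattern: one dict per token, compared on lowercased text.
    {"label": "GPE", "pattern": [{"LOWER": "san"}, {"LOWER": "francisco"}]},
])

# Illustrative sentence (an assumption, not from the slides) mixing casings.
doc = nlp("microsoft and Microsoft both have offices in SAN FRANCISCO.")
ents = [(ent.text, ent.label_) for ent in doc.ents]
print(ents)
```

In this sketch the lowercase "microsoft" stays unlabeled, whereas "SAN FRANCISCO" is still picked up because the token pattern ignores case.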

Adding EntityRuler to spaCy pipeline

 

  • Add the component to the pipeline with the nlp.add_pipe() method
  • Add a list of patterns with the component's .add_patterns() method

 

import spacy

nlp = spacy.blank("en")
entity_ruler = nlp.add_pipe("entity_ruler")
patterns = [{"label": "ORG", "pattern": "Microsoft"},
            {"label": "GPE", "pattern": [{"LOWER": "san"}, {"LOWER": "francisco"}]}]
entity_ruler.add_patterns(patterns)
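To confirm that the component and its patterns were registered, the pipeline can be inspected. This is a small sketch that rebuilds the same pipeline so it runs on its own; the inspection calls (nlp.pipe_names, the ruler's patterns and labels properties) are additions for illustration and are not part of the slides.

```python
import spacy

nlp = spacy.blank("en")
entity_ruler = nlp.add_pipe("entity_ruler")
patterns = [{"label": "ORG", "pattern": "Microsoft"},
            {"label": "GPE", "pattern": [{"LOWER": "san"}, {"LOWER": "francisco"}]}]
entity_ruler.add_patterns(patterns)

# Names of the components currently in the pipeline, in processing order.
print(nlp.pipe_names)

# Patterns and labels known to the ruler after add_patterns().
print(entity_ruler.patterns)
print(entity_ruler.labels)
```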

Adding EntityRuler to spaCy pipeline

 

  • doc.ents stores the results of the EntityRuler component

 

doc = nlp("Microsoft is hiring software developers in San Francisco.")
print([(ent.text, ent.label_) for ent in doc.ents])
[('Microsoft', 'ORG'), ('San Francisco', 'GPE')]
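Each item in doc.ents is a Span, so it carries more than text and label. The sketch below, which rebuilds the pipeline so it is self-contained, prints the token and character offsets of each entity; the variable name text is introduced here only for illustration.

```python
import spacy

nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "ORG", "pattern": "Microsoft"},
    {"label": "GPE", "pattern": [{"LOWER": "san"}, {"LOWER": "francisco"}]},
])

text = "Microsoft is hiring software developers in San Francisco."
doc = nlp(text)

for ent in doc.ents:
    # start/end are token indices; start_char/end_char are character offsets.
    print(ent.text, ent.label_, ent.start, ent.end, ent.start_char, ent.end_char)
```

The character offsets index into the original string, and the token offsets index into the Doc, which is handy when entities need to be highlighted or aligned with other annotations.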

EntityRuler in action

 

  • Can be combined with other spaCy pipeline components, such as the statistical ner component
  • Rules can add entities that the named-entity recognizer misses or mislabels

  • spaCy model without EntityRuler:

nlp = spacy.load("en_core_web_sm")

doc = nlp("Manhattan associates is a company in the U.S.")
print([(ent.text, ent.label_) for ent in doc.ents])
[('Manhattan', 'GPE'), ('U.S.', 'GPE')]

EntityRuler in action

 

  • EntityRuler added after the existing ner component (by default, the ruler does not overwrite entities that ner has already set):
nlp = spacy.load("en_core_web_sm")
ruler = nlp.add_pipe("entity_ruler", after="ner")
patterns = [{"label": "ORG", "pattern": [{"LOWER": "manhattan"}, {"LOWER": "associates"}]}]
ruler.add_patterns(patterns)

doc = nlp("Manhattan associates is a company in the U.S.")
print([(ent.text, ent.label_) for ent in doc.ents])
[('Manhattan', 'GPE'), ('U.S.', 'GPE')]

EntityRuler in action

 

  • EntityRuler added before the existing ner component (the rule-based entities are set first, and ner respects them):
nlp = spacy.load("en_core_web_sm")
ruler = nlp.add_pipe("entity_ruler", before="ner")
patterns = [{"label": "ORG", "pattern": [{"LOWER": "manhattan"}, {"LOWER": "associates"}]}]
ruler.add_patterns(patterns)

doc = nlp("Manhattan associates is a company in the U.S.")
print([(ent.text, ent.label_) for ent in doc.ents])
[('Manhattan associates', 'ORG'), ('U.S.', 'GPE')]
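The effect of component order can be reproduced without a trained model. In the sketch below, a first EntityRuler is used as a stand-in for ner (an assumption made so the example runs on a blank pipeline), and a second ruler is added after it. The component names stand_in_ner and custom_ruler and the helper build are hypothetical; the overwrite_ents setting is the EntityRuler config option that controls whether existing entities may be replaced.

```python
import spacy

def build(overwrite):
    """Build a blank pipeline with two rulers; the first mimics ner output."""
    nlp = spacy.blank("en")
    # Stand-in for ner (assumption): labels "Manhattan" as GPE.
    first = nlp.add_pipe("entity_ruler", name="stand_in_ner")
    first.add_patterns([{"label": "GPE", "pattern": "Manhattan"}])
    # Second ruler runs afterwards; overwrite_ents decides who wins on overlap.
    second = nlp.add_pipe(
        "entity_ruler",
        name="custom_ruler",
        after="stand_in_ner",
        config={"overwrite_ents": overwrite},
    )
    second.add_patterns([
        {"label": "ORG",
         "pattern": [{"LOWER": "manhattan"}, {"LOWER": "associates"}]},
    ])
    return nlp

text = "Manhattan associates is a company in the U.S."
kept = [(e.text, e.label_) for e in build(False)(text).ents]
replaced = [(e.text, e.label_) for e in build(True)(text).ents]
print(kept)
print(replaced)
```

With overwrite_ents left at False, the earlier GPE span survives; with True, the later ruler replaces it with the longer ORG span. This suggests a second way to get the ORG result when the ruler has to come after ner, although placing the ruler before ner, as on the slide, is the approach the course shows.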

Let's practice!
