Natural Language Processing with spaCy
Azadeh Mobasher
Principal Data Scientist
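EntityRuler
EntityRuler adds named entities to a Doc container
It can be used on its own or combined with EntityRecognizer
Phrase entity pattern for an exact string match (string):
{"label": "ORG", "pattern": "Microsoft"}
Token entity pattern with one dictionary per token (list):
{"label": "GPE", "pattern": [{"LOWER": "san"}, {"LOWER": "francisco"}]}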
Add EntityRuler to the pipeline with the .add_pipe() method
Add patterns to it with the .add_patterns() method
import spacy
nlp = spacy.blank("en")
entity_ruler = nlp.add_pipe("entity_ruler")
patterns = [{"label": "ORG", "pattern": "Microsoft"},
{"label": "GPE", "pattern": [{"LOWER": "san"}, {"LOWER": "francisco"}]}]
entity_ruler.add_patterns(patterns)
doc.ents stores the results of an EntityRuler component
doc = nlp("Microsoft is hiring software developer in San Francisco.")
print([(ent.text, ent.label_) for ent in doc.ents])
[('Microsoft', 'ORG'), ('San Francisco', 'GPE')]
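The difference between the two pattern types can be seen by changing the casing of the input. The following is a minimal sketch, assuming spaCy 3.x with default EntityRuler settings; the helper name `entities` and the sample sentences are illustrative and not part of the original slides.

```python
import spacy

# Blank English pipeline: tokenizer only, no trained model required.
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    # Phrase pattern: matched on the exact text by default.
    {"label": "ORG", "pattern": "Microsoft"},
    # Token pattern: LOWER compares the lowercased form of each token.
    {"label": "GPE", "pattern": [{"LOWER": "san"}, {"LOWER": "francisco"}]},
])


def entities(text):
    """Return (text, label) pairs for the entities found in text."""
    return [(ent.text, ent.label_) for ent in nlp(text).ents]


# The phrase pattern misses "microsoft"; the token pattern still matches.
print(entities("microsoft opened an office in SAN FRANCISCO."))
# [('SAN FRANCISCO', 'GPE')]
```

Under these assumptions, a token pattern with LOWER is the more forgiving choice when the casing of the input text is not reliable.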
EntityRuler integrates with spaCy pipeline components
EntityRuler enhances the named-entity recognizer
spaCy model without EntityRuler:
nlp = spacy.load("en_core_web_sm")
doc = nlp("Manhattan associates is a company in the U.S.")
print([(ent.text, ent.label_) for ent in doc.ents])
[('Manhattan', 'GPE'), ('U.S.', 'GPE')]
EntityRuler added after existing ner component:
nlp = spacy.load("en_core_web_sm")
ruler = nlp.add_pipe("entity_ruler", after='ner')
patterns = [{"label": "ORG", "pattern": [{"LOWER": "manhattan"}, {"LOWER": "associates"}]}]
ruler.add_patterns(patterns)
doc = nlp("Manhattan associates is a company in the U.S.")
print([(ent.text, ent.label_) for ent in doc.ents])
[('Manhattan', 'GPE'), ('U.S.', 'GPE')]
By default, the ruler does not overwrite entities that ner has already set, so the result is unchanged.
EntityRuler added before existing ner component:
nlp = spacy.load("en_core_web_sm")
ruler = nlp.add_pipe("entity_ruler", before='ner')
patterns = [{"label": "ORG", "pattern": [{"LOWER": "manhattan"}, {"LOWER": "associates"}]}]
ruler.add_patterns(patterns)
doc = nlp("Manhattan associates is a company in the U.S.")
print([(ent.text, ent.label_) for ent in doc.ents])
[('Manhattan associates', 'ORG'), ('U.S.', 'GPE')]
The ruler runs first, and ner keeps the entities that are already set.
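The effect of component order can be reproduced without downloading a trained model. The sketch below is an illustration rather than the course's own code: a second EntityRuler named `stand_in_ner` (a hypothetical name) plays the role of the statistical ner component by labelling "Manhattan" as GPE, and `org_ruler` (also hypothetical) carries the ORG pattern. It assumes spaCy 3.x, where the `entity_ruler` factory accepts an `overwrite_ents` setting.

```python
import spacy

TEXT = "Manhattan associates is a company in the U.S."


def build_pipeline(position, overwrite=False):
    """Place org_ruler 'before' or 'after' the stand-in for ner."""
    nlp = spacy.blank("en")
    # Hypothetical stand-in for the statistical "ner" component.
    stand_in = nlp.add_pipe("entity_ruler", name="stand_in_ner")
    stand_in.add_patterns([{"label": "GPE", "pattern": "Manhattan"}])
    placement = {position: "stand_in_ner"}  # becomes before=... or after=...
    ruler = nlp.add_pipe(
        "entity_ruler",
        name="org_ruler",
        config={"overwrite_ents": overwrite},
        **placement,
    )
    ruler.add_patterns(
        [{"label": "ORG", "pattern": [{"LOWER": "manhattan"}, {"LOWER": "associates"}]}]
    )
    return nlp


def entities(nlp):
    """Return (text, label) pairs found in TEXT by the given pipeline."""
    return [(ent.text, ent.label_) for ent in nlp(TEXT).ents]


after_default = build_pipeline("after")
before_default = build_pipeline("before")
after_overwrite = build_pipeline("after", overwrite=True)

print(after_default.pipe_names, entities(after_default))
# ['stand_in_ner', 'org_ruler'] [('Manhattan', 'GPE')]
print(before_default.pipe_names, entities(before_default))
# ['org_ruler', 'stand_in_ner'] [('Manhattan associates', 'ORG')]
print(after_overwrite.pipe_names, entities(after_overwrite))
# ['stand_in_ner', 'org_ruler'] [('Manhattan associates', 'ORG')]
```

In this sketch, `overwrite_ents=True` gives a ruler placed after the earlier component the same result as placing it before. Whether overwriting is desirable with a real ner component depends on how much the rules should be trusted over the model's predictions.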