Named entity recognition

Feature Engineering for NLP in Python

Rounak Banik

Data Scientist

Applications

  • Efficient search algorithms
  • Question answering
  • News article classification
  • Customer service

Named entity recognition

  • Identifying and classifying named entities into predefined categories.
  • Categories include person, organization, country, etc.
  • Example: "John Doe is a software engineer working at Google. He lives in France."
  • Named entities in the example:
      • John Doe → person
      • Google → organization
      • France → country (geopolitical entity)

NER using spaCy

import spacy
string = "John Doe is a software engineer working at Google. He lives in France."

# Load model and create Doc object
nlp = spacy.load('en_core_web_sm')
doc = nlp(string)

# Generate named entities
ne = [(ent.text, ent.label_) for ent in doc.ents]
print(ne)
[('John Doe', 'PERSON'), ('Google', 'ORG'), ('France', 'GPE')]
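
The (text, label) pairs printed above are often more useful as features once they are grouped by label. The following is a minimal sketch of one way to do that in plain Python. The helper name `group_entities` is hypothetical and not part of spaCy, and the `ne` list is copied from the output above so the snippet runs without loading a model.

```python
from collections import defaultdict

# (entity text, entity label) pairs, copied from the output above.
ne = [('John Doe', 'PERSON'), ('Google', 'ORG'), ('France', 'GPE')]


def group_entities(pairs):
    """Group entity texts by their label (hypothetical helper, not a spaCy API)."""
    grouped = defaultdict(list)
    for text, label in pairs:
        grouped[label].append(text)
    return dict(grouped)


entities_by_label = group_entities(ne)
print(entities_by_label)
# {'PERSON': ['John Doe'], 'ORG': ['Google'], 'GPE': ['France']}
```

From here, simple features such as the number of organizations mentioned in a text follow from `len(entities_by_label.get('ORG', []))`.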

NER annotations in spaCy

spaCy documentation on NER annotations

A word of caution

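  • NER models are not perfect: expect some missed or mislabeled entities
  • Performance depends on how closely your text resembles the data the model was trained and tested on
  • Train models with specialized data for nuanced cases
  • Models are language specific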
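
One way to see how imperfect a model is on your own text is to compare its predictions with a small hand-labeled sample. The sketch below assumes exact matching of (text, label) pairs and uses made-up predictions for illustration; it is not output from a real model, and `precision_recall` is a hypothetical helper rather than a spaCy function.

```python
def precision_recall(predicted, gold):
    """Exact-match precision and recall over (text, label) pairs (hypothetical helper)."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hand-labeled entities for the example sentence.
gold = [('John Doe', 'PERSON'), ('Google', 'ORG'), ('France', 'GPE')]

# Illustrative (made-up) predictions in which 'France' is mislabeled.
predicted = [('John Doe', 'PERSON'), ('Google', 'ORG'), ('France', 'ORG')]

precision, recall = precision_recall(predicted, gold)
print(f"precision={precision:.2f}, recall={recall:.2f}")
# precision=0.67, recall=0.67
```

Low scores on a sample like this are a hint that the model may need training on more specialized data.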

Let's practice!
