Customizing spaCy models

Natural Language Processing with spaCy

Azadeh Mobasher

Principal data scientist

Why train spaCy models?

  • spaCy models go a long way for general NLP use cases
  • But they may not have seen data from specific domains during training, e.g.
    • Twitter data
    • Medical data

Example of a medical domain NER

Why train spaCy models?

  • Better results on your specific domain
  • Essential for domain-specific text classification

Before you start training, ask the following questions:

  • Do spaCy models perform well enough on our data?
  • Does our domain include many labels that are absent in spaCy models?

Model performance on our data

  • Do spaCy models perform well enough on our data?
  • "Oxford Street" is not correctly classified with a GPE label; the model tags "the Oxford Street" as ORG:
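import spacy

# Load the small English pipeline
# (install it first with: python -m spacy download en_core_web_sm)
nlp = spacy.load("en_core_web_sm")

text = "The car was navigating to the Oxford Street."
doc = nlp(text)
# Print each predicted entity with its label
print([(ent.text, ent.label_) for ent in doc.ents])
# Output:
# [('the Oxford Street', 'ORG')]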
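To answer the first question on more than one sentence, we can score the model's predictions against a small hand-annotated sample of our own data. A minimal sketch, assuming gold and predicted entities are stored as (start_char, end_char, label) tuples and that a match must agree on both span and label; `entity_prf` is a hypothetical helper, and spaCy v3's `nlp.evaluate` reports comparable `ents_p`, `ents_r` and `ents_f` scores on annotated examples.

```python
from typing import List, Set, Tuple

# One entity = (start_char, end_char, label); a match must agree on all three.
Entity = Tuple[int, int, str]


def entity_prf(gold: List[Set[Entity]], pred: List[Set[Entity]]) -> Tuple[float, float, float]:
    """Micro-averaged exact-match precision, recall and F1 over several documents."""
    tp = fp = fn = 0
    for gold_ents, pred_ents in zip(gold, pred):
        tp += len(gold_ents & pred_ents)  # predictions that exactly match a gold entity
        fp += len(pred_ents - gold_ents)  # predictions with no exact gold match
        fn += len(gold_ents - pred_ents)  # gold entities the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


text = "The car was navigating to the Oxford Street."
# Hypothetical gold annotation: "Oxford Street" labeled GPE
gold = [{(30, 43, "GPE")}]
# The prediction shown above: "the Oxford Street" labeled ORG
pred = [{(26, 43, "ORG")}]
assert text[30:43] == "Oxford Street" and text[26:43] == "the Oxford Street"
print(entity_prf(gold, pred))  # (0.0, 0.0, 0.0)
```

With a loaded pipeline, the predicted set for one text could be built as `{(ent.start_char, ent.end_char, ent.label_) for ent in nlp(text).ents}`. Exact matching is strict: a prediction that is off by one word, as here, counts as both a false positive and a false negative.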

Output labels in spaCy models

  • Does our domain include many labels that are absent in spaCy models?

NER example of common vs. medical domains
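For the second question, a quick check is to count how many entities in our own annotated data use labels the general model cannot predict. A minimal sketch: the label set below is the 18-label OntoNotes scheme used by spaCy's English pipelines such as `en_core_web_sm` (with a loaded pipeline, `nlp.get_pipe("ner").labels` returns them), while the medical sentences, the labels DOSAGE, DRUG, SYMPTOM and TEST, and the `missing_label_counts` helper are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple

# Entity labels of spaCy's English pipelines (OntoNotes scheme), listed so the
# sketch runs without a model; nlp.get_pipe("ner").labels returns them at runtime.
GENERAL_LABELS: Set[str] = {
    "CARDINAL", "DATE", "EVENT", "FAC", "GPE", "LANGUAGE", "LAW", "LOC", "MONEY",
    "NORP", "ORDINAL", "ORG", "PERCENT", "PERSON", "PRODUCT", "QUANTITY", "TIME",
    "WORK_OF_ART",
}

# One annotated example: (text, {"entities": [(start_char, end_char, label), ...]})
Annotation = Tuple[str, Dict[str, List[Tuple[int, int, str]]]]


def missing_label_counts(annotations: Iterable[Annotation], model_labels: Set[str]) -> Counter:
    """Count annotated entities whose label the model cannot predict."""
    counts: Counter = Counter()
    for _text, ann in annotations:
        for _start, _end, label in ann["entities"]:
            if label not in model_labels:
                counts[label] += 1
    return counts


# Hypothetical medical annotations (every label except DATE is made up for illustration)
medical_annotations: List[Annotation] = [
    ("Patient was given 50 mg of aspirin for chest pain.",
     {"entities": [(18, 23, "DOSAGE"), (27, 34, "DRUG"), (39, 49, "SYMPTOM")]}),
    ("MRI showed no change since March.",
     {"entities": [(0, 3, "TEST"), (27, 32, "DATE")]}),
]

missing = missing_label_counts(medical_annotations, GENERAL_LABELS)
total = sum(len(ann["entities"]) for _, ann in medical_annotations)
print(dict(missing))
print(f"{sum(missing.values())}/{total} annotated entities use labels the model lacks")
```

If most annotated entities fall outside the model's label set, as in this toy sample, that points toward custom training rather than using the pretrained NER as-is.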

Output labels in spaCy models

If we need to train a custom model, we follow these steps:

  • Collect our domain-specific data
  • Annotate our data
  • Decide whether to update an existing model or train a model from scratch
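Annotation errors are easy to make when entities are stored as character offsets. A minimal sketch of a sanity check, assuming each annotated example uses the `(text, {"entities": [(start, end, label)]})` layout common in spaCy training scripts; in spaCy v3 such a pair can be turned into a training example with `Example.from_dict(nlp.make_doc(text), annotations)`. The `check_annotation` helper and the medical labels are hypothetical. It flags spans that are empty or out of range, spans with stray whitespace, and overlapping spans (spaCy's `doc.ents` cannot hold overlapping entities).

```python
from typing import Dict, List, Tuple

# One annotated example: (text, {"entities": [(start_char, end_char, label), ...]})
Annotation = Tuple[str, Dict[str, List[Tuple[int, int, str]]]]


def check_annotation(example: Annotation) -> List[str]:
    """Return a list of problems found in one annotated example (empty if none)."""
    text, ann = example
    problems: List[str] = []
    prev_end = 0
    # Sort by start offset so overlaps can be detected in a single pass
    for start, end, label in sorted(ann.get("entities", [])):
        if not 0 <= start < end <= len(text):
            problems.append(f"{label} ({start}, {end}): empty or out of range")
            continue
        surface = text[start:end]
        if surface != surface.strip():
            problems.append(f"{label} {surface!r}: leading or trailing whitespace")
        if start < prev_end:
            problems.append(f"{label} {surface!r}: overlaps an earlier entity")
        prev_end = max(prev_end, end)
    return problems


# Hypothetical medical examples: the second has an off-by-one start and an overlap
examples: List[Annotation] = [
    ("Patient was given 50 mg of aspirin.",
     {"entities": [(18, 23, "DOSAGE"), (27, 34, "DRUG")]}),
    ("Patient was given 50 mg of aspirin.",
     {"entities": [(17, 23, "DOSAGE"), (21, 34, "DRUG")]}),
]

for i, example in enumerate(examples):
    print(i, check_annotation(example) or "ok")
```

A span with stray whitespace usually means the offsets were shifted by a character, as in the second example, so it is worth fixing these before any training run.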

Let's practice!
