Training data preparation

Natural Language Processing with spaCy

Azadeh Mobasher

Principal data scientist

Training steps

 

  1. Annotate and prepare input data
  2. Initialize the model weights
  3. Predict a few examples with the current weights
  4. Compare the predictions with the correct answers
  5. Use an optimizer to calculate weight changes that improve model performance
  6. Update weights slightly
  7. Go back to step 3.
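
The loop in steps 2 to 7 can be sketched with a toy model. This is a minimal illustration in plain Python, not spaCy's own training code: the single-weight model, the toy data, and the `learning_rate` value are all assumptions chosen to keep the example small.

```python
# Toy data (hypothetical): the correct answers follow y = 2 * x.
examples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

weight = 0.0          # step 2: initialize the model weight
learning_rate = 0.05  # controls how "slightly" each update moves the weight

for epoch in range(200):                    # step 7: go back and repeat
    for x, y_true in examples:
        y_pred = weight * x                 # step 3: predict with current weight
        error = y_pred - y_true             # step 4: compare with correct answer
        gradient = 2 * error * x            # step 5: slope of the squared error
        weight -= learning_rate * gradient  # step 6: update the weight slightly

print(round(weight, 3))  # ends up very close to 2.0
```

Each pass nudges the weight toward the value that makes predictions match the answers, which is the same idea spaCy applies to the much larger set of weights in a pipeline component.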

Annotating and preparing data

  • The first step is to prepare the training data in the required format
  • After collecting the data, we annotate it
  • Annotation means labeling the intent, entities, etc.
  • This is an example of annotated data:
annotated_data = {
    "sentence": "Antiviral drugs used against influenza include neuraminidase inhibitors.",
    "entities": {
        "label": "Medicine",
        "value": "neuraminidase inhibitors",
    }
}

Annotating and preparing data

  • Here's another example of annotated data:

 

annotated_data = {
    "sentence": "Bill Gates visited the SFO Airport.",
    "entities": [{"label": "PERSON", "value": "Bill Gates"},
                 {"label": "LOC", "value": "SFO Airport"}]
}

spaCy training data format

  • Data annotation prepares training data for what we want the model to learn
  • The training dataset is stored as a list of (text, annotations) pairs, where the annotations are a dictionary:
training_data = [
    ("I will visit you in Austin.", {"entities": [(20, 26, "GPE")]}),
    ("I'm going to Sam's house.", {"entities": [(13, 18, "PERSON"), (19, 24, "GPE")]}),
    ("I will go.", {"entities": []})
]

Three example pairs:

  • Each example pair has the sentence as its first element
  • The pair's second element is a dictionary whose "entities" key lists each annotated entity as (start character, end character, label), with the end character exclusive
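
The label/value annotations shown on the earlier slides can be converted into this character-offset format. The helper below is a hypothetical sketch (`to_spacy_format` is not a spaCy function); it assumes each entity value appears exactly once in the sentence and uses the first match it finds.

```python
def to_spacy_format(annotated):
    """Convert {"sentence", "entities": [{"label", "value"}]} to a (text, annotations) pair."""
    text = annotated["sentence"]
    entities = []
    for entity in annotated["entities"]:
        start = text.find(entity["value"])  # index of the first occurrence
        if start == -1:
            raise ValueError(f"{entity['value']!r} not found in sentence")
        end = start + len(entity["value"])  # end offset is exclusive
        entities.append((start, end, entity["label"]))
    return (text, {"entities": entities})


annotated_data = {
    "sentence": "Bill Gates visited the SFO Airport.",
    "entities": [{"label": "PERSON", "value": "Bill Gates"},
                 {"label": "LOC", "value": "SFO Airport"}],
}

text, annotations = to_spacy_format(annotated_data)
print(annotations)

# Slicing the text with each offset pair recovers the entity string.
for start, end, label in annotations["entities"]:
    print(label, text[start:end])
```

Slicing with `text[start:end]` is a quick way to check that hand-written offsets point at the intended characters.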

Example object data for training

  • We cannot feed the raw text directly to spaCy

  • We need to create an Example object for each training example

import spacy
from spacy.training import Example

nlp = spacy.load("en_core_web_sm")

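# Process the raw text into a Doc
doc = nlp("I will visit you in Austin.")

# Entity annotations as (start_char, end_char, label)
annotations = {"entities": [(20, 26, "GPE")]}

# Pair the Doc with its annotations
example_sentence = Example.from_dict(doc, annotations)
print(example_sentence.to_dict())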
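
The same conversion can be applied to a whole training dataset. The sketch below makes two assumptions that are not in the slides: it uses a blank English pipeline (`spacy.blank("en")`) so that no model download is needed, and it builds each Doc with `nlp.make_doc`, which only tokenizes the text.

```python
import spacy
from spacy.training import Example

# Assumption: a blank English pipeline (tokenizer only) instead of en_core_web_sm.
nlp = spacy.blank("en")

training_data = [
    ("I will visit you in Austin.", {"entities": [(20, 26, "GPE")]}),
    ("I will go.", {"entities": []}),
]

examples = []
for text, annotations in training_data:
    doc = nlp.make_doc(text)  # tokenize only, no predictions
    examples.append(Example.from_dict(doc, annotations))

for example in examples:
    # example.reference is the Doc that carries the annotated entities
    print([(ent.text, ent.label_) for ent in example.reference.ents])
```

The resulting `examples` list holds one Example object per training pair.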

Let's practice!
