The training loop

Advanced NLP with spaCy

Ines Montani

spaCy core developer

The steps of a training loop

  1. Loop for a number of times.
  2. Shuffle the training data.
  3. Divide the data into batches.
  4. Update the model for each batch.
  5. Save the updated model.
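
The five steps above can be sketched in plain Python without any spaCy dependency. This is a minimal illustration, not spaCy's implementation: the function `train` and its `update_fn` parameter are hypothetical names, with `update_fn` standing in for whatever call updates the model on one batch.

```python
import random


def train(examples, update_fn, n_iter=10, batch_size=2, seed=0):
    """Run the generic loop; `update_fn` is a hypothetical stand-in for the model update."""
    rng = random.Random(seed)
    data = list(examples)  # copy so the caller's list is not reordered
    n_updates = 0
    for _ in range(n_iter):  # 1. loop for a number of times
        rng.shuffle(data)  # 2. shuffle the training data
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]  # 3. divide the data into batches
            update_fn(batch)  # 4. update the model for each batch
            n_updates += 1
    return n_updates  # 5. saving the model is left to the caller


# Usage: record every example the "model" sees instead of training anything.
seen = []
n_updates = train(["a", "b", "c", "d", "e"], seen.extend, n_iter=3, batch_size=2)
```

With five examples and a batch size of 2, each iteration produces three batches (the last one holds a single example), so three iterations call `update_fn` nine times.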

Recap: How training works

Diagram of the training process

  • Training data: Examples and their annotations.
  • Text: The input text the model should predict a label for.
  • Label: The label the model should predict.
  • Gradient: How to change the weights.
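
To make "how to change the weights" concrete, here is a toy sketch with a single weight and a squared-error loss. It is an assumption chosen for illustration only: spaCy's models are neural networks with many weights, and the names `gradient_step` and `learning_rate` are hypothetical.

```python
def gradient_step(weight, x, label, learning_rate=0.1):
    """One update of a single-weight model that predicts weight * x."""
    prediction = weight * x
    error = prediction - label
    gradient = 2 * error * x  # derivative of error**2 with respect to weight
    return weight - learning_rate * gradient  # move against the gradient


# Repeated updates on one example pull the prediction toward the label.
weight = 0.0
for _ in range(20):
    weight = gradient_step(weight, x=1.0, label=1.0)
```

Each step compares the prediction with the label and nudges the weight in the direction that reduces the error, which is the role the gradient plays in the diagram.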

Example loop

import random
import spacy

TRAINING_DATA = [
    ("How to preorder the iPhone X", {'entities': [(20, 28, 'GADGET')]}),
    # And many more examples...
]

# Loop for 10 iterations
for i in range(10):
    # Shuffle the training data
    random.shuffle(TRAINING_DATA)
    # Create batches and iterate over them
    for batch in spacy.util.minibatch(TRAINING_DATA):
        # Split the batch into texts and annotations
        texts = [text for text, annotation in batch]
        annotations = [annotation for text, annotation in batch]
        # Update the model
        nlp.update(texts, annotations)

# Save the model
nlp.to_disk(path_to_model)
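
The entity annotations in the example are character offsets into the text, so a quick sanity check before training is to slice the text with each offset pair. The helper `entity_spans` below is a hypothetical name, not part of spaCy; it is a small sketch of that check.

```python
TRAINING_DATA = [
    ("How to preorder the iPhone X", {"entities": [(20, 28, "GADGET")]}),
]


def entity_spans(text, annotation):
    """Return the substring each (start, end, label) offset triple selects."""
    return [(text[start:end], label) for start, end, label in annotation["entities"]]


for text, annotation in TRAINING_DATA:
    print(entity_spans(text, annotation))  # prints [('iPhone X', 'GADGET')]
```

If a slice comes back truncated or with stray whitespace, the offsets do not line up with the intended entity and the example should be fixed before it is used.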

Updating an existing model

  • Improve the predictions on new data
  • Especially useful to improve existing categories, like PERSON
  • Also possible to add new categories
  • Be careful and make sure the model doesn't "forget" the old ones
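
One common way to reduce this kind of forgetting is to mix examples of the existing categories into the new training data. The slides do not prescribe a method, so the sketch below is an assumption: `mix_examples` and `old_per_new` are hypothetical names, and the example texts are made up for illustration.

```python
import random


def mix_examples(new_examples, old_examples, old_per_new=1, seed=0):
    """Combine new-label examples with a sample of examples for existing labels."""
    rng = random.Random(seed)
    n_old = min(len(old_examples), old_per_new * len(new_examples))
    mixed = list(new_examples) + rng.sample(old_examples, n_old)
    rng.shuffle(mixed)
    return mixed


# Made-up data: one example for the new label, two for an existing one.
new_examples = [
    ("How to preorder the iPhone X", {"entities": [(20, 28, "GADGET")]}),
]
old_examples = [
    ("Ines Montani develops spaCy", {"entities": [(0, 12, "PERSON")]}),
    ("I met Ines yesterday", {"entities": [(6, 10, "PERSON")]}),
]
mixed = mix_examples(new_examples, old_examples)
labels = {label for _, ann in mixed for _, _, label in ann["entities"]}
```

Training on `mixed` rather than on `new_examples` alone keeps the old categories present in every pass over the data.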

Setting up a new pipeline from scratch

# Start with blank English model
nlp = spacy.blank('en')

# Create blank entity recognizer and add it to the pipeline
ner = nlp.create_pipe('ner')
nlp.add_pipe(ner)
# Add a new label
ner.add_label('GADGET')
# Start the training
nlp.begin_training()
# Train for 10 iterations
for itn in range(10):
    random.shuffle(examples)
    # Divide examples into batches
    for batch in spacy.util.minibatch(examples, size=2):
        texts = [text for text, annotation in batch]
        annotations = [annotation for text, annotation in batch]
        # Update the model
        nlp.update(texts, annotations)

Let's practice!
