Custom pipeline components

Advanced NLP with spaCy

Ines Montani

spaCy core developer

Why custom components?

Illustration of the spaCy pipeline

  • Make a function execute automatically when you call nlp
  • Add your own metadata to documents and tokens
  • Update built-in attributes like doc.ents

Anatomy of a component (1)

  • Function that takes a doc, modifies it and returns it
  • Can be added using the nlp.add_pipe method
def custom_component(doc):
    # Do something to the doc here
    return doc

nlp.add_pipe(custom_component)
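
Conceptually, calling nlp on a text tokenizes it and then passes the resulting doc through each pipeline component in order. The loop below is a minimal pure-Python sketch of that calling convention. ToyDoc, run_pipeline and both components are hypothetical names invented for illustration; none of them are part of spaCy.

```python
# Toy stand-ins, not spaCy classes: they only mimic the calling convention.
class ToyDoc:
    def __init__(self, text):
        self.words = text.split()  # naive whitespace "tokenizer" (assumption)
        self.notes = []            # scratch space for components to write to


def length_component(doc):
    # A component receives the doc, modifies it and returns it
    doc.notes.append(f"length={len(doc.words)}")
    return doc


def shout_component(doc):
    # Second component: it sees whatever the first one left on the doc
    ends_loud = bool(doc.words) and doc.words[-1].endswith("!")
    doc.notes.append("shout" if ends_loud else "calm")
    return doc


def run_pipeline(text, components):
    doc = ToyDoc(text)
    for component in components:
        # Each component's return value is handed to the next one
        doc = component(doc)
    return doc


doc = run_pipeline("Hello world!", [length_component, shout_component])
print(doc.notes)  # ['length=2', 'shout']
```

Because the loop reassigns doc on every step, a component that forgets its return statement would hand None to the next component in this sketch, which is why every component must return the doc.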

Anatomy of a component (2)

def custom_component(doc):
    # Do something to the doc here
    return doc

nlp.add_pipe(custom_component)
Argument   Description            Example
last       If True, add last      nlp.add_pipe(component, last=True)
first      If True, add first     nlp.add_pipe(component, first=True)
before     Add before component   nlp.add_pipe(component, before='ner')
after      Add after component    nlp.add_pipe(component, after='tagger')
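
To make the effect of the placement arguments concrete, here is a toy model of an ordered list of component names. It is a sketch only: ToyPipeline is a hypothetical class, not spaCy's implementation, and the rules that the default is "last" and that only one placement argument may be given are assumptions of this sketch.

```python
class ToyPipeline:
    """Toy model of an ordered component list; not spaCy's implementation."""

    def __init__(self, names):
        self.names = list(names)

    def add_pipe(self, name, first=False, last=False, before=None, after=None):
        # Assumption of this sketch: at most one placement argument is allowed
        given = sum([bool(first), bool(last), before is not None, after is not None])
        if given > 1:
            raise ValueError("Use only one of first, last, before, after")
        if first:
            index = 0
        elif before is not None:
            index = self.names.index(before)
        elif after is not None:
            index = self.names.index(after) + 1
        else:
            index = len(self.names)  # last=True or no argument: append at the end
        self.names.insert(index, name)


pipeline = ToyPipeline(["tagger", "parser", "ner"])
pipeline.add_pipe("a", first=True)
pipeline.add_pipe("b", before="ner")
pipeline.add_pipe("c", after="tagger")
pipeline.add_pipe("d")
print(pipeline.names)  # ['a', 'tagger', 'c', 'parser', 'b', 'ner', 'd']
```

Position matters because a component only sees what earlier components have already set on the doc. For example, a component that reads entities would need to come after 'ner'.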

Example: a simple component (1)

import spacy

# Create the nlp object
nlp = spacy.load('en_core_web_sm')

# Define a custom component
def custom_component(doc):
    # Print the doc's length
    print('Doc length:', len(doc))
    # Return the doc object
    return doc

# Add the component first in the pipeline
nlp.add_pipe(custom_component, first=True)

# Print the pipeline component names
print('Pipeline:', nlp.pipe_names)
Pipeline: ['custom_component', 'tagger', 'parser', 'ner']

Example: a simple component (2)

import spacy

# Create the nlp object
nlp = spacy.load('en_core_web_sm')

# Define a custom component
def custom_component(doc):

    # Print the doc's length
    print('Doc length:', len(doc))

    # Return the doc object
    return doc

# Add the component first in the pipeline
nlp.add_pipe(custom_component, first=True)

# Process a text
doc = nlp("Hello world!")
Doc length: 3

Let's practice!
