Introduction to spaCy

Advanced NLP with spaCy

Ines Montani

spaCy core developer

The nlp object

# Import the English language class
from spacy.lang.en import English

# Create the nlp object
nlp = English()
  • contains the processing pipeline
  • includes language-specific rules for tokenization etc.
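
The two points above can be sketched as follows. This is a minimal illustration that assumes spaCy is installed; the sample sentence is made up for the example and is not part of the slides. A blank English object has no pipeline components by default, but its tokenizer already applies English-specific rules.

```python
from spacy.lang.en import English

nlp = English()

# A blank language class starts with an empty processing pipeline
print(nlp.pipe_names)

# English-specific tokenizer rules split the contraction "don't"
doc = nlp("I don't know.")
tokens = [token.text for token in doc]
print(tokens)
```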

The Doc object

# Created by processing a string of text with the nlp object
doc = nlp("Hello world!")

# Iterate over tokens in a Doc
for token in doc:
    print(token.text)
Hello
world
!
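
Beyond iteration, a Doc behaves like a sequence and keeps the original string. A short sketch under the same setup as the slide (the variable names `n_tokens` and `rebuilt` are illustrative only):

```python
from spacy.lang.en import English

nlp = English()
doc = nlp("Hello world!")

# A Doc has a length: the number of tokens it contains
n_tokens = len(doc)
print(n_tokens)

# The original text is preserved and can be rebuilt from the tokens,
# because each token remembers its trailing whitespace
rebuilt = "".join(token.text_with_ws for token in doc)
print(rebuilt == doc.text)
```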

The Token object

Illustration of a Doc object containing four tokens

doc = nlp("Hello world!")

# Index into the Doc to get a single Token
token = doc[1]

# Get the token text via the .text attribute
print(token.text)
world
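
As a small sketch of how indexing relates to token positions (the variable `last` is just an illustrative name), a Doc also accepts negative indices, and each Token records its own position in the `.i` attribute:

```python
from spacy.lang.en import English

nlp = English()
doc = nlp("Hello world!")

# Negative indices count from the end of the Doc
last = doc[-1]
print(last.text)

# Every token knows its index within the parent Doc
print(last.i)
```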

The Span object

Illustration of a Doc object containing four tokens and three of them wrapped in a Span

doc = nlp("Hello world!")

# A slice from the Doc is a Span object
# "Hello world!" has three tokens, so the slice ends at index 3 (exclusive)
span = doc[1:3]

# Get the span text via the .text attribute
print(span.text)
world!
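
A Span is a view of the Doc rather than a copy. The sketch below, using the same sentence, shows the start and end boundaries and that tokens inside the span keep their Doc-level indices:

```python
from spacy.lang.en import English

nlp = English()
doc = nlp("Hello world!")

# Slice out the last two tokens; the end index is exclusive
span = doc[1:3]
print(span.start, span.end, len(span))

# Tokens in the span are the same tokens as in the Doc
span_tokens = [token.text for token in span]
print(span_tokens)

# span[0] is relative to the span, but .i is relative to the Doc
print(span[0].i)
```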

Lexical attributes

doc = nlp("It costs $5.")

print('Index:   ', [token.i for token in doc])
print('Text:    ', [token.text for token in doc])
print('is_alpha:', [token.is_alpha for token in doc])
print('is_punct:', [token.is_punct for token in doc])
print('like_num:', [token.like_num for token in doc])

Index:    [0, 1, 2, 3, 4]
Text:     ['It', 'costs', '$', '5', '.']
is_alpha: [True, True, False, False, False]
is_punct: [False, False, False, False, True]
like_num: [False, False, False, True, False]
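
Lexical attributes can be combined with token indices to find simple patterns without a trained model. The following is a hedged sketch, not part of the original slides: it collects number-like tokens that directly follow a dollar sign, and the list name `amounts` is hypothetical.

```python
from spacy.lang.en import English

nlp = English()
doc = nlp("It costs $5.")

amounts = []
for token in doc:
    # Keep number-like tokens whose preceding token is "$"
    if token.like_num and token.i > 0 and doc[token.i - 1].text == "$":
        amounts.append(token.text)

print(amounts)
```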

Let's practice!
