Introduction to spaCy

Introduction to Natural Language Processing in Python

Katharine Jarmul

Founder, kjamistan

What is spaCy?

  • NLP library similar to gensim, with different implementations
  • Focus on creating NLP pipelines to generate models and corpora
  • Open-source, with extra libraries and tools
    • displaCy

displaCy entity recognition visualizer

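import spacy

# Load the small English model (it must be installed separately)
nlp = spacy.load('en_core_web_sm')

# The entity recognizer component of the pipeline
# (attribute from older spaCy releases; spaCy v3 uses nlp.get_pipe('ner'))
nlp.entity
# Output: <spacy.pipeline.EntityRecognizer at 0x7f76b75e68b8>

doc = nlp("""Berlin is the capital of Germany;
                  and the residence of Chancellor Angela Merkel.""")

# Named entities found in the document
doc.ents
# Output: (Berlin, Germany, Angela Merkel)

# Each entity carries its text and a label
print(doc.ents[0], doc.ents[0].label_)
# Output: Berlin GPE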
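
The session above inspects one entity at a time. As a minimal sketch of looking at all of them together, the helper below groups entity texts by label. The names entities_by_label and main are hypothetical, introduced only for this illustration. The helper relies solely on the .ents, .text and .label_ attributes shown on the slide, and main() assumes spaCy and the en_core_web_sm model are installed.

```python
from collections import defaultdict


def entities_by_label(doc):
    """Group entity texts by label.

    `doc` can be any object exposing `.ents`, where each entity has
    `.text` and `.label_` attributes (as a spaCy Doc does).
    """
    grouped = defaultdict(list)
    for ent in doc.ents:
        grouped[ent.label_].append(ent.text)
    return dict(grouped)


def main():
    # Assumption: spaCy and the en_core_web_sm model are installed.
    import spacy

    nlp = spacy.load("en_core_web_sm")
    doc = nlp("Berlin is the capital of Germany and the residence of "
              "Chancellor Angela Merkel.")
    # Print each label followed by the entity texts found for it
    for label, texts in entities_by_label(doc).items():
        print(label, texts)
```

Calling main() prints one line per entity label. The exact labels depend on the model version, so no output is shown here.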

Why use spaCy for NER?

  • Easy pipeline creation
  • Different entity types compared to nltk
  • Informal language corpora
    • Easily find entities in Tweets and chat messages
  • Quickly growing!

Let's practice!
