Natural Language Processing (NLP) basics

Natural Language Processing with spaCy

Azadeh Mobasher

Principal Data Scientist

Natural Language Processing (NLP)

 

  • A subfield of Artificial Intelligence (AI)
  • Helps computers to understand human language
  • Helps extract insights from unstructured data
  • Incorporates statistics, machine learning models and deep learning models

 

NLP, a subfield of AI

NLP use cases

Sentiment analysis  

  • Use of computers to determine the underlying subjective tone of a piece of writing
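To make the idea concrete, here is a deliberately tiny sketch of sentiment scoring in plain Python. It is not a spaCy feature and not how production systems work: the word lists `POSITIVE` and `NEGATIVE` and the `sentiment` helper are hypothetical, chosen only to show what "determining the tone" of a text could mean at its simplest.

```python
# Toy lexicon-based sentiment scorer (illustrative only, not a spaCy feature).
# The word lists below are hypothetical and deliberately tiny.
POSITIVE = {"great", "love", "excellent", "good"}
NEGATIVE = {"bad", "terrible", "hate", "poor"}


def sentiment(text: str) -> str:
    """Label text by counting positive and negative words."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(sentiment("I love this phone, the camera is excellent!"))  # positive
print(sentiment("Terrible battery and poor support."))           # negative
```

Real sentiment models are usually trained on labeled examples, so they can handle negation and context that a word count like this misses.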

 

Sentiment Analysis Examples

NLP use cases

Named entity recognition (NER)  

  • Locating and classifying named entities mentioned in unstructured text into pre-defined categories
  • Named entities are real-world objects such as a person or location
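A minimal sketch of locating and labeling entities with spaCy follows. To keep it runnable without downloading a trained model, it assumes a blank English pipeline plus spaCy's rule-based `entity_ruler` component; the two patterns and the example sentence are hypothetical. A trained pipeline such as en_core_web_sm would instead predict entities statistically with its own `ner` component.

```python
import spacy

# Blank English pipeline: tokenizer only, no trained components.
nlp = spacy.blank("en")

# Rule-based entity matcher; the patterns are hypothetical examples.
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "PERSON", "pattern": "Albert Einstein"},
    {"label": "GPE", "pattern": "Germany"},
])

doc = nlp("Albert Einstein was born in Germany.")

# Each entity is a span of the Doc with a category label.
entities = [(ent.text, ent.label_) for ent in doc.ents]
print(entities)  # [('Albert Einstein', 'PERSON'), ('Germany', 'GPE')]
```

Either way, the result is read from `doc.ents`, where each entity exposes its text and its category via `ent.label_`.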

 

NER Examples

NLP use cases

 

  • Chatbots such as ChatGPT generate human-like responses to text input

 

Chatbots

Introduction to spaCy

spaCy is a free, open-source library for NLP in Python which:

  • Is designed to build systems for information extraction
  • Provides production-ready code for NLP use cases
  • Supports 64+ languages
  • Is robust and fast, and includes built-in visualization tools (displaCy)

 

spaCy and NLP

Install and import spaCy

 

  • First, install spaCy with pip, the Python package manager
  • Trained spaCy models are downloaded separately with the spacy download command
  • Multiple trained models for the English language are available at spacy.io

 

$ python3 -m pip install spacy
$ python3 -m spacy download en_core_web_sm
import spacy
nlp = spacy.load("en_core_web_sm")
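
If the model has not been downloaded yet, `spacy.load` raises an `OSError`. The sketch below shows one way to handle that; the `load_pipeline` helper is hypothetical, and falling back to a blank English pipeline (tokenizer only) is an assumption made here so the script still runs.

```python
import spacy


def load_pipeline(name: str = "en_core_web_sm"):
    """Load a trained pipeline, or fall back to a blank English one."""
    try:
        return spacy.load(name)
    except OSError:
        # spaCy raises OSError when the model package is not installed.
        print(f"Model {name!r} not found; run: python3 -m spacy download {name}")
        return spacy.blank("en")


nlp = load_pipeline()

# Installed spaCy version, pipeline language, and component names.
print(spacy.__version__, nlp.lang, nlp.pipe_names)
```

A blank pipeline has no trained components, so `nlp.pipe_names` is empty in the fallback case.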

Read and process text with spaCy

  • Loading the spaCy model en_core_web_sm returns a pipeline object, conventionally named nlp
  • Calling nlp on a text returns a Doc object, a container that stores the processed text
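
The container behavior described above can be sketched as follows. To avoid requiring a model download, this example assumes `spacy.blank("en")` as a tokenizer-only stand-in for en_core_web_sm; with the trained model, the same calls apply.

```python
import spacy
from spacy.tokens import Doc

# Assumption: tokenizer-only stand-in for the trained en_core_web_sm pipeline.
nlp = spacy.blank("en")

text = "A spaCy pipeline object is created."
doc = nlp(text)

print(isinstance(doc, Doc))  # True: calling nlp returns a Doc
print(len(doc))              # 7: a Doc's length is its number of tokens
print(doc.text == text)      # True: the original text is preserved
print(doc[1].text)           # spaCy: indexing returns a single Token
print(doc[2:4].text)         # pipeline object: slicing returns a Span
```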

Text processing with spaCy

spaCy in action

  • Processing a string using spaCy
import spacy
nlp = spacy.load("en_core_web_sm")
text = "A spaCy pipeline object is created."
doc = nlp(text)
  • Tokenization
    • A Token is defined as the smallest meaningful part of the text.
    • Tokenization: The process of dividing a text into a list of meaningful tokens
print([token.text for token in doc])
['A', 'spaCy', 'pipeline', 'object', 'is', 'created', '.']
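
Beyond its text, each token carries attributes. As a small sketch, the loop below prints each token's index and two lexical flags, then filters out punctuation. It assumes a blank English pipeline, which is enough here because these attributes come from the tokenizer and do not need a trained model.

```python
import spacy

# Assumption: tokenizer-only pipeline; no trained model is needed for these attributes.
nlp = spacy.blank("en")
doc = nlp("A spaCy pipeline object is created.")

# token.i is the token's position in the Doc; is_alpha and is_punct are boolean flags.
for token in doc:
    print(token.i, token.text, token.is_alpha, token.is_punct)

# Keep only the tokens that are not punctuation.
words = [token.text for token in doc if not token.is_punct]
print(words)  # ['A', 'spaCy', 'pipeline', 'object', 'is', 'created']
```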

Let's practice!
