Introduction to natural language processing

Natural Language Processing (NLP) in Python

Fouad Trad

Machine Learning Engineer

Meet the instructor...

 

Photo of the instructor.

 

Fouad Trad

  • Machine learning engineer
  • Research scientist
  • NLP in cybersecurity and healthcare

What is NLP?

 

 

  • Language is our primary way to communicate
  • Computers don't understand our language

Image showing where language is: books, websites, social media posts, and emails.

Enables computers to analyze human language

Image showing a person talking to a machine and natural language processing is translating what the person is saying so the machine understands.

NLP workflow

  • Raw text: anything from a tweet to a book paragraph
  • Preprocessing: cleaning the text and removing unnecessary elements
  • Feature extraction: converting the text into numbers
  • Modeling: analyzing, predicting, classifying, or generating new content
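
The four steps above can be sketched end to end in plain Python. This is only a toy illustration: the example message, the `SPAM_WORDS` set, the threshold of 3, and the helper names (`preprocess`, `extract_features`, `predict`) are all made up for this sketch, and the keyword rule merely stands in for a trained model.

```python
import string
from collections import Counter

# Step 1: raw text (made-up example message)
raw_text = "Claim your FREE prize now!!! Free entry, claim today."

# Step 2: preprocessing: lowercase the text and strip punctuation
def preprocess(text):
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return cleaned.split()

# Step 3: feature extraction: turn the tokens into word counts (numbers)
def extract_features(tokens):
    return Counter(tokens)

# Step 4: "model": a hypothetical keyword rule standing in for a trained model
SPAM_WORDS = {"free", "prize", "claim"}

def predict(features):
    score = sum(features[word] for word in SPAM_WORDS)
    return "spam" if score >= 3 else "not spam"

tokens = preprocess(raw_text)
features = extract_features(tokens)
label = predict(features)

print(tokens)
print(features["free"], label)  # 2 spam
```

In practice each step would likely use a dedicated library rather than hand-written helpers like these.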

Course plan

  • Chapter 1: preprocessing with NLTK
  • Chapter 2: feature extraction with scikit-learn and Gensim
  • Chapters 3 and 4: pipelines that hide all three steps (preprocessing, feature extraction, and modeling), using the transformers library

Tokenization

  • Breaks text into tokens (smaller, manageable pieces)

Image showing a person chopping carrots

Sentence tokenization

  • Text → sentences
  • Offers clearer insights than analyzing the text as a whole

import nltk

# Download the Punkt tokenizer data (only needed once)
nltk.download('punkt_tab')

text = "NLP is fun. Let's dive into it!"
# Split the text into a list of sentences
sentences = nltk.sent_tokenize(text)
print(sentences)
['NLP is fun.', "Let's dive into it!"]

Image showing an icon representing translation.
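
To see why a dedicated sentence tokenizer helps, here is a minimal sketch of a naive splitter built on a regular expression. `naive_sent_split` is a hypothetical helper written for this illustration, not part of NLTK and not how `nltk.sent_tokenize` works internally. It handles the example above but breaks on an abbreviation such as "Dr.".

```python
import re

def naive_sent_split(text):
    # Split wherever '.', '!' or '?' is followed by whitespace
    return re.split(r"(?<=[.!?])\s+", text.strip())

simple = naive_sent_split("NLP is fun. Let's dive into it!")
tricky = naive_sent_split("Dr. Smith loves NLP. It is fun!")

print(simple)  # ['NLP is fun.', "Let's dive into it!"]
print(tricky)  # ['Dr.', 'Smith loves NLP.', 'It is fun!']
```

The Punkt model behind `nltk.sent_tokenize` is designed to cope with cases like abbreviations, which is one reason to prefer it over a hand-written rule.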

Word tokenization

  • Text → words and punctuation
  • Useful for tasks that require:
    • Identifying key terms
    • Counting word frequency
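import nltk

# Uses the 'punkt_tab' data downloaded for sentence tokenization
text = "Claim your free prize now!"

# Split the text into words and punctuation marks
words = nltk.word_tokenize(text)
print(words)
['Claim', 'your', 'free', 'prize', 'now', '!']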

Image showing an icon for a spam email.
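
Counting word frequency, mentioned above, can be sketched with the standard library's `collections.Counter` once the tokens are available. The token list here is a made-up example written out by hand so the snippet runs without NLTK. Lowercasing and dropping non-alphabetic tokens are assumptions of this sketch, not steps the slides prescribe.

```python
from collections import Counter

# Tokens as they might come out of word tokenization (made-up example)
tokens = ['Claim', 'your', 'free', 'prize', 'now', '!',
          'Free', 'prize', 'inside', '!']

# Normalize case and keep only alphabetic tokens (drops '!')
words = [t.lower() for t in tokens if t.isalpha()]

# Count how often each word occurs
freq = Counter(words)
print(freq.most_common(2))  # [('free', 2), ('prize', 2)]
```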

Let's practice!
