Charting word length with nltk

Introduction to Natural Language Processing in Python

Katharine Jarmul

Founder, kjamistan

Getting started with matplotlib

  • Charting library used by many open source Python projects
  • Straightforward functionality with lots of options
    • Histograms
    • Bar charts
    • Line charts
    • Scatter plots
  • ... and also advanced functionality like 3D graphs and animations!
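
The four basic chart types listed above can be drawn side by side. This is a minimal sketch, not part of the original slides: the data values and the category labels "a", "b", "c" are made up for illustration.

```python
from matplotlib import pyplot as plt

# Illustrative values only (assumed for this sketch).
data = [1, 5, 5, 7, 7, 7, 9]
positions = list(range(len(data)))

# One figure with a 2x2 grid of axes, one chart type per cell.
fig, axes = plt.subplots(2, 2, figsize=(8, 6))

axes[0, 0].hist(data)                       # histogram of the values
axes[0, 0].set_title("Histogram")

axes[0, 1].bar(["a", "b", "c"], [3, 1, 2])  # bar chart with hypothetical categories
axes[0, 1].set_title("Bar chart")

axes[1, 0].plot(positions, data)            # line chart: value against position
axes[1, 0].set_title("Line chart")

axes[1, 1].scatter(positions, data)         # scatter plot of the same points
axes[1, 1].set_title("Scatter plot")

fig.tight_layout()
```

Calling plt.show() afterwards displays the figure.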

Plotting a histogram with matplotlib

from matplotlib import pyplot as plt

plt.hist([1, 5, 5, 7, 7, 7, 9])
(array([ 1., 0., 0., 0., 0., 2., 0., 3., 0., 1.]),
 array([ 1., 1.8, 2.6, 3.4, 4.2, 5., 5.8, 6.6, 7.4, 8.2, 9.]),
        <a list of 10 Patch objects>)
plt.show()
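
The tuple printed by plt.hist holds the bin counts, the bin edges, and the drawn patches. A minimal sketch of unpacking it follows; the helper name histogram_counts is hypothetical, and the default of 10 bins is assumed.

```python
from matplotlib import pyplot as plt


def histogram_counts(values, bins=10):
    """Draw a histogram and return its bin counts and bin edges as lists.

    Hypothetical helper for illustration; it only unpacks what plt.hist returns.
    """
    counts, edges, _patches = plt.hist(values, bins=bins)
    return counts.tolist(), edges.tolist()


counts, edges = histogram_counts([1, 5, 5, 7, 7, 7, 9])
print(counts)       # one count per bin
print(len(edges))   # always one more edge than there are bins
```

With 10 bins there are 11 edges, running from the smallest value (1) to the largest (9).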

Generated histogram

[Figure: histogram of the values 1, 5, 5, 7, 7, 7, 9]

Combining NLP data extraction with plotting

from matplotlib import pyplot as plt
from nltk.tokenize import word_tokenize

words = word_tokenize("This is a pretty cool tool!")
word_lengths = [len(w) for w in words]
plt.hist(word_lengths)
(array([ 2., 0., 1., 0., 0., 0., 3., 0., 0., 1.]),
 array([ 1., 1.5, 2., 2.5, 3., 3.5, 4., 4.5, 5., 5.5, 6.]),
        <a list of 10 Patch objects>)
plt.show()
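
In the output above, the two entries of length 1 are the word "a" and the punctuation token "!". The following sketch is one possible refinement, not part of the original slides: it drops non-alphabetic tokens and uses one bin per whole-number length. The token list is hard-coded to match what word_tokenize produces for the sentence above, so the sketch runs without downloading tokenizer data; the helper name word_length_counts is hypothetical.

```python
from matplotlib import pyplot as plt

# Tokens for "This is a pretty cool tool!", hard-coded for this sketch
# (assumed to match the word_tokenize output used above).
tokens = ["This", "is", "a", "pretty", "cool", "tool", "!"]


def word_length_counts(tokens):
    """Plot word lengths with one bin per integer length; return {length: count}."""
    # Keep alphabetic tokens only, so punctuation such as "!" is not counted.
    lengths = [len(t) for t in tokens if t.isalpha()]
    # Bin edges at 1, 2, ..., max + 1 give one bin for each integer length.
    edges = list(range(1, max(lengths) + 2))
    _fig, ax = plt.subplots()
    counts, _edges, _patches = ax.hist(lengths, bins=edges)
    ax.set_xlabel("Word length")
    ax.set_ylabel("Number of words")
    return {length: int(count) for length, count in zip(edges, counts)}


length_counts = word_length_counts(tokens)
print(length_counts)
```

Whether punctuation should count as a "word" depends on the analysis, so filtering with str.isalpha() is a choice made here rather than a rule.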

Word length histogram

[Figure: histogram of word lengths for "This is a pretty cool tool!"]

Let's practice!
