Charting word length with nltk

Introduction to Natural Language Processing in Python

Katharine Jarmul

Founder, kjamistan

Getting started with matplotlib

  • Charting library used by many open source Python projects
  • Straightforward functionality with lots of options
    • Histograms
    • Bar charts
    • Line charts
    • Scatter plots
  • ... and also advanced functionality like 3D graphs and animations!

Plotting a histogram with matplotlib

from matplotlib import pyplot as plt

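# plt.hist returns (bin counts, bin edges, patches); it uses 10 bins by default
plt.hist([1, 5, 5, 7, 7, 7, 9])
# (array([ 1., 0., 0., 0., 0., 2., 0., 3., 0., 1.]),
#  array([ 1., 1.8, 2.6, 3.4, 4.2, 5., 5.8, 6.6, 7.4, 8.2, 9.]),
#  <a list of 10 Patch objects>)
plt.show()  # display the chart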
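
To make the returned counts and edges less mysterious, here is a minimal pure-Python sketch of the binning step. It assumes the default behaviour of 10 equal-width bins spanning the minimum to the maximum value, with the last bin including its right edge. The helper name `histogram_counts` is hypothetical and is not part of matplotlib.

```python
def histogram_counts(values, bins=10):
    """Count values in equal-width bins (illustrative helper, not a matplotlib API)."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # This sketch does not handle the case where all values are identical.
        raise ValueError("values must not all be equal")
    # bins + 1 edges, evenly spaced from lo to hi
    edges = [lo + i * span / bins for i in range(bins + 1)]
    counts = [0] * bins
    for v in values:
        # Map the value to a bin index; the maximum falls into the last bin.
        idx = min(int((v - lo) * bins / span), bins - 1)
        counts[idx] += 1
    return counts, edges


counts, edges = histogram_counts([1, 5, 5, 7, 7, 7, 9])
print(counts)  # [1, 0, 0, 0, 0, 2, 0, 3, 0, 1]
```

The counts match the first array returned by plt.hist for the same data: one value in the first bin, two 5s in the sixth, three 7s in the eighth, and the 9 in the last.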

Generated histogram

[Image: histogram]

Combining NLP data extraction with plotting

from matplotlib import pyplot as plt
from nltk.tokenize import word_tokenize

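# Tokenize the sentence; punctuation such as "!" becomes its own token
words = word_tokenize("This is a pretty cool tool!")
# Length of each token, including the one-character "!"
word_lengths = [len(w) for w in words]
plt.hist(word_lengths)
# (array([ 2., 0., 1., 0., 0., 0., 3., 0., 0., 1.]),
#  array([ 1., 1.5, 2., 2.5, 3., 3.5, 4., 4.5, 5., 5.5, 6.]),
#  <a list of 10 Patch objects>)
plt.show()  # display the chart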
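
To check which lengths end up in the chart without opening a plot window, the same data can be tallied with the standard library. This is only a sketch: `simple_tokenize` is a hypothetical regex-based stand-in for `word_tokenize`, chosen so the example runs without nltk or its tokenizer data. It happens to split this sentence the same way, but it will differ from nltk on harder text such as contractions.

```python
import re
from collections import Counter


def simple_tokenize(text):
    """Split text into runs of word characters and single punctuation marks.

    Hypothetical stand-in for nltk's word_tokenize, for illustration only.
    """
    return re.findall(r"\w+|[^\w\s]", text)


tokens = simple_tokenize("This is a pretty cool tool!")
word_lengths = [len(t) for t in tokens]
length_counts = Counter(word_lengths)

# Print a small text histogram: one '#' per token of that length.
for length in sorted(length_counts):
    print(length, "#" * length_counts[length])
```

The tally shows three tokens of length 4, two of length 1 (the word "a" and the "!"), and one each of lengths 2 and 6, which is what the histogram bars represent.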

Word length histogram

[Image: word length histogram]

Let's practice!
