Charting word length with nltk

Introduction to Natural Language Processing in Python

Katharine Jarmul

Founder, kjamistan

Getting started with matplotlib

  • Charting library used by many open source Python projects
  • Straightforward functionality with lots of options
    • Histograms
    • Bar charts
    • Line charts
    • Scatter plots
  • ... and also advanced functionality like 3D graphs and animations!

Plotting a histogram with matplotlib

from matplotlib import pyplot as plt

plt.hist([1, 5, 5, 7, 7, 7, 9])
(array([ 1., 0., 0., 0., 0., 2., 0., 3., 0., 1.]),
 array([ 1., 1.8, 2.6, 3.4, 4.2, 5., 5.8, 6.6, 7.4, 8.2, 9.]),
        <a list of 10 Patch objects>)
plt.show()
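
The tuple printed after plt.hist holds the count in each bin, then the 11 edges of the 10 default bins, then the drawn bars. As a rough illustration of where those numbers come from, here is a pure-Python sketch of equal-width binning. The helper name equal_width_histogram is hypothetical; matplotlib itself relies on NumPy for this step rather than on code like this.

```python
from bisect import bisect_right


def equal_width_histogram(values, bins=10):
    """Count values into `bins` equal-width bins between min and max.

    Hypothetical helper for illustration only; assumes max(values) > min(values).
    """
    lo, hi = min(values), max(values)
    # bins + 1 edges, evenly spaced from the smallest to the largest value
    edges = [lo + i * (hi - lo) / bins for i in range(bins + 1)]
    counts = [0] * bins
    for v in values:
        # Each bin includes its left edge; the last bin also includes its right edge
        idx = min(bisect_right(edges, v) - 1, bins - 1)
        counts[idx] += 1
    return counts, edges


counts, edges = equal_width_histogram([1, 5, 5, 7, 7, 7, 9])
print(counts)  # [1, 0, 0, 0, 0, 2, 0, 3, 0, 1]
```

The counts match the first array in the output above: one value in the first bin, two 5s, three 7s, and the single 9 in the last bin.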

Generated histogram

[Figure: the generated histogram]

Combining NLP data extraction with plotting

from matplotlib import pyplot as plt
from nltk.tokenize import word_tokenize

words = word_tokenize("This is a pretty cool tool!")
word_lengths = [len(w) for w in words]
plt.hist(word_lengths)
(array([ 2., 0., 1., 0., 0., 0., 3., 0., 0., 1.]),
 array([ 1., 1.5, 2., 2.5, 3., 3.5, 4., 4.5, 5., 5.5, 6.]),
        <a list of 10 Patch objects>)
plt.show()
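
Note that the first bin holds two tokens of length 1: "a" and the "!" that word_tokenize emits as its own token. If punctuation should not count as a word, one option is to filter tokens before measuring them. The sketch below is only one way to do that; the function names are hypothetical, the token list is written out by hand to match what word_tokenize returns for the sentence above, and str.isalpha() will also drop tokens that contain digits or apostrophes.

```python
def alphabetic_word_lengths(tokens):
    """Return the length of every purely alphabetic token (hypothetical helper)."""
    return [len(t) for t in tokens if t.isalpha()]


def plot_word_lengths(lengths):
    """Draw a histogram with one bar per integer word length (hypothetical helper)."""
    # Imported here so the counting code above works without matplotlib installed
    from matplotlib import pyplot as plt

    # Integer bin edges 1, 2, ..., max + 1 give one bin per word length
    bins = list(range(1, max(lengths) + 2))
    plt.hist(lengths, bins=bins, align="left")
    plt.xlabel("Word length")
    plt.ylabel("Count")
    plt.show()


# Tokens as produced by word_tokenize("This is a pretty cool tool!")
tokens = ["This", "is", "a", "pretty", "cool", "tool", "!"]
lengths = alphabetic_word_lengths(tokens)
print(lengths)  # [4, 2, 1, 6, 4, 4]
```

Calling plot_word_lengths(lengths) then draws the chart with bars centered on whole-number lengths instead of the default ten fractional bins.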

Word length histogram

[Figure: histogram of word lengths]

Let's practice!
