Introduction to Natural Language Processing in Python
Katharine Jarmul
Founder, kjamistan
Text: "The cat is in the box. The cat likes the box. The box is over the cat."
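Bag of words (punctuation stripped): 'The': 3, 'box': 3, 'cat': 3, 'the': 3, 'is': 2, 'in': 1, 'likes': 1, 'over': 1
Bag of words in Python (word_tokenize keeps each period as a '.' token):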
from nltk.tokenize import word_tokenize
from collections import Counter
counter = Counter(word_tokenize("""The cat is in the box. The cat likes the box. The box is over the cat."""))
counter
Counter({'.': 3,
'The': 3,
'box': 3,
'cat': 3,
'in': 1,
...
'the': 3})
counter.most_common(2)
[('The', 3), ('box', 3)]
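The word_tokenize version keeps '.' as a token. The sketch below builds the punctuation-stripped bag of words without NLTK. It assumes a simple regular expression (`\w+`) is an acceptable stand-in for a real tokenizer. The name `bow` and the regex approach are illustrative choices, not the course's own code.

```python
import re
from collections import Counter

text = "The cat is in the box. The cat likes the box. The box is over the cat."

# Keep only runs of word characters; this drops the periods entirely.
tokens = re.findall(r"\w+", text)

# Count each token. Case is preserved, so 'The' and 'the' stay separate entries.
bow = Counter(tokens)

# most_common orders tokens with equal counts by first occurrence.
print(bow.most_common(2))
```

Because case is not normalized, 'The' and 'the' are still counted separately. Lowercasing the text before tokenizing would merge them.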