Introduction to Natural Language Processing in Python
Katharine Jarmul
Founder, kjamistan
Input text: Cats, dogs and birds are common pets. So are fish.
Output tokens: cat, dog, bird, common, pet, fish
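The input/output pair above can be reproduced with a minimal, standard-library-only sketch. The names `TOY_STOPWORDS`, `naive_singular`, and `preprocess` are hypothetical and not from the slides. The three-word stop word set and the "strip a trailing s" rule are toy stand-ins for a real stop word list and a real lemmatizer, chosen only so that this one sentence works.

```python
import re

# Hypothetical, deliberately tiny stop word set (assumption for illustration;
# a real list such as NLTK's English stop words is much longer).
TOY_STOPWORDS = {"and", "are", "so"}


def naive_singular(word):
    """Crude stand-in for lemmatization: drop a trailing 's' from longer words."""
    if len(word) > 3 and word.endswith("s"):
        return word[:-1]
    return word


def preprocess(text):
    """Lowercase, keep alphabetic tokens, drop stop words, singularize."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [naive_singular(t) for t in tokens if t not in TOY_STOPWORDS]


print(preprocess("Cats, dogs and birds are common pets. So are fish."))
```

The plural rule would mangle words such as "glass" or "class", so a real pipeline would likely swap in a proper lemmatizer.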
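from collections import Counter
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

text = """The cat is in the box. The cat likes the box. The box is over the cat."""
tokens = [w for w in word_tokenize(text.lower()) if w.isalpha()]  # lowercase, tokenize, keep alphabetic tokens only
no_stops = [t for t in tokens if t not in stopwords.words('english')]  # drop English stop words
Counter(no_stops).most_common(2)  # two most frequent remaining tokens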
[('cat', 3), ('box', 3)]