Natural Language Processing (NLP) in Python
Fouad Trad
Machine Learning Engineer

Understanding the topic of a text

Tasks requiring every word in the text

NLTK provides a list of stop words for several languages
import nltk
from nltk.corpus import stopwords

# Download the stop-word corpus (only needed once per environment)
nltk.download('stopwords')
stop_words = stopwords.words('english')
print(stop_words[:10])
['a', 'about', 'above', 'after', 'again', 'against', 'ain', 'all', 'am', 'an']
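from nltk.tokenize import word_tokenize

# word_tokenize needs the Punkt tokenizer data, e.g. nltk.download('punkt')
# (newer NLTK releases use 'punkt_tab' instead)
text = "This is an example to demonstrate removing stop words."
tokens = word_tokenize(text)
# Lowercase each token so capitalized words such as "This" match the lowercase stop-word list
filtered_tokens = [word for word in tokens if word.lower() not in stop_words]
print(filtered_tokens)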
['example', 'demonstrate', 'removing', 'stop', 'words', '.']
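The same filtering step also shows why tasks that need every word should keep stop words: NLTK's English list includes "not", so removing stop words can flip the apparent meaning of a sentence. The sketch below is a minimal illustration only. It uses a small hand-picked set in place of stopwords.words('english') and a pre-tokenized sentence, so it runs without any NLTK downloads; the helper name remove_stop_words and the sample sentence are assumptions made for this example.

```python
# Hypothetical, hand-picked subset standing in for stopwords.words('english').
# A set is used so that each membership check is a fast hash lookup.
stop_words = {"this", "is", "not", "a", "an", "the", "to"}


def remove_stop_words(tokens, stop_words):
    """Return the tokens whose lowercase form is not a stop word."""
    return [token for token in tokens if token.lower() not in stop_words]


# Pre-tokenized sample sentence (assumed for illustration)
tokens = ["This", "movie", "is", "not", "good"]
filtered = remove_stop_words(tokens, stop_words)
print(filtered)  # ['movie', 'good'] -- the negation has been dropped
```

After filtering, "not good" reads as "good", which is why tasks such as sentiment analysis often keep stop words or use a customized list.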

Tasks that require finding common or important words in documents

Tasks that require maintaining sentence structure for clarity

import string
print(string.punctuation)
!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
text = "This is an example to demonstrate removing stop words."
tokens = word_tokenize(text)
# Remove stop words first, then drop tokens that are punctuation characters
filtered_tokens = [word for word in tokens if word.lower() not in stop_words]
clean_tokens = [word for word in filtered_tokens if word not in string.punctuation]
print(clean_tokens)
['example', 'demonstrate', 'removing', 'stop', 'words']
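The two filters above can also be combined into a single pass over the tokens. The following is a rough sketch under stated assumptions, not the course's own code: the helper name clean_text_tokens is hypothetical, the token list is written out by hand so the example runs without NLTK data, and the stop-word list is a short stand-in for stopwords.words('english').

```python
import string


def clean_text_tokens(tokens, stop_words):
    """Drop stop words and single-character punctuation tokens in one pass."""
    stop_set = {word.lower() for word in stop_words}  # set for fast lookups
    punctuation = set(string.punctuation)  # exact single-character matches only
    return [
        token
        for token in tokens
        if token.lower() not in stop_set and token not in punctuation
    ]


# Hand-written tokens for the sample sentence (assumed, no tokenizer needed)
tokens = ["This", "is", "an", "example", "to", "demonstrate",
          "removing", "stop", "words", "."]
# Short stand-in for stopwords.words('english')
stop_words = ["this", "is", "an", "to"]

result = clean_text_tokens(tokens, stop_words)
print(result)  # ['example', 'demonstrate', 'removing', 'stop', 'words']
```

Converting string.punctuation to a set makes the check an exact match on single characters, whereas `word not in string.punctuation` is a substring test on the whole punctuation string.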