Natural Language Processing (NLP) in Python
Fouad Trad
Machine Learning Engineer
Understanding the topic of a text
Tasks requiring every word in the text
NLTK provides a list of stop words for several languages
import nltk
from nltk.corpus import stopwords
nltk.download('stopwords')
stop_words = stopwords.words('english')
print(stop_words[:10])
['a', 'about', 'above', 'after', 'again', 'against', 'ain', 'all', 'am', 'an']
from nltk.tokenize import word_tokenize
text = "This is an example to demonstrate removing stop words."
tokens = word_tokenize(text)
# The .lower() method helps with case sensitivity
filtered_tokens = [word for word in tokens if word.lower() not in stop_words]
print(filtered_tokens)
['example', 'demonstrate', 'removing', 'stop', 'words', '.']
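The filtering step above can be wrapped in a small helper. This is a minimal sketch, not part of NLTK: the name remove_stop_words and the tiny SAMPLE_STOP_WORDS set are hypothetical stand-ins, and the tokens are written out by hand so the snippet runs without any downloads. Converting the stop word list to a set is an optional choice that makes each membership check faster on long texts.

```python
# Illustrative subset only (an assumption); in practice this would be
# stopwords.words('english') from NLTK.
SAMPLE_STOP_WORDS = {"this", "is", "an", "to", "a", "the"}


def remove_stop_words(tokens, stop_words):
    """Return the tokens whose lowercase form is not a stop word."""
    stop_set = set(stop_words)  # set lookup avoids scanning a list per token
    return [tok for tok in tokens if tok.lower() not in stop_set]


# Hand-written tokens standing in for the word_tokenize(text) result
tokens = ["This", "is", "an", "example", "to", "demonstrate",
          "removing", "stop", "words", "."]
filtered_tokens = remove_stop_words(tokens, SAMPLE_STOP_WORDS)
print(filtered_tokens)
```

With the full NLTK list, the call would likely look like remove_stop_words(word_tokenize(text), stopwords.words('english')).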
Tasks that require finding common or important words in documents
Tasks that require maintaining sentence structure for clarity
import string
print(string.punctuation)
!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
text = "This is an example to demonstrate removing stop words."
tokens = word_tokenize(text)
filtered_tokens = [word for word in tokens if word.lower() not in stop_words]
clean_tokens = [word for word in filtered_tokens if word not in string.punctuation]
print(clean_tokens)
['example', 'demonstrate', 'removing', 'stop', 'words']
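One detail worth knowing: string.punctuation is a single string, so word not in string.punctuation is a substring test. It removes single marks such as '.', but multi-character tokens such as '...' or '--' (which a tokenizer may emit) pass through. The sketch below shows one possible stricter check; is_punctuation is a hypothetical helper, and the token list is made up for illustration.

```python
import string

PUNCT_CHARS = set(string.punctuation)


def is_punctuation(token):
    """True if the token is non-empty and every character is punctuation."""
    return bool(token) and all(ch in PUNCT_CHARS for ch in token)


# Made-up tokens for illustration, including multi-character punctuation
sample_tokens = ["example", "...", "demonstrate", "--", "words", "."]

# Substring test, as in the slide code: only the single '.' is removed
substring_check = [t for t in sample_tokens if t not in string.punctuation]

# Character-by-character test: '...' and '--' are removed as well
char_check = [t for t in sample_tokens if not is_punctuation(t)]

print(substring_check)
print(char_check)
```

For the example sentence in these slides both checks give the same result, since its only punctuation token is a single period.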