Stop words

Sentiment Analysis in Python

Violeta Misheva

Data Scientist

What are stop words and how to find them?

Stop words: words that occur so frequently that they are not considered informative

  • Lists of stop words exist for most languages

      {'the', 'a', 'an', 'and', 'but', 'for', 'on', 'in', 'at' ...}
    
  • Context matters: words that are very common in a particular domain can act as stop words too

      {'movie', 'movies', 'film', 'films', 'cinema'}
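
As a rough illustration of the two kinds of lists above, the sketch below filters a single review first with a general list and then with domain-specific words added. The tiny `general_stop_words` set, the `remove_stop_words` helper, and the sample review are all hypothetical and exist only for this example. Real stop word lists are much longer, and real tokenization is more careful than splitting on whitespace.

```python
# Hypothetical, deliberately tiny stop word lists (taken from the examples above)
general_stop_words = {"the", "a", "an", "and", "but", "for", "on", "in", "at"}
domain_stop_words = {"movie", "movies", "film", "films", "cinema"}


def remove_stop_words(text, stop_words):
    """Lowercase the text, split on whitespace, and drop any stop words."""
    tokens = text.lower().split()
    return [token for token in tokens if token not in stop_words]


# Made-up review text, used only for illustration
review = "The film was great but the cinema was cold"

# Filtering with the general list only keeps the domain words
general_only = remove_stop_words(review, general_stop_words)

# Adding the domain-specific words removes 'film' and 'cinema' as well
with_domain = remove_stop_words(review, general_stop_words | domain_stop_words)

print(general_only)
print(with_domain)
```

In a collection of movie reviews, words such as "film" appear in almost every document, so they say little about whether a review is positive or negative.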
    
Stop words with word clouds

  • Word cloud without removing stop words
  • Word cloud with stop words removed

Remove stop words from word clouds

# Import libraries
from wordcloud import WordCloud, STOPWORDS
import matplotlib.pyplot as plt
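# Define the stop words list: the built-in set plus domain-specific words
my_stopwords = set(STOPWORDS)
my_stopwords.update(["movie", "movies", "film", "films", "watch", "br"])
# Generate and show the word cloud
# (name_string is assumed to be a single string containing all the review text)
my_cloud = WordCloud(background_color='white', stopwords=my_stopwords).generate(name_string)
plt.imshow(my_cloud, interpolation='bilinear')
plt.axis('off')
plt.show()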

Stop words with BOW

from sklearn.feature_extraction.text import CountVectorizer, ENGLISH_STOP_WORDS
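# Define the stop words: the built-in English set plus domain-specific words
# (converted to a list, which recent scikit-learn versions expect for stop_words)
my_stop_words = list(ENGLISH_STOP_WORDS.union(['film', 'movie', 'cinema', 'theatre']))
vect = CountVectorizer(stop_words=my_stop_words)
# movies is assumed to be a DataFrame with a text column named review
vect.fit(movies.review)
X = vect.transform(movies.review)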
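
To see what the stop word list changes in practice, here is a minimal, self-contained sketch that compares the vocabulary learned with and without it. The two toy reviews are made up and stand in for `movies.review`. The names `vect_all`, `vect_filtered`, `vocab_all`, and `vocab_filtered` are illustrative only.

```python
from sklearn.feature_extraction.text import CountVectorizer, ENGLISH_STOP_WORDS

# Hypothetical toy reviews standing in for movies.review
reviews = [
    "The movie was a wonderful film",
    "A boring film with terrible acting",
]

# Built-in English stop words plus domain-specific ones, passed as a list
my_stop_words = list(ENGLISH_STOP_WORDS.union(["film", "movie", "cinema", "theatre"]))

# One vectorizer keeps every token, the other drops the stop words
vect_all = CountVectorizer()
vect_filtered = CountVectorizer(stop_words=my_stop_words)

# vocabulary_ maps each learned token to its column index
vocab_all = sorted(vect_all.fit(reviews).vocabulary_)
vocab_filtered = sorted(vect_filtered.fit(reviews).vocabulary_)

# Bag-of-words matrix built from the filtered vocabulary
X = vect_filtered.transform(reviews)

print(vocab_all)
print(vocab_filtered)
print(X.shape)
```

The filtered vocabulary is smaller, so the resulting matrix has fewer columns, and the columns that remain are more likely to carry sentiment.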

Let's practice!
