Bringing it all together

Sentiment Analysis in Python

Violeta Misheva

Data Scientist

The Sentiment Analysis problem

Sentiment analysis is the process of understanding the opinion of an author about a subject. Examples we worked with:

  • Movie reviews
  • Amazon product reviews
  • Twitter airline sentiment
  • Various emotionally charged literary examples

Exploration of the reviews

  • Basic information about the size of the reviews
  • Word clouds
  • Features for the length of reviews: number of words, number of sentences
  • Feature detecting the language of a review
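The length features above can be sketched with pandas. This is a minimal sketch, assuming a hypothetical DataFrame `reviews` with a `review` column. Words are counted with a plain whitespace split, and sentences with a rough regex split on `.`, `!` and `?`. Both are simplifications swapped in for a proper tokenizer such as NLTK's `word_tokenize` and `sent_tokenize`. The language feature is not shown because it needs a third-party detection package.

```python
import re

import pandas as pd

# Hypothetical toy reviews standing in for a real review dataset
reviews = pd.DataFrame({
    "review": [
        "Great movie. Loved the acting!",
        "Terrible plot. Boring characters. Would not watch again.",
        "It was fine",
    ]
})

# Basic size information: number of rows/columns and average length in characters
print(reviews.shape)
print(reviews["review"].str.len().mean())

# Number of words: simple whitespace split (a simplification of real tokenization)
reviews["n_words"] = reviews["review"].str.split().str.len()


def count_sentences(text: str) -> int:
    """Roughly count sentences by splitting on runs of '.', '!' or '?'."""
    parts = re.split(r"[.!?]+", text)
    return sum(1 for part in parts if part.strip())


# Number of sentences per review
reviews["n_sentences"] = reviews["review"].apply(count_sentences)
print(reviews[["n_words", "n_sentences"]])
```

Punctuation stays attached to words here ("movie." counts as one word). That does not change the counts, but it is one reason a real tokenizer is preferable for anything beyond length features.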

Numeric transformations of sentiment-carrying columns

  • Bag-of-words
  • TfIdf vectorization
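from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
# Vectorizer syntax: fit() learns the vocabulary, transform() builds the numeric matrix
# (data is assumed to be a pandas DataFrame with a text column named text_column)
vect = CountVectorizer().fit(data.text_column)
X = vect.transform(data.text_column)
# TfidfVectorizer follows exactly the same fit/transform syntax
X_tfidf = TfidfVectorizer().fit(data.text_column).transform(data.text_column)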

Arguments of the vectorizers

  • stop_words: remove non-informative, frequently occurring words
  • ngram_range: use phrases, not only single words
  • control the size of the vocabulary: max_features, max_df, min_df
  • token_pattern: capture a specific pattern of tokens, e.g. to remove digits or certain characters

Important but NOT arguments to the vectorizers

  • lemmas and stems

Supervised learning model

  • A logistic regression classifier to predict the sentiment
  • Evaluated with accuracy and a confusion matrix
  • Importance of the train/test split
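The whole supervised pipeline fits in a few lines of scikit-learn. This is a sketch on eight hypothetical labelled reviews (1 = positive, 0 = negative), so the accuracy it prints means nothing beyond showing the mechanics. One choice made here, which may differ from the course code: the vectorizer is fitted on the training text only and then reused on the test text, so nothing from the test set leaks into the vocabulary. `stratify=labels` keeps both classes in the tiny test set.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Hypothetical labelled reviews: 1 = positive, 0 = negative
texts = [
    "great movie, loved it", "wonderful acting and a great plot",
    "an enjoyable, fun film", "loved the soundtrack",
    "terrible movie, hated it", "boring plot and bad acting",
    "a dull, awful film", "hated the ending",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Hold out a test set so the model is scored on reviews it has never seen
X_train_text, X_test_text, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=42, stratify=labels
)

# Learn the vocabulary from the training text only, then reuse it on the test text
vect = TfidfVectorizer()
X_train = vect.fit_transform(X_train_text)
X_test = vect.transform(X_test_text)

model = LogisticRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

acc = accuracy_score(y_test, y_pred)
# Rows are true classes and columns are predicted classes, in the order [0, 1]
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
print("Accuracy:", acc)
print(cm)
```

On a real dataset the test set would be much larger, and the confusion matrix shows which kind of mistake (false positives or false negatives) drives the accuracy figure.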

Let's practice!
