Sentiment Analysis in Python
Violeta Misheva
Data Scientist
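I am happy, not sad.
I am sad, not happy.
Both sentences contain exactly the same words but have opposite meanings, so counting single words alone cannot tell them apart.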
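A minimal sketch with scikit-learn's CountVectorizer makes the context problem concrete. With unigrams the two example sentences get identical count vectors. Adding bigrams separates them, because "not sad" and "not happy" become distinct features. The variable names here are illustrative.

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = ["I am happy, not sad.", "I am sad, not happy."]

# Bag of words with unigrams only: word order is discarded
X_uni = CountVectorizer(ngram_range=(1, 1)).fit_transform(docs).toarray()
# Adding bigrams keeps some local word order ("not sad" vs "not happy")
X_bi = CountVectorizer(ngram_range=(1, 2)).fit_transform(docs).toarray()

print((X_uni[0] == X_uni[1]).all())  # True: identical unigram counts
print((X_bi[0] == X_bi[1]).all())    # False: the bigram counts differ
```

Note that the default tokenizer lowercases text and drops single-character tokens such as "I", so the unigram vocabulary here is just am, happy, not, sad.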
Unigrams: single tokens
Bigrams: pairs of consecutive tokens
Trigrams: triples of consecutive tokens
n-grams: sequences of n consecutive tokens
The weather today is wonderful.
Unigrams: {The, weather, today, is, wonderful}
Bigrams: {The weather, weather today, today is, is wonderful}
Trigrams: {The weather today, weather today is, today is wonderful}
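The n-grams of the example sentence can be generated with a short sliding window over the tokens. This is a minimal plain-Python sketch; the `ngrams` helper is hypothetical and splits on whitespace, which is simpler than CountVectorizer's own tokenization.

```python
def ngrams(tokens, n):
    """Return all contiguous sequences of n tokens, in order (hypothetical helper)."""
    # A sentence of k tokens has k - n + 1 n-grams (none if n > k)
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Drop the final period, then split on whitespace
tokens = "The weather today is wonderful.".rstrip(".").split()

unigrams = ngrams(tokens, 1)
bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)

print(bigrams)   # ['The weather', 'weather today', 'today is', 'is wonderful']
print(trigrams)  # ['The weather today', 'weather today is', 'today is wonderful']
```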
from sklearn.feature_extraction.text import CountVectorizer
# ngram_range=(min_n, max_n): extract all n-grams with min_n <= n <= max_n
# Only unigrams
vect = CountVectorizer(ngram_range=(1, 1))
# Uni- and bigrams
vect = CountVectorizer(ngram_range=(1, 2))
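# max_features: keep the N most frequent terms; max_df / min_df: ignore terms above / below a document frequency (float = proportion, int = count; example values)
vect = CountVectorizer(max_features=1000, max_df=0.9, min_df=2)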