Sentiment Analysis in Python
Violeta Misheva
Data Scientist
Goal: Enrich the existing dataset with features related to the text column (capturing the sentiment)
reviews.head()
from nltk import word_tokenize
anna_k = 'Happy families are all alike, every unhappy family is unhappy in its own way.'
word_tokenize(anna_k)
['Happy', 'families', 'are', 'all', 'alike', ',',
 'every', 'unhappy', 'family', 'is', 'unhappy', 'in',
 'its', 'own', 'way', '.']
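Note that word_tokenize relies on NLTK's Punkt tokenizer data, which has to be downloaded before the first call. To see what word-level tokenization does without that dependency, here is a rough sketch using only the standard library. The regular expression is an assumption chosen for illustration; it is not NLTK's algorithm and will differ from word_tokenize on contractions, abbreviations, and similar cases.

```python
import re

anna_k = 'Happy families are all alike, every unhappy family is unhappy in its own way.'

# Illustrative stand-in for a tokenizer (NOT NLTK's implementation):
# match either a run of word characters or a single punctuation mark.
simple_tokens = re.findall(r"\w+|[^\w\s]", anna_k)

print(simple_tokens)
print(len(simple_tokens))
```

On this sentence the sketch splits words and punctuation into separate tokens, the same behavior shown in the output above.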
# General form of list comprehension
[expression for item in iterable]
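A small sketch of the general form, using a made-up list of words (the names words, lengths_loop, and lengths_comp are illustrative only). It shows that a list comprehension builds the same list as an explicit loop with append.

```python
words = ["Happy", "families", "are", "all", "alike"]

# Explicit loop: start with an empty list and append one result per item
lengths_loop = []
for word in words:
    lengths_loop.append(len(word))

# Equivalent list comprehension: [expression for item in iterable]
lengths_comp = [len(word) for word in words]

print(lengths_loop == lengths_comp)
print(lengths_comp)
```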
word_tokens = [word_tokenize(review) for review in reviews.review]
type(word_tokens)
list
type(word_tokens[0])
list
len_tokens = []
# Iterate over the word_tokens list
for i in range(len(word_tokens)):
    len_tokens.append(len(word_tokens[i]))
# Create a new feature for the length of each review
reviews['n_tokens'] = len_tokens
reviews.head()
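The token counts can also be built with a list comprehension instead of an index-based loop. The sketch below assumes a small hand-written word_tokens list in place of the tokenized reviews, so the values are illustrative and not taken from the dataset.

```python
# Hypothetical pre-tokenized reviews: a list of lists of tokens
word_tokens = [
    ["Great", "movie", "!"],
    ["Not", "worth", "the", "time", "."],
]

# One count per review: the number of tokens in each inner list
len_tokens = [len(tokens) for tokens in word_tokens]

print(len_tokens)
```

The resulting list has one entry per review, so it can be assigned to a new column such as reviews['n_tokens'] in the same way as before.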