Sentiment Analysis in Python
Violeta Misheva
Data Scientist
Term frequency (TF): how often a given word appears within a document in the corpus
Inverse document frequency (IDF): the log of the ratio between the total number of documents and the number of documents that contain a specific word
TF-IDF = term frequency * inverse document frequency, so a word scores highly when it is frequent in one document but rare across the corpus
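To make the definitions concrete, here is a minimal pure-Python sketch. It assumes that TF is the raw count of a word in a document (some variants divide by document length), that the IDF log is the natural logarithm, and that tokens come from a naive lowercase-and-split tokenizer. The function name `tf_idf` and the toy corpus are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def tf_idf(corpus):
    """Return one {word: weight} dict per document, where
    weight = raw count of the word in the document
             * log(total documents / documents containing the word)."""
    # Naive tokenizer (an assumption for illustration): lowercase, split on whitespace
    docs = [doc.lower().split() for doc in corpus]
    n_docs = len(docs)
    # Document frequency: in how many documents each word appears at least once
    df = Counter(word for doc in docs for word in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts for this document
        weights.append(
            {word: count * math.log(n_docs / df[word]) for word, count in tf.items()}
        )
    return weights


corpus = ["the flight was late", "the crew was great", "late again"]
for w in tf_idf(corpus):
    print(w)
```

In the first document, "flight" (found in one of three documents) gets ln 3 ≈ 1.10, while "the" (found in two of three) gets only ln 1.5 ≈ 0.41. A word that appears in every document would get log(1) = 0.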
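# Import the TfidfVectorizer and pandas
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
# Learn a vocabulary limited to the 100 most frequent terms (tweets is a DataFrame with a text column)
vect = TfidfVectorizer(max_features=100).fit(tweets.text)
# Transform the tweets into a sparse TF-IDF matrix: one row per tweet, one column per term
X = vect.transform(tweets.text)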
X
<14640x100 sparse matrix of type '<class 'numpy.float64'>'
with 119182 stored elements in Compressed Sparse Row format>
# Convert the sparse matrix to a dense DataFrame with one column per vocabulary term
X_df = pd.DataFrame(X.toarray(), columns=vect.get_feature_names_out())
X_df.head()