Natural Language Processing (NLP) in Python
Fouad Trad
Machine Learning Engineer
reviews = [ "I loved the movie. It was amazing!", "The movie was okay.", "I hated the movie. It was boring." ]
cleaned_reviews = [preprocess(review) for review in reviews]
print(cleaned_reviews)
['i loved the movie it was amazing',
'the movie was okay',
'i hated the movie it was boring']
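The slide calls preprocess without showing its body. A minimal sketch that reproduces the cleaned output above, assuming the function only lowercases the text and strips punctuation (this body is hypothetical, not the course's definition):

```python
import string

def preprocess(text):
    # Hypothetical helper: lowercase, then remove ASCII punctuation.
    text = text.lower()
    return text.translate(str.maketrans("", "", string.punctuation))

reviews = [
    "I loved the movie. It was amazing!",
    "The movie was okay.",
    "I hated the movie. It was boring.",
]
cleaned_reviews = [preprocess(review) for review in reviews]
print(cleaned_reviews)
```

A fuller pipeline might also remove stop words or lemmatize, but the output shown on the slide keeps words such as "the" and "was", so this sketch leaves them in.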
from sklearn.feature_extraction.text import TfidfVectorizer
vectorizer = TfidfVectorizer()
tfidf_matrix = vectorizer.fit_transform(cleaned_reviews)
print(tfidf_matrix)
<Compressed Sparse Row sparse matrix of dtype 'float64'
with 16 stored elements and shape (3, 9)>
print(tfidf_matrix.toarray())
[[0.52523431 0. 0. 0.39945423 0.52523431 0.31021184 0. 0.31021184 0.31021184]
[0. 0. 0. 0. 0. 0.41285857 0.69903033 0.41285857 0.41285857]
[0. 0.52523431 0.52523431 0.39945423 0. 0.31021184 0. 0.31021184 0.31021184]]
vectorizer.get_feature_names_out()
['amazing' 'boring' 'hated' 'it' 'loved' 'movie' 'okay' 'the' 'was']
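To see where the numbers in the matrix come from, one row can be recomputed by hand. The sketch below assumes TfidfVectorizer's default settings: smoothed IDF, idf = ln((1 + n) / (1 + df)) + 1, followed by L2 normalization of each row. The helper names idf and tfidf_row are illustrative only and are not part of scikit-learn.

```python
import math

docs = [
    "i loved the movie it was amazing",
    "the movie was okay",
    "i hated the movie it was boring",
]

# The default token pattern keeps only tokens of two or more characters,
# which is why "i" is missing from the nine feature names.
tokenized = [[t for t in d.split() if len(t) >= 2] for d in docs]
n_docs = len(tokenized)

def idf(term):
    # Smoothed inverse document frequency (scikit-learn default).
    df = sum(term in doc for doc in tokenized)
    return math.log((1 + n_docs) / (1 + df)) + 1

def tfidf_row(doc):
    # Raw term count times IDF, then scale the row to unit L2 length.
    raw = {t: doc.count(t) * idf(t) for t in set(doc)}
    norm = math.sqrt(sum(v * v for v in raw.values()))
    return {t: v / norm for t, v in raw.items()}

row = tfidf_row(tokenized[1])  # "the movie was okay"
print(round(row["okay"], 8), round(row["movie"], 8))
```

"okay" appears in only one review, so it gets the highest weight in its row, while "movie", "the", and "was" appear in all three reviews and are weighted lower.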
import pandas as pd
df_tfidf = pd.DataFrame(
    tfidf_matrix.toarray(),
    columns=vectorizer.get_feature_names_out()
)
import seaborn as sns
import matplotlib.pyplot as plt
sns.heatmap(df_tfidf, annot=True)
plt.title("TF-IDF Scores Across Reviews")
plt.xlabel("Terms")
plt.ylabel("Documents")
plt.show()