Did we really predict the sentiment well?

Sentiment Analysis in Python

Violeta Misheva

Data Scientist

Train/test split

The full set of observations is split into a training set and a testing set; the training set is the larger block.

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=123, stratify=y)

X: features
y: labels
test_size: proportion of the data held out for testing
random_state: seed for the random number generator, so the split is reproducible
stratify: the resulting train and test sets keep the same class proportions as the array passed to this parameter (here, y)
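To make the effect of stratify concrete, here is a minimal sketch on made-up data. The toy X and y below are assumptions for illustration only (they are not the course's review data): 100 observations with an 80/20 class imbalance.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced toy data: 80 negative (0) and 20 positive (1) labels
X = np.arange(100).reshape(-1, 1)
y = np.array([0] * 80 + [1] * 20)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=123, stratify=y
)

# With stratify=y, both subsets keep the 80/20 class ratio found in y
train_pos_share = y_train.mean()
test_pos_share = y_test.mean()

print('Train size:', len(X_train), 'Test size:', len(X_test))
print('Share of positives in train:', train_pos_share)
print('Share of positives in test: ', test_pos_share)
```

Without stratify, the share of positives in the test set could drift away from 20% purely by chance, which matters most when one class is rare.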

from sklearn.linear_model import LogisticRegression

log_reg = LogisticRegression().fit(X_train, y_train)

print('Accuracy on training data: ', log_reg.score(X_train, y_train))

0.76

print('Accuracy on testing data: ', log_reg.score(X_test, y_test))

0.73

from sklearn.metrics import accuracy_score

log_reg = LogisticRegression().fit(X_train, y_train)

y_predicted = log_reg.predict(X_test)
print('Accuracy score on test data: ', accuracy_score(y_test, y_predicted))

0.73
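The two approaches above should give the same number: for a classifier, score reports mean accuracy, which is what accuracy_score computes from the predictions. A small sketch to check this, assuming synthetic data from make_classification as a stand-in for the real features and labels:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption; not the course's dataset)
X, y = make_classification(n_samples=500, n_features=10, random_state=123)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=123, stratify=y
)

log_reg = LogisticRegression().fit(X_train, y_train)
y_predicted = log_reg.predict(X_test)

# Three ways to arrive at the same accuracy on the test set
score_method = log_reg.score(X_test, y_test)
score_function = accuracy_score(y_test, y_predicted)
score_manual = np.mean(y_predicted == y_test)

print(score_method, score_function, score_manual)
```

accuracy_score is the more flexible of the two, because it works with any array of predictions, not only those produced by a fitted scikit-learn model.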

An example of a confusion matrix for a binary classification problem

from sklearn.metrics import confusion_matrix

log_reg = LogisticRegression().fit(X_train, y_train)
y_predicted = log_reg.predict(X_test)

print(confusion_matrix(y_test, y_predicted)/len(y_test))

[[0.3788 0.1224]
 [0.1352 0.3636]]
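To read such a matrix: in scikit-learn, rows are the true classes and columns are the predicted classes, so for binary labels 0 and 1 the cells are [[TN, FP], [FN, TP]]. The sketch below uses ten hand-made labels (hypothetical, chosen only so the counts are easy to verify) and also shows the normalize='all' option, which should be equivalent to dividing by the number of observations; it assumes a scikit-learn version that supports the normalize parameter (0.22 or later).

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical true and predicted labels for ten observations
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
y_pred = np.array([0, 0, 0, 1, 1, 1, 1, 1, 0, 0])

cm = confusion_matrix(y_true, y_pred)

# Rows are true classes, columns are predicted classes
tn, fp, fn, tp = cm.ravel()
print('TN:', tn, 'FP:', fp, 'FN:', fn, 'TP:', tp)

# Proportions instead of counts; same result as cm / len(y_true)
proportions = confusion_matrix(y_true, y_pred, normalize='all')
print(proportions)

# The diagonal holds the correct predictions, so its sum is the accuracy
accuracy = np.trace(proportions)
print('Accuracy:', accuracy)
```

The off-diagonal cells show which kind of mistake dominates: here the model misses positives (FN) more often than it raises false alarms (FP).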
