Logistic regression: revisited

Sentiment Analysis in Python

Violeta Misheva

Data Scientist

Complex models and regularization

Complex models:

  • A complex model can capture the noise in the data (overfitting)
  • Complexity often comes from a large number of features or parameters

Regularization:

  • A way to simplify the model and keep it from becoming too complex

Regularization in a logistic regression

from sklearn.linear_model import LogisticRegression
# Regularization arguments
LogisticRegression(penalty='l2', C=1.0)
  • L2: shrinks all coefficients towards zero
  • C is the inverse of the regularization strength
  • High values of C: weak penalization, the model fits the training data closely
  • Low values of C: strong penalization, the model is less flexible
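
The effect of C can be checked on a toy problem. The sketch below is only illustrative: it assumes a synthetic dataset from make_classification in place of real review features, and it relies on the default penalty of LogisticRegression being L2. The variable coef_norms is a name introduced here for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for vectorized reviews (an assumption for illustration)
X, y = make_classification(n_samples=500, n_features=20,
                           n_informative=5, random_state=0)

coef_norms = {}
for C in (0.01, 1.0, 100.0):
    # The default penalty is L2; only the strength C is varied here
    model = LogisticRegression(C=C, max_iter=1000).fit(X, y)
    # Overall size of the coefficient vector for this value of C
    coef_norms[C] = float(np.linalg.norm(model.coef_))
    print(f"C={C}: L2 norm of coefficients = {coef_norms[C]:.3f}")
```

The smallest C should print the smallest norm, since a stronger penalty pulls the coefficients closer to zero.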

Predicting a probability vs. predicting a class

log_reg = LogisticRegression().fit(X_train, y_train)
# Predict labels 
y_predicted = log_reg.predict(X_test)
# Predict probability
y_probab = log_reg.predict_proba(X_test)

Predicting a probability vs. predicting a class

y_probab
array([[0.5002245, 0.4997755],
       [0.4900345, 0.5099655],
       ...,
       [0.7040499, 0.2959501]])
# Select the probabilities of class 1
y_probab = log_reg.predict_proba(X_test)[:, 1]
y_probab
array([0.4997755, 0.5099655, ..., 0.2959501])
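
How the two outputs relate can be sketched as follows. This is a minimal example under stated assumptions: the data is synthetic (make_classification), the split parameters are arbitrary, and rows_sum_to_one, labels_from_probab and labels_match are names introduced only for this illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic binary data standing in for the course dataset (assumption)
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

log_reg = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_predicted = log_reg.predict(X_test)      # one label per test sample
y_probab = log_reg.predict_proba(X_test)   # one column per class

# Each row holds the probabilities of class 0 and class 1
rows_sum_to_one = np.allclose(y_probab.sum(axis=1), 1.0)

# Taking the most probable class per row reproduces predict()
labels_from_probab = log_reg.classes_[y_probab.argmax(axis=1)]
labels_match = np.array_equal(labels_from_probab, y_predicted)
print(y_probab.shape, rows_sum_to_one, labels_match)
```

The column order of predict_proba follows log_reg.classes_, which is why [:, 1] selects the probability of class 1.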

Model metrics with predicted probabilities

  • Accuracy score and confusion matrix work with predicted classes.
  • They raise a ValueError when given predicted probabilities.
# Default probability encoding:
# if probability >= 0.5, then class 1; else class 0
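
The encoding rule above can be applied by hand before calling the metrics. The sketch below uses four made-up labels and probabilities purely for illustration; they are not results from the course data, and the names raised, acc and cm are introduced here.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical true labels and predicted probabilities of class 1
y_true = np.array([0, 1, 1, 0])
y_probab = np.array([0.2, 0.8, 0.4, 0.1])

# Passing probabilities directly to a class-based metric fails
try:
    accuracy_score(y_true, y_probab)
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)

# Encode probabilities as classes: >= 0.5 becomes 1, otherwise 0
y_pred = (y_probab >= 0.5).astype(int)

acc = accuracy_score(y_true, y_pred)
cm = confusion_matrix(y_true, y_pred)   # rows: true class, columns: predicted class
print(acc)
print(cm)
```

Changing the 0.5 cutoff is a design choice: a lower threshold labels more samples as class 1, a higher one fewer.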

Let's practice!
