How logistic regression works

Intermediate Regression with statsmodels in Python

Maarten Van den Broeck

Content Developer at DataCamp

Sum of squares doesn't work

np.sum((y_pred - y_actual) ** 2)

y_actual is always 0 or 1.

y_pred is between 0 and 1.

There is a better metric than sum of squares.
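As a concrete illustration (with made-up outcomes and predictions, not the course data), sum of squares can be computed on a binary response, it just isn't a good fit metric here:

```python
import numpy as np

# Made-up binary outcomes and predicted probabilities (illustrative only)
y_actual = np.array([1, 0, 1, 1, 0])
y_pred = np.array([0.9, 0.2, 0.7, 0.4, 0.1])

# Sum of squares is computable, but poorly suited to 0/1 outcomes
sse = np.sum((y_pred - y_actual) ** 2)
```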

Likelihood

np.sum(y_pred * y_actual + (1 - y_pred) * (1 - y_actual))

When y_actual = 1

y_pred * 1 + (1 - y_pred) * (1 - 1) = y_pred

When y_actual = 0

y_pred * 0 + (1 - y_pred) * (1 - 0) = 1 - y_pred
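A minimal sketch of the likelihood metric on the same made-up values: by the two cases above, each observation contributes the probability the model assigned to the outcome that actually happened.

```python
import numpy as np

y_actual = np.array([1, 0, 1, 1, 0])    # made-up binary outcomes
y_pred = np.array([0.9, 0.2, 0.7, 0.4, 0.1])  # made-up predicted probabilities

# Reduces to y_pred where y_actual == 1, and 1 - y_pred where y_actual == 0
likelihoods = y_pred * y_actual + (1 - y_pred) * (1 - y_actual)
likelihood = np.sum(likelihoods)
```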
Log-likelihood

  • Computing likelihood involves adding many very small numbers, leading to numerical error.
  • Log-likelihood is easier to compute.
log_likelihood = np.log(y_pred) * y_actual + np.log(1 - y_pred) * (1 - y_actual)

Both equations give the same answer: because the logarithm is monotonically increasing, the coefficients that maximize the likelihood also maximize the log-likelihood.
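The same made-up values, scored with log-likelihood; each contribution is now the log of the probability assigned to the observed outcome, so every contribution, and the total, is at most zero.

```python
import numpy as np

y_actual = np.array([1, 0, 1, 1, 0])    # made-up binary outcomes
y_pred = np.array([0.9, 0.2, 0.7, 0.4, 0.1])  # made-up predicted probabilities

# log(y_pred) where y_actual == 1, log(1 - y_pred) where y_actual == 0
log_likelihoods = np.log(y_pred) * y_actual + np.log(1 - y_pred) * (1 - y_actual)
log_likelihood = np.sum(log_likelihoods)
```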

Negative log-likelihood

Maximizing log-likelihood is the same as minimizing negative log-likelihood.

-np.sum(log_likelihoods)
Logistic regression algorithm

def calc_neg_log_likelihood(coeffs):
    intercept, slope = coeffs
    # More calculation!

from scipy.optimize import minimize

minimize(
  fun=calc_neg_log_likelihood,
  x0=[0, 0]
)
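Putting the pieces together: a runnable sketch of the whole algorithm on a made-up one-variable dataset, assuming the standard logistic (sigmoid) transformation to turn the linear predictor into probabilities. This is, in effect, the optimization statsmodels performs when fitting a logistic regression.

```python
import numpy as np
from scipy.optimize import minimize

# Made-up explanatory variable and binary response (illustrative only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y_actual = np.array([0, 0, 1, 0, 1, 1])

def calc_neg_log_likelihood(coeffs):
    intercept, slope = coeffs
    # Linear predictor, squashed into (0, 1) by the logistic function
    y_pred = 1 / (1 + np.exp(-(intercept + slope * x)))
    log_likelihoods = np.log(y_pred) * y_actual + np.log(1 - y_pred) * (1 - y_actual)
    return -np.sum(log_likelihoods)

result = minimize(fun=calc_neg_log_likelihood, x0=[0, 0])
intercept, slope = result.x  # fitted coefficients
```

The fitted slope comes out positive here, since larger x values tend to go with y_actual = 1, and the optimized coefficients achieve a lower negative log-likelihood than the starting guess of [0, 0].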
Let's practice!
