Intermediate Regression with statsmodels in Python
Maarten Van den Broeck
Content Developer at DataCamp
np.sum((y_pred - y_actual) ** 2)
y_actual is always 0 or 1.
y_pred is between 0 and 1.
For binary responses, there is a better metric than the sum of squares: likelihood.
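As a minimal sketch, with made-up prediction and response arrays (assumed here purely for illustration):

import numpy as np

# Hypothetical example: binary actual responses, predicted probabilities
y_actual = np.array([1, 0, 1, 1, 0])
y_pred = np.array([0.8, 0.3, 0.6, 0.9, 0.2])

# Sum of squares treats the binary responses like continuous values
np.sum((y_pred - y_actual) ** 2)  # 0.34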
       y_pred * y_actual
       y_pred * y_actual + (1 - y_pred) * (1 - y_actual)
np.sum(y_pred * y_actual + (1 - y_pred) * (1 - y_actual))
When y_actual = 1
y_pred * 1 + (1 - y_pred) * (1 - 1) = y_pred
When y_actual = 0
y_pred * 0 + (1 - y_pred) * (1 - 0) = 1 - y_pred
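On the same made-up arrays as above, each observation contributes y_pred when y_actual is 1, and 1 - y_pred when y_actual is 0:

likelihoods = y_pred * y_actual + (1 - y_pred) * (1 - y_actual)
likelihoods          # array([0.8, 0.7, 0.6, 0.9, 0.8])
np.sum(likelihoods)  # 3.8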
log_likelihood = np.log(y_pred) * y_actual + np.log(1 - y_pred) * (1 - y_actual)
Because y_actual is only ever 0 or 1, this equation gives the same answer as taking the logarithm of each likelihood term.
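A quick check on the example arrays, assuming the likelihoods computed above:

np.log(likelihoods)                                              # log of each likelihood term
np.log(y_pred) * y_actual + np.log(1 - y_pred) * (1 - y_actual)  # same values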
Maximizing log-likelihood is the same as minimizing negative log-likelihood.
-np.sum(log_likelihoods)
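On the example arrays, the negative log-likelihood works out to roughly 1.42:

log_likelihoods = np.log(y_pred) * y_actual + np.log(1 - y_pred) * (1 - y_actual)
-np.sum(log_likelihoods)  # approximately 1.42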
def calc_neg_log_likelihood(coeffs):
    intercept, slope = coeffs
    # More calculation!
from scipy.optimize import minimize
minimize(
  fun=calc_neg_log_likelihood,
  x0=[0, 0]
)
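Putting the pieces together as a sketch: the data, the explanatory variable x, and the logistic transformation below are assumptions for illustration, not from the slides. The fitted coefficients should closely match what statsmodels' logit() reports on the same data.

import numpy as np
from scipy.optimize import minimize

# Hypothetical data: one explanatory variable, binary response
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y_actual = np.array([0, 0, 1, 0, 1, 1])

def calc_neg_log_likelihood(coeffs):
    intercept, slope = coeffs
    # Logistic function turns the linear predictor into probabilities
    y_pred = 1 / (1 + np.exp(-(intercept + slope * x)))
    log_likelihoods = np.log(y_pred) * y_actual + np.log(1 - y_pred) * (1 - y_actual)
    return -np.sum(log_likelihoods)

result = minimize(fun=calc_neg_log_likelihood, x0=[0, 0])
result.x  # estimated [intercept, slope]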