How logistic regression works

Intermediate Regression in R

Richie Cotton

Data Evangelist at DataCamp

Sum of squares doesn't work

sum((y_pred - y_actual) ^ 2)

y_actual is always 0 or 1.

y_pred is between 0 and 1.

There is a better metric than sum of squares: the likelihood.


Likelihood

    y_pred * y_actual

Likelihood

    y_pred * y_actual + (1 - y_pred) * (1 - y_actual)

Likelihood

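prod(y_pred * y_actual + (1 - y_pred) * (1 - y_actual))

The likelihood of the whole dataset is the product of the per-observation terms. Each term simplifies depending on y_actual.

When y_actual = 1

y_pred * 1 + (1 - y_pred) * (1 - 1) = y_pred

When y_actual = 0

y_pred * 0 + (1 - y_pred) * (1 - 0) = 1 - y_pred

Log-likelihood

  • Computing likelihood involves multiplying many numbers between 0 and 1, so the product quickly becomes too small to represent accurately (numerical underflow).
  • Log-likelihood is easier to compute, because the log of a product is the sum of the logs.
log(y_pred) * y_actual + log(1 - y_pred) * (1 - y_actual)

For each observation, this gives the same answer as log(y_pred * y_actual + (1 - y_pred) * (1 - y_actual)), because y_actual is always 0 or 1. Since log is an increasing function, the coefficients that maximize the log-likelihood also maximize the likelihood.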
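To see the log-likelihood formula in action, here is a small R sketch on made-up data (the vectors below are hypothetical, not from the course). It checks that the per-observation log-likelihood formula equals the log of the per-observation likelihood, and it shows why working on the log scale matters: the product of 2000 probabilities of 0.5 underflows to zero in double precision, while the sum of their logs is an ordinary number.

```r
# Hypothetical data: 2000 observations, alternating 1 and 0,
# each predicted with probability 0.5
y_actual <- rep(c(1, 0), times = 1000)
y_pred <- rep(0.5, times = 2000)

# Per-observation likelihood: y_pred when y_actual is 1, 1 - y_pred when it is 0
likelihoods <- y_pred * y_actual + (1 - y_pred) * (1 - y_actual)

# Per-observation log-likelihood, using the formula above
log_likelihoods <- log(y_pred) * y_actual + log(1 - y_pred) * (1 - y_actual)

# The two formulas agree once the log is taken
stopifnot(isTRUE(all.equal(log_likelihoods, log(likelihoods))))

# The product of 2000 values of 0.5 is 2^-2000, far below the smallest
# positive double, so it underflows to 0
prod(likelihoods)

# The sum of the logs is 2000 * log(0.5), roughly -1386.29
sum(log_likelihoods)
```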


Negative log-likelihood

Maximizing log-likelihood is the same as minimizing negative log-likelihood.

-sum(log_likelihoods)
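As a minimal sketch, the negative log-likelihood can be wrapped in an R function of the predicted probabilities. The function name neg_log_likelihood and the example vectors are hypothetical. Predictions that sit closer to the observed responses give a smaller value, which is why an optimizer searches for the coefficients that minimize it.

```r
# Hypothetical helper: negative log-likelihood of predicted probabilities
neg_log_likelihood <- function(y_pred, y_actual) {
  # Per-observation log-likelihoods
  log_likelihoods <- log(y_pred) * y_actual + log(1 - y_pred) * (1 - y_actual)
  # Negate the total so that a better fit gives a smaller number
  -sum(log_likelihoods)
}

# Hypothetical observed responses and two sets of predictions
y_actual <- c(1, 0, 1, 1, 0)
good_pred <- c(0.9, 0.1, 0.8, 0.9, 0.2)
poor_pred <- c(0.6, 0.4, 0.5, 0.6, 0.5)

neg_log_likelihood(good_pred, y_actual)  # smaller value: better fit
neg_log_likelihood(poor_pred, y_actual)  # larger value: worse fit
```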

Logistic regression algorithm

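calc_neg_log_likelihood <- function(coeffs) {
  # Unpack the two coefficients that optim() adjusts
  intercept <- coeffs[1]
  slope <- coeffs[2]
  # More calculation! Use intercept and slope to compute y_pred,
  # then return the negative log-likelihood
}
optim(
  par = ???,  # starting values for the coefficients
  fn = ???    # the function to minimize
)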
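Filling in the template, a minimal end-to-end sketch might look like the following. Several pieces are assumptions rather than details from the slides: the data are simulated with set.seed(), rnorm() and rbinom(); the predicted probabilities come from plogis(), R's logistic (inverse logit) function; and the starting values c(0, 0) are an arbitrary choice. optim() runs with its default Nelder-Mead method, and the result is compared against glm() with family = binomial, which fits the same model.

```r
# Simulated data (assumption): one numeric explanatory variable, binary response
set.seed(2021)
n <- 500
x_actual <- rnorm(n)
true_prob <- plogis(-0.5 + 1.5 * x_actual)  # logistic (inverse logit) transform
y_actual <- rbinom(n, size = 1, prob = true_prob)

# Uses x_actual and y_actual from the enclosing environment
calc_neg_log_likelihood <- function(coeffs) {
  intercept <- coeffs[1]
  slope <- coeffs[2]
  # Predicted probabilities from the linear predictor via the logistic function
  y_pred <- plogis(intercept + slope * x_actual)
  # Per-observation log-likelihoods
  log_likelihoods <- log(y_pred) * y_actual + log(1 - y_pred) * (1 - y_actual)
  -sum(log_likelihoods)
}

# Start both coefficients at zero and let optim() minimize the function
fit <- optim(
  par = c(intercept = 0, slope = 0),
  fn = calc_neg_log_likelihood
)
fit$par

# Compare with R's built-in logistic regression
mdl <- glm(y_actual ~ x_actual, family = binomial)
coef(mdl)
```

The two sets of coefficients should agree closely; any small difference comes from the stopping tolerance of the Nelder-Mead search, whereas glm() uses iteratively reweighted least squares.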

Let's practice!
