Multiple logistic regression

Intermediate Regression in R

Richie Cotton

Data Evangelist at DataCamp

Bank churn dataset

has_churned  time_since_first_purchase  time_since_last_purchase
0             0.3993247                 -0.5158691
1            -0.4297957                  0.6780654
0             3.7383122                  0.4082544
0             0.6032289                 -0.6990435
...          ...                        ...
response     length of relationship     recency of activity
1 https://www.rdocumentation.org/packages/bayesQR/topics/Churn

glm()

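# One explanatory variable
glm(response ~ explanatory, data = dataset, family = binomial)
# Two explanatory variables, no interaction
glm(response ~ explanatory1 + explanatory2, data = dataset, family = binomial)
# Two explanatory variables, with interaction
glm(response ~ explanatory1 * explanatory2, data = dataset, family = binomial)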
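
As a minimal sketch of the difference between the `+` and `*` formulas, the code below fits both to simulated data. The data frame `churn_sim`, the coefficients used to generate it, and the model names are assumptions made for illustration; they are not the real churn dataset or its fitted values.

```r
# Simulated stand-in for the churn data (NOT the real dataset).
set.seed(1)
n <- 400
churn_sim <- data.frame(
  time_since_first_purchase = rnorm(n),
  time_since_last_purchase = rnorm(n)
)

# Assumed relationship on the log-odds scale, chosen only for illustration.
log_odds <- -0.3 * churn_sim$time_since_first_purchase +
  0.9 * churn_sim$time_since_last_purchase
churn_sim$has_churned <- rbinom(n, size = 1, prob = plogis(log_odds))

# Main effects only: intercept plus one slope per explanatory variable.
mdl_no_int <- glm(
  has_churned ~ time_since_first_purchase + time_since_last_purchase,
  data = churn_sim,
  family = binomial
)

# Main effects plus their interaction: one extra coefficient.
mdl_int <- glm(
  has_churned ~ time_since_first_purchase * time_since_last_purchase,
  data = churn_sim,
  family = binomial
)

# Compare which terms each model estimates.
names(coef(mdl_no_int))
names(coef(mdl_int))
```

The interaction model adds a single `time_since_first_purchase:time_since_last_purchase` term on top of the two main effects.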

Prediction flow

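library(tidyr)  # expand_grid()
library(dplyr)  # mutate(), %>%

# Every combination of the explanatory values
explanatory_data <- expand_grid(
  explanatory1 = some_values,
  explanatory2 = some_values
)
# type = "response" returns predicted probabilities, not log-odds
prediction_data <- explanatory_data %>% 
  mutate(
    has_churned = predict(mdl, explanatory_data, type = "response")
  )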
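
The prediction flow above can be sketched end to end on simulated data. This version swaps in base R's `expand.grid()` for `expand_grid()` so that it runs without extra packages; the variable names `x1`, `x2`, `y` and the simulated relationship are hypothetical.

```r
# Simulated data and model (hypothetical names, for illustration only).
set.seed(2)
n <- 300
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
sim$y <- rbinom(n, size = 1, prob = plogis(0.8 * sim$x1 - 0.5 * sim$x2))
mdl <- glm(y ~ x1 * x2, data = sim, family = binomial)

# Step 1: all combinations of the explanatory values (5 x 5 = 25 rows).
explanatory_data <- expand.grid(
  x1 = seq(-2, 2, by = 1),
  x2 = seq(-2, 2, by = 1)
)

# Step 2: add a column of predicted probabilities.
prediction_data <- explanatory_data
prediction_data$y <- predict(mdl, explanatory_data, type = "response")

head(prediction_data)
```

Because `type = "response"` is used, each value in `prediction_data$y` is a probability between 0 and 1 rather than a log-odds value.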

The four outcomes

                 actual false     actual true
predicted false  correct          false negative
predicted true   false positive   correct
1 https://campus.datacamp.com/courses/introduction-to-regression-in-r/simple-logistic-regression?ex=10

Confusion matrix

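library(ggplot2)    # autoplot()
library(yardstick)  # conf_mat()

actual_response <- dataset$response
# Round fitted probabilities to the most likely outcome (0 or 1)
predicted_response <- round(fitted(mdl))
outcomes <- table(predicted_response, actual_response)
confusion <- conf_mat(outcomes)
autoplot(confusion)
# event_level = "second" treats the second level (1) as the positive class
summary(confusion, event_level = "second")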
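
To show what the summary metrics are built from, here is a rough base R sketch that reads the four outcomes out of the `table()` and computes accuracy, sensitivity, and specificity by hand. It does not use `conf_mat()`; the simulated data and the names `sim`, `tp`, `tn`, `fp`, `fn` are assumptions for illustration.

```r
# Simulated data and model (hypothetical, for illustration only).
set.seed(3)
n <- 300
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
sim$y <- rbinom(n, size = 1, prob = plogis(1.2 * sim$x1 - 0.8 * sim$x2))
mdl <- glm(y ~ x1 * x2, data = sim, family = binomial)

# Fixing the factor levels guarantees a 2 x 2 table even if a class is absent.
actual_response <- factor(sim$y, levels = c(0, 1))
predicted_response <- factor(round(fitted(mdl)), levels = c(0, 1))

# Rows are predictions, columns are actual values.
outcomes <- table(predicted_response, actual_response)

tn <- outcomes["0", "0"]  # predicted false, actual false
fn <- outcomes["0", "1"]  # predicted false, actual true
fp <- outcomes["1", "0"]  # predicted true, actual false
tp <- outcomes["1", "1"]  # predicted true, actual true

accuracy <- (tp + tn) / sum(outcomes)  # share of correct predictions
sensitivity <- tp / (tp + fn)          # share of actual trues found
specificity <- tn / (tn + fp)          # share of actual falses found

c(accuracy = accuracy, sensitivity = sensitivity, specificity = specificity)
```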

Visualization

  • Use faceting for categorical variables.
  • For two numeric explanatory variables, map the response to color.
  • Give responses below 0.5 one color and responses above 0.5 another.
scale_color_gradient2(midpoint = 0.5)
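
One way these guidelines might be combined, assuming ggplot2 is installed: plot the two numeric explanatory variables on the axes, color points by the response, and overlay a grid of predictions. The simulated data, the names `sim` and `grid`, and the grid spacing are hypothetical choices for this sketch.

```r
library(ggplot2)

# Simulated data and model (hypothetical, for illustration only).
set.seed(4)
n <- 300
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
sim$y <- rbinom(n, size = 1, prob = plogis(1.2 * sim$x1 - 0.8 * sim$x2))
mdl <- glm(y ~ x1 * x2, data = sim, family = binomial)

# Grid of explanatory values with predicted probabilities.
grid <- expand.grid(
  x1 = seq(-3, 3, by = 0.5),
  x2 = seq(-3, 3, by = 0.5)
)
grid$y <- predict(mdl, grid, type = "response")

# Actual responses as translucent points, predictions as squares.
# The diverging scale changes color at a probability of 0.5.
plt <- ggplot(sim, aes(x1, x2, color = y)) +
  geom_point(alpha = 0.5) +
  geom_point(data = grid, shape = 15, size = 3) +
  scale_color_gradient2(midpoint = 0.5)

plt
```

Setting `midpoint = 0.5` puts the neutral color of the diverging scale at the decision threshold, so predictions on either side of 0.5 are visually separated.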

Let's practice!
