Introduction to Regression in R
Richie Cotton
Data Evangelist at DataCamp
library(ggplot2)
library(dplyr)
library(tibble)

# Assumes the churn data frame is already loaded
plt_churn_vs_recency_base <- ggplot(
  churn,
  aes(time_since_last_purchase, has_churned)
) +
  geom_point() +
  geom_smooth(
    method = "glm",
    se = FALSE,
    method.args = list(family = binomial)
  )
mdl_recency <- glm(
  has_churned ~ time_since_last_purchase, data = churn, family = "binomial"
)
explanatory_data <- tibble(
  time_since_last_purchase = seq(-1, 6, 0.25)
)
prediction_data <- explanatory_data %>%
  mutate(
    # type = "response" returns predicted probabilities of churning
    has_churned = predict(mdl_recency, explanatory_data, type = "response")
  )
plt_churn_vs_recency_base +
  geom_point(
    data = prediction_data,
    color = "blue"
  )
prediction_data <- explanatory_data %>%
  mutate(
    has_churned = predict(mdl_recency, explanatory_data, type = "response"),
    # Rounding the probability gives the most likely outcome (0 or 1)
    most_likely_outcome = round(has_churned)
  )
plt_churn_vs_recency_base +
  geom_point(
    aes(y = most_likely_outcome),
    data = prediction_data,
    color = "green"
  )
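Rounding a predicted probability amounts to applying a 0.5 threshold. The following is a minimal sketch of that equivalence, using made-up probabilities rather than output from `mdl_recency`; the names `predicted_prob` and `thresholded` are hypothetical.

```r
# Hypothetical predicted probabilities, for illustration only
predicted_prob <- c(0.10, 0.49, 0.51, 0.90)

# round() sends probabilities below 0.5 to 0 and above 0.5 to 1
most_likely_outcome <- round(predicted_prob)

# The same classification written as an explicit 0.5 threshold
thresholded <- as.numeric(predicted_prob > 0.5)

identical(most_likely_outcome, thresholded)
```

One edge case to keep in mind: R rounds halves to the even digit, so a probability of exactly 0.5 is rounded to 0.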
The odds ratio is the probability of something happening divided by the probability that it doesn't happen.
$$ \text{odds\_ratio} = \frac{\text{probability}}{1 - \text{probability}} $$
$$ \text{odds\_ratio} = \frac{0.25}{1 - 0.25} = \frac{1}{3} $$
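The formula can be checked numerically with a pair of small helper functions. This is only a sketch: `odds_from_prob()` and `prob_from_odds()` are hypothetical names introduced here for illustration, and the second one is simply the first formula solved for the probability.

```r
# Hypothetical helpers, not part of any package
odds_from_prob <- function(probability) probability / (1 - probability)
prob_from_odds <- function(odds_ratio) odds_ratio / (1 + odds_ratio)

odds_from_prob(0.25)   # 1/3, matching the worked example
odds_from_prob(0.5)    # 1: both outcomes are equally likely
prob_from_odds(1 / 3)  # recovers the probability 0.25
```

An odds ratio of 1 marks the point where the event is as likely to happen as not.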
prediction_data <- explanatory_data %>%
  mutate(
    has_churned = predict(mdl_recency, explanatory_data, type = "response"),
    most_likely_response = round(has_churned),
    odds_ratio = has_churned / (1 - has_churned)
  )
ggplot(
  prediction_data,
  aes(time_since_last_purchase, odds_ratio)
) +
  geom_line() +
  geom_hline(yintercept = 1, linetype = "dotted")
ggplot(
  prediction_data,
  aes(time_since_last_purchase, odds_ratio)
) +
  geom_line() +
  geom_hline(yintercept = 1, linetype = "dotted") +
  scale_y_log10()
prediction_data <- explanatory_data %>%
  mutate(
    has_churned = predict(mdl_recency, explanatory_data, type = "response"),
    most_likely_response = round(has_churned),
    odds_ratio = has_churned / (1 - has_churned),
    log_odds_ratio = log(odds_ratio),
    # Without type = "response", predict() returns the log odds ratio directly
    log_odds_ratio2 = predict(mdl_recency, explanatory_data)
  )
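The two log odds ratio columns should agree because `predict()` on a binomial `glm` defaults to the link (logit) scale. The sketch below checks this on simulated data, since the churn dataset is not reproduced here; `sim`, `mdl_sim`, and the simulation settings are assumptions made purely for illustration.

```r
set.seed(1)

# Simulated stand-in for the churn data (hypothetical values)
sim <- data.frame(x = rnorm(200))
sim$y <- rbinom(200, size = 1, prob = plogis(0.3 * sim$x))

mdl_sim <- glm(y ~ x, data = sim, family = binomial)
new_x <- data.frame(x = seq(-1, 6, 0.25))

# Route 1: probability -> odds ratio -> log odds ratio
prob <- predict(mdl_sim, new_x, type = "response")
log_odds_from_prob <- log(prob / (1 - prob))

# Route 2: predict() with its default type = "link"
log_odds_direct <- predict(mdl_sim, new_x)

# The two routes should agree up to floating-point error
all.equal(log_odds_from_prob, log_odds_direct)
```

Base R's `qlogis(prob)` computes the same `log(prob / (1 - prob))` transformation, and `plogis()` inverts it.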
time_since_last_purchase | has_churned | most_likely_response | odds_ratio | log_odds_ratio | log_odds_ratio2 |
---|---|---|---|---|---|
0 | 0.491 | 0 | 0.966 | -0.035 | -0.035 |
2 | 0.623 | 1 | 1.654 | 0.503 | 0.503 |
4 | 0.739 | 1 | 2.834 | 1.042 | 1.042 |
6 | 0.829 | 1 | 4.856 | 1.580 | 1.580 |
... | ... | ... | ... | ... | ... |
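The rows above suggest why changes are easier to read on the log odds scale: equal steps in the explanatory variable add a constant amount to the log odds ratio, which corresponds to multiplying the odds ratio by a constant factor. A rough check using only the four rows shown in the table (the values are rounded, so the agreement is approximate):

```r
# Values copied from the four rows shown in the table
time_since_last_purchase <- c(0, 2, 4, 6)
odds_ratio <- c(0.966, 1.654, 2.834, 4.856)
log_odds_ratio <- c(-0.035, 0.503, 1.042, 1.580)

# Additive change in log odds ratio per 2-unit step: roughly 0.54 each time
log_odds_steps <- diff(log_odds_ratio)

# Multiplicative change in odds ratio per 2-unit step: roughly 1.71 each time
odds_multipliers <- odds_ratio[-1] / odds_ratio[-length(odds_ratio)]

# Exponentiating the additive steps approximately recovers the multipliers
exp(log_odds_steps)
```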
Scale | Are values easy to interpret? | Are changes easy to interpret? | Is precise? |
---|---|---|---|
Probability | ✔ | ✘ | ✔ |
Most likely outcome | ✔✔ | ✔ | ✘ |
Odds ratio | ✔ | ✘ | ✔ |
Log odds ratio | ✘ | ✔ | ✔ |