Introduction to Regression in R
Richie Cotton
Data Evangelist at DataCamp
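```r
library(ggplot2)

# Scatter plot of churn status against recency, with a logistic
# regression trend line (no confidence ribbon)
plt_churn_vs_recency_base <- ggplot(
  churn,
  aes(time_since_last_purchase, has_churned)
) +
  geom_point() +
  geom_smooth(
    method = "glm",
    se = FALSE,
    method.args = list(family = binomial)
  )
```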
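
The trend line drawn by `geom_smooth()` with `method = "glm"` and a binomial family is a logistic (S-shaped) curve. The base-R sketch below shows the properties that make that curve suitable for probabilities. The helper name `logistic` is hypothetical; R's built-in `plogis()` computes the same function.

```r
# Hypothetical helper: the logistic function maps any real number into (0, 1)
logistic <- function(z) 1 / (1 + exp(-z))

z <- seq(-6, 6, by = 0.5)
p <- logistic(z)

# Matches base R's logistic distribution function
print(all.equal(p, plogis(z)))  # TRUE

# Properties visible in the fitted curve
print(range(p))          # both values lie strictly between 0 and 1
print(logistic(0))       # 0.5, the midpoint of the S-curve
print(all(diff(p) > 0))  # TRUE: the curve increases with z
```

Because the output can never leave (0, 1), the fitted line can be read directly as a probability of churning.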

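```r
library(dplyr)

# Fit a logistic regression of churn status on recency
mdl_recency <- glm(
  has_churned ~ time_since_last_purchase, data = churn, family = "binomial"
)

# Grid of recency values to predict on
explanatory_data <- tibble(
  time_since_last_purchase = seq(-1, 6, 0.25)
)

# type = "response" returns predicted probabilities rather than log odds
prediction_data <- explanatory_data %>%
  mutate(
    has_churned = predict(mdl_recency, explanatory_data, type = "response")
  )

# Overlay the predictions on the base plot
plt_churn_vs_recency_base +
  geom_point(
    data = prediction_data,
    color = "blue"
  )
```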
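
The blue points land on the trend line because `predict(..., type = "response")` applies the logistic function to the model's linear predictor. The sketch below checks that with a small synthetic dataset, since the `churn` data is not reproduced here. The data, the seed, and the names `toy`, `mdl_toy` and `explanatory_toy` are all hypothetical.

```r
# Synthetic stand-in for the churn data (hypothetical values, not the real dataset)
set.seed(1)
n <- 200
recency <- runif(n, 0, 5)
churned <- rbinom(n, size = 1, prob = plogis(-0.3 + 0.5 * recency))
toy <- data.frame(time_since_last_purchase = recency, has_churned = churned)

mdl_toy <- glm(has_churned ~ time_since_last_purchase, data = toy, family = binomial)
explanatory_toy <- data.frame(time_since_last_purchase = seq(-1, 6, 0.25))

# Predicted probabilities from predict()
p_predict <- unname(predict(mdl_toy, explanatory_toy, type = "response"))

# The same probabilities by hand: logistic(intercept + slope * x)
b <- coef(mdl_toy)
p_manual <- unname(plogis(b[1] + b[2] * explanatory_toy$time_since_last_purchase))

print(all.equal(p_predict, p_manual))  # TRUE
```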

```r
# Round each probability to get the most likely outcome (0 or 1)
prediction_data <- explanatory_data %>%
  mutate(
    has_churned = predict(mdl_recency, explanatory_data, type = "response"),
    most_likely_outcome = round(has_churned)
  )

plt_churn_vs_recency_base +
  geom_point(
    aes(y = most_likely_outcome),
    data = prediction_data,
    color = "green"
  )
```
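
Rounding turns each probability into a 0/1 prediction. One detail worth knowing: R's `round()` sends an exact 0.5 to the even digit, so a probability of exactly 0.5 becomes 0. The sketch below uses hypothetical probabilities and shows an explicit `>= 0.5` threshold as an alternative that makes the tie rule visible.

```r
# Hypothetical predicted probabilities
probs <- c(0.2, 0.49, 0.5, 0.51, 0.9)

# round() gives the most likely outcome: 0 (no churn) or 1 (churn)
most_likely <- round(probs)
print(most_likely)  # 0 0 0 1 1

# Explicit threshold: ties at 0.5 go to 1 instead
most_likely_threshold <- as.numeric(probs >= 0.5)
print(most_likely_threshold)  # 0 0 1 1 1
```

With continuous model predictions an exact 0.5 is rare, so both approaches usually agree.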

The odds ratio is the probability that something happens divided by the probability that it doesn't happen. For example, a probability of 0.25 gives odds of 1 to 3:
$$ odds\_ratio = \frac{probability}{1 - probability} $$
$$ odds\_ratio = \frac{0.25}{1 - 0.25} = \frac{1}{3} $$
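
The same arithmetic in R, together with the inverse formula $probability = odds\_ratio / (1 + odds\_ratio)$, which recovers the probability from the odds:

```r
p <- 0.25
odds <- p / (1 - p)
print(odds)  # 0.3333333: churning is a third as likely as not churning

# Inverting the formula recovers the original probability
p_back <- odds / (1 + odds)
print(p_back)  # 0.25
```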

```r
# Add the odds ratio: probability of churning over probability of not churning
prediction_data <- explanatory_data %>%
  mutate(
    has_churned = predict(mdl_recency, explanatory_data, type = "response"),
    most_likely_response = round(has_churned),
    odds_ratio = has_churned / (1 - has_churned)
  )

# Dotted line at 1: churning and not churning are equally likely
ggplot(
  prediction_data,
  aes(time_since_last_purchase, odds_ratio)
) +
  geom_line() +
  geom_hline(yintercept = 1, linetype = "dotted")
```
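
The dotted reference line at `yintercept = 1` marks where churning and not churning are equally likely. A quick check with hypothetical probabilities shows that the odds equal 1 only at a probability of 0.5, and exceed 1 exactly when the probability exceeds 0.5:

```r
probs <- c(0.1, 0.3, 0.5, 0.7, 0.9)
odds <- probs / (1 - probs)

# Odds are exactly 1 when the probability is 0.5
print(odds[probs == 0.5])  # 1

# Odds above 1 correspond to churn being more likely than not
print(identical(odds > 1, probs > 0.5))  # TRUE
```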

```r
# Same plot with a logarithmic y-axis
ggplot(
  prediction_data,
  aes(time_since_last_purchase, odds_ratio)
) +
  geom_line() +
  geom_hline(yintercept = 1, linetype = "dotted") +
  scale_y_log10()
```
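
Why the log scale helps: probabilities the same distance either side of 0.5 give reciprocal odds (0.8 gives about 4, 0.2 gives about 0.25). On a linear axis these look lopsided around the dotted line at 1; on the log10 axis produced by `scale_y_log10()` they sit symmetrically. A sketch with two hypothetical probabilities:

```r
p_low <- 0.2
p_high <- 0.8
odds_low <- p_low / (1 - p_low)     # about 0.25
odds_high <- p_high / (1 - p_high)  # about 4

# Same distance from log10(1) = 0, on opposite sides
print(c(log10(odds_low), log10(odds_high)))  # about -0.602 and 0.602
```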

```r
# log_odds_ratio2 uses predict()'s default type ("link"), which for a
# logistic regression returns the log odds directly
prediction_data <- explanatory_data %>%
  mutate(
    has_churned = predict(mdl_recency, explanatory_data, type = "response"),
    most_likely_response = round(has_churned),
    odds_ratio = has_churned / (1 - has_churned),
    log_odds_ratio = log(odds_ratio),
    log_odds_ratio2 = predict(mdl_recency, explanatory_data)
  )
```

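
The two log odds columns agree because the logit link is exactly $\log(p / (1 - p))$. The sketch below confirms this on synthetic data, since the `churn` data is not reproduced here. The data, seed and names `toy`, `mdl_toy` and `new_data` are hypothetical. It also shows `qlogis()`, base R's shortcut for the same transformation.

```r
# Synthetic stand-in for the churn data (hypothetical values)
set.seed(2)
toy <- data.frame(time_since_last_purchase = runif(150, 0, 5))
toy$has_churned <- rbinom(150, 1, plogis(-0.2 + 0.4 * toy$time_since_last_purchase))
mdl_toy <- glm(has_churned ~ time_since_last_purchase, data = toy, family = binomial)

new_data <- data.frame(time_since_last_purchase = seq(-1, 6, 0.25))

prob <- unname(predict(mdl_toy, new_data, type = "response"))
log_odds_ratio <- log(prob / (1 - prob))             # computed from probabilities
log_odds_ratio2 <- unname(predict(mdl_toy, new_data)) # default type = "link"

print(all.equal(log_odds_ratio, log_odds_ratio2))  # TRUE
print(all.equal(qlogis(prob), log_odds_ratio))     # TRUE
```
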
| time_since_last_purchase | has_churned | most_likely_response | odds_ratio | log_odds_ratio | log_odds_ratio2 |
|---|---|---|---|---|---|
| 0 | 0.491 | 0 | 0.966 | -0.035 | -0.035 |
| 2 | 0.623 | 1 | 1.654 | 0.503 | 0.503 |
| 4 | 0.739 | 1 | 2.834 | 1.042 | 1.042 |
| 6 | 0.829 | 1 | 4.856 | 1.580 | 1.580 |
| ... | ... | ... | ... | ... | ... |
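
Reading the table row by row shows why changes in log odds are easy to interpret: each 2-unit step in `time_since_last_purchase` adds roughly the same amount to the log odds, so the odds get multiplied by roughly the same factor. The sketch below uses only the rounded values from the rows shown, so differences in the third decimal place are rounding noise. The implied slope of about 0.27 per unit is an estimate from those rounded values, not the model's printed coefficient.

```r
# Values copied from the rows shown in the table (recency 0, 2, 4, 6)
recency <- c(0, 2, 4, 6)
odds_ratio <- c(0.966, 1.654, 2.834, 4.856)
log_odds_ratio <- c(-0.035, 0.503, 1.042, 1.580)

# Equal steps in recency give (almost) equal steps in log odds
print(diff(log_odds_ratio))                  # about 0.538 each time
print(diff(log_odds_ratio) / diff(recency))  # about 0.269 per unit

# Equivalently, the odds are multiplied by a near-constant factor
print(odds_ratio[-1] / odds_ratio[-length(odds_ratio)])  # about 1.71 each time
print(exp(mean(diff(log_odds_ratio))))                   # about 1.71
```
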

| Scale | Are values easy to interpret? | Are changes easy to interpret? | Is it precise? |
|---|---|---|---|
| Probability | ✔ | ✘ | ✔ |
| Most likely outcome | ✔✔ | ✔ | ✘ |
| Odds ratio | ✔ | ✘ | ✔ |
| Log odds ratio | ✘ | ✔ | ✔ |
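
The "Is it precise?" column can be checked directly: the odds ratio and the log odds ratio can be converted back to the exact probability, but the most likely outcome cannot. The sketch below uses the `has_churned` values from the predictions table as inputs.

```r
# Probabilities from the has_churned column of the predictions table
p <- c(0.491, 0.623, 0.739, 0.829)

odds <- p / (1 - p)
log_odds <- log(odds)
outcome <- round(p)

# Odds and log odds round-trip back to the probabilities
print(all.equal(odds / (1 + odds), p))  # TRUE
print(all.equal(plogis(log_odds), p))   # TRUE

# The most likely outcome collapses three different probabilities to 1
print(outcome)  # 0 1 1 1
```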