Introduction to Regression with statsmodels in Python
Maarten Van den Broeck
Content Developer at DataCamp
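import matplotlib.pyplot as plt
import seaborn as sns

# Plot churn against recency with a fitted logistic curve (no confidence band)
sns.regplot(x="time_since_last_purchase",
            y="has_churned",
            data=churn,
            ci=None,
            logistic=True)
plt.show()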

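import numpy as np
import pandas as pd
from statsmodels.formula.api import logit

# Fit a logistic regression of churn on time since last purchase
mdl_recency = logit("has_churned ~ time_since_last_purchase", data=churn).fit()
# Grid of explanatory values to predict on
explanatory_data = pd.DataFrame(
    {"time_since_last_purchase": np.arange(-1, 6.25, 0.25)})
# Add the predicted churn probabilities as a column
prediction_data = explanatory_data.assign(
    has_churned=mdl_recency.predict(explanatory_data))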
sns.regplot(x="time_since_last_purchase",
            y="has_churned",
            data=churn,
            ci=None,
            logistic=True)
sns.scatterplot(x="time_since_last_purchase",
                y="has_churned",
                data=prediction_data,
                color="red")
plt.show()

prediction_data = explanatory_data.assign(
    has_churned=mdl_recency.predict(explanatory_data))
# Most likely outcome: round the predicted probability to 0 or 1
prediction_data["most_likely_outcome"] = np.round(prediction_data["has_churned"])
sns.regplot(x="time_since_last_purchase",
            y="has_churned",
            data=churn,
            ci=None,
            logistic=True)
sns.scatterplot(x="time_since_last_purchase",
                y="most_likely_outcome",
                data=prediction_data,
                color="red")
plt.show()
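
The rounding step used to get the most likely outcome can be checked on a few values. The sketch below uses made-up probabilities (not taken from the churn data) and compares `np.round` with an explicit 0.5 cutoff, which is one possible alternative rather than the course's method. The two only differ when a probability is exactly 0.5, because `np.round` rounds exact halves to the nearest even value.

```python
import numpy as np

# Hypothetical predicted probabilities, chosen to straddle the 0.5 cutoff
probabilities = np.array([0.49, 0.50, 0.51, 0.829])

# np.round rounds exact halves to the nearest even value, so 0.5 becomes 0
rounded = np.round(probabilities)

# Explicit rule: predict churn when the probability is at least 0.5
thresholded = (probabilities >= 0.5).astype(int)

print(rounded)
print(thresholded)
```

In practice a predicted probability of exactly 0.5 is rare, so both rules usually give the same most likely outcome.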

The odds ratio is the probability that an event happens divided by the probability that it does not.
$$ \text{odds\_ratio} = \frac{\text{probability}}{(1 - \text{probability})} $$
$$ \text{odds\_ratio} = \frac{0.25}{(1 - 0.25)} = \frac{1}{3} $$
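
The formula above can be tried on single numbers before applying it to a whole column. This is a minimal sketch in plain Python; the function names `odds_from_probability` and `probability_from_odds` are hypothetical helpers introduced here for illustration, not part of statsmodels or the course code.

```python
def odds_from_probability(probability: float) -> float:
    """Return probability / (1 - probability), as defined above."""
    if not 0 <= probability < 1:
        raise ValueError("probability must be in [0, 1)")
    return probability / (1 - probability)


def probability_from_odds(odds: float) -> float:
    """Invert the formula: probability = odds / (1 + odds)."""
    return odds / (1 + odds)


for p in (0.25, 0.5, 0.75):
    print(p, odds_from_probability(p))
```

A probability of 0.25 gives one third, matching the worked equation, and a probability of 0.5 gives exactly 1, which is the point where churning and not churning are equally likely.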

# Parentheses let the expression continue across lines
prediction_data["odds_ratio"] = (prediction_data["has_churned"] /
                                 (1 - prediction_data["has_churned"]))
sns.lineplot(x="time_since_last_purchase", y="odds_ratio", data=prediction_data)
# Reference line at odds ratio 1, where both outcomes are equally likely
plt.axhline(y=1, linestyle="dotted")
plt.show()

sns.lineplot(x="time_since_last_purchase",
             y="odds_ratio",
             data=prediction_data)
plt.axhline(y=1,
            linestyle="dotted")
# Log-scale y-axis
plt.yscale("log")
plt.show()

prediction_data["log_odds_ratio"] = np.log(prediction_data["odds_ratio"])

| time_since_last_purchase | has_churned | most_likely_outcome | odds_ratio | log_odds_ratio |
|---|---|---|---|---|
| 0 | 0.491 | 0 | 0.966 | -0.035 |
| 2 | 0.623 | 1 | 1.654 | 0.503 |
| 4 | 0.739 | 1 | 2.834 | 1.042 |
| 6 | 0.829 | 1 | 4.856 | 1.580 |
| ... | ... | ... | ... | ... |
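
The log odds ratio column rises by almost the same amount for every two-unit step in time since last purchase. The sketch below recomputes the log odds from the rounded `has_churned` probabilities shown in the table, so the results differ slightly from the table's last column, which was presumably computed from unrounded values.

```python
import math

# (time_since_last_purchase, has_churned) pairs copied from the table
rows = [(0, 0.491), (2, 0.623), (4, 0.739), (6, 0.829)]

# Log odds for each row: log(p / (1 - p))
log_odds = [math.log(p / (1 - p)) for _, p in rows]

# Change in log odds between consecutive rows (each two units apart)
steps = [later - earlier for earlier, later in zip(log_odds, log_odds[1:])]

for (time, _), value in zip(rows, log_odds):
    print(time, round(value, 3))
print([round(step, 3) for step in steps])
```

Each step comes out at roughly 0.54, which is why changes on the log odds ratio scale are easy to interpret: the log odds change linearly with the explanatory variable.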

| Scale | Are values easy to interpret? | Are changes easy to interpret? | Is it precise? |
|---|---|---|---|
| Probability | ✔ | ✘ | ✔ |
| Most likely outcome | ✔✔ | ✔ | ✘ |
| Odds ratio | ✔ | ✘ | ✔ |
| Log odds ratio | ✘ | ✔ | ✔ |