Introduction to Regression with statsmodels in Python
Maarten Van den Broeck
Content Developer at DataCamp
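import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from statsmodels.formula.api import logit

# Churn vs. time since last purchase, with a fitted logistic curve
sns.regplot(x="time_since_last_purchase",
            y="has_churned",
            data=churn,
            ci=None,
            logistic=True)
plt.show()

# Fit the logistic regression, then predict over a grid of explanatory values
mdl_recency = logit("has_churned ~ time_since_last_purchase", data=churn).fit()
explanatory_data = pd.DataFrame({"time_since_last_purchase": np.arange(-1, 6.25, 0.25)})
prediction_data = explanatory_data.assign(has_churned=mdl_recency.predict(explanatory_data))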
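With a single explanatory variable, a logistic regression prediction is a straight-line term passed through the logistic function. The sketch below illustrates that calculation using only the standard library. The `intercept` and `slope` values are illustrative assumptions chosen to roughly match the predictions tabulated later in this section; they are not output from `mdl_recency`, and `predict_probability` is a hypothetical helper name.

```python
import math

# Illustrative coefficients (assumed; not read from mdl_recency.params)
intercept = -0.035
slope = 0.269

def predict_probability(time_since_last_purchase):
    """Apply the logistic function to intercept + slope * x."""
    linear_term = intercept + slope * time_since_last_purchase
    return 1 / (1 + math.exp(-linear_term))

# Predicted churn probability at a few values of the explanatory variable
for x in [0, 2, 4, 6]:
    print(x, round(predict_probability(x), 3))
```

Because the logistic function always returns a value strictly between 0 and 1, the predictions can be read as probabilities of churning.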
sns.regplot(x="time_since_last_purchase",
            y="has_churned",
            data=churn,
            ci=None,
            logistic=True)
sns.scatterplot(x="time_since_last_purchase",
                y="has_churned",
                data=prediction_data,
                color="red")
plt.show()
prediction_data = explanatory_data.assign( has_churned = mdl_recency.predict(explanatory_data))
prediction_data["most_likely_outcome"] = np.round(prediction_data["has_churned"])
sns.regplot(x="time_since_last_purchase",
            y="has_churned",
            data=churn,
            ci=None,
            logistic=True)
sns.scatterplot(x="time_since_last_purchase",
                y="most_likely_outcome",
                data=prediction_data,
                color="red")
plt.show()
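The most likely outcome is the predicted probability rounded to 0 or 1. As a minimal sketch, the block below applies `np.round` to the four probabilities listed in the summary table later in this section, rather than to the full `prediction_data` frame.

```python
import numpy as np

# Predicted churn probabilities (values from the summary table in this section)
has_churned = np.array([0.491, 0.623, 0.739, 0.829])

# Probabilities below 0.5 round to 0 (no churn); above 0.5 round to 1 (churn)
most_likely_outcome = np.round(has_churned)
print(most_likely_outcome)

# np.round rounds halves to the nearest even value, so exactly 0.5 becomes 0
print(np.round(0.5))
```

Only the first probability falls below 0.5, so it is the only one whose most likely outcome is 0.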
The odds ratio is the probability of something happening divided by the probability that it doesn't happen.
$$ \text{odds\_ratio} = \frac{\text{probability}}{(1 - \text{probability})} $$
$$ \text{odds\_ratio} = \frac{0.25}{(1 - 0.25)} = \frac{1}{3} $$
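The same arithmetic can be checked in plain Python. This is a small sketch; `odds_from_probability` is a hypothetical helper name, not part of statsmodels or pandas.

```python
def odds_from_probability(probability):
    """Return probability / (1 - probability); undefined when probability is 1."""
    if not 0 <= probability < 1:
        raise ValueError("probability must be in [0, 1)")
    return probability / (1 - probability)

print(odds_from_probability(0.25))  # one third, matching the worked example
print(odds_from_probability(0.5))   # even odds: both outcomes equally likely
```

A value above 1 means the event is more likely to happen than not, and a value below 1 means the opposite.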
prediction_data["odds_ratio"] = (prediction_data["has_churned"] /
                                 (1 - prediction_data["has_churned"]))
sns.lineplot(x="time_since_last_purchase", y="odds_ratio", data=prediction_data)
plt.axhline(y=1, linestyle="dotted")
plt.show()
sns.lineplot(x="time_since_last_purchase",
             y="odds_ratio",
             data=prediction_data)
plt.axhline(y=1,
            linestyle="dotted")
plt.yscale("log")
plt.show()
prediction_data["log_odds_ratio"] = np.log(prediction_data["odds_ratio"])
time_since_last_purchase | has_churned | most_likely_outcome | odds_ratio | log_odds_ratio |
---|---|---|---|---|
0 | 0.491 | 0 | 0.966 | -0.035 |
2 | 0.623 | 1 | 1.654 | 0.503 |
4 | 0.739 | 1 | 2.834 | 1.042 |
6 | 0.829 | 1 | 4.856 | 1.580 |
... | ... | ... | ... | ... |
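The rows above suggest why changes are easy to interpret on the log odds scale: each 2-unit step in time since last purchase adds roughly the same amount to the log odds ratio. The sketch below recomputes the odds and log odds from the rounded probabilities in the table, so its values differ slightly from the tabulated ones.

```python
import math

# Predicted probabilities at time_since_last_purchase = 0, 2, 4, 6 (from the table)
probabilities = [0.491, 0.623, 0.739, 0.829]

odds = [p / (1 - p) for p in probabilities]
log_odds = [math.log(o) for o in odds]

# Change in log odds between consecutive rows (each row is 2 units apart)
steps = [later - earlier for earlier, later in zip(log_odds, log_odds[1:])]
print([round(step, 2) for step in steps])
```

The near-constant step is what makes the log odds ratio a straight line when plotted against the explanatory variable, whereas the probabilities change by a different amount at each step.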
Scale | Are values easy to interpret? | Are changes easy to interpret? | Is it precise? |
---|---|---|---|
Probability | ✔ | ✘ | ✔ |
Most likely outcome | ✔✔ | ✔ | ✘ |
Odds ratio | ✔ | ✘ | ✔ |
Log odds ratio | ✘ | ✔ | ✔ |