Intermediate Regression with statsmodels in Python
Maarten Van den Broeck
Content Developer at DataCamp
has_churned | time_since_first_purchase | time_since_last_purchase |
---|---|---|
0 | 0.3993247 | -0.5158691 |
1 | -0.4297957 | 0.6780654 |
0 | 3.7383122 | 0.4082544 |
0 | 0.6032289 | -0.6990435 |
... | ... | ... |
response | length of relationship | recency of activity |
from statsmodels.formula.api import logit
logit("response ~ explanatory", data=dataset).fit()
logit("response ~ explanatory1 + explanatory2", data=dataset).fit()
logit("response ~ explanatory1 * explanatory2", data=dataset).fit()
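The three formula patterns above can be tried on a small example. This is a minimal sketch, assuming a synthetic stand-in for the churn data: the generated values, the seed, and the names `mdl_single`, `mdl_no_int`, and `mdl_int` are illustrative and not part of the course dataset.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import logit

# Synthetic stand-in for the churn data (assumption: NOT the course dataset)
rng = np.random.default_rng(42)
n = 500
churn = pd.DataFrame({
    "time_since_first_purchase": rng.normal(size=n),
    "time_since_last_purchase": rng.normal(size=n),
})
# Made-up relationship, used only to generate a 0/1 response
log_odds = (-0.5 * churn["time_since_first_purchase"]
            + 1.0 * churn["time_since_last_purchase"])
churn["has_churned"] = (rng.random(n) < 1 / (1 + np.exp(-log_odds))).astype(int)

# One explanatory variable
mdl_single = logit("has_churned ~ time_since_first_purchase",
                   data=churn).fit(disp=0)

# Two explanatory variables, no interaction (+)
mdl_no_int = logit(
    "has_churned ~ time_since_first_purchase + time_since_last_purchase",
    data=churn).fit(disp=0)

# Two explanatory variables with interaction (*)
mdl_int = logit(
    "has_churned ~ time_since_first_purchase * time_since_last_purchase",
    data=churn).fit(disp=0)

# The * formula adds one coefficient for the interaction term
print(mdl_int.params)
```

With `*`, the coefficient table gains one extra row named after both variables joined by a colon.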
| | predicted false | predicted true |
|---|---|---|
| actual false | correct | false positive |
| actual true | false negative | correct |
conf_matrix = mdl_logit.pred_table()
print(conf_matrix)
[[102. 98.]
[ 53. 147.]]
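The counts in this matrix can be turned into summary metrics. A minimal sketch, assuming the layout described above (rows are actual values, columns are predicted values) and re-entering the printed numbers by hand rather than calling `pred_table()`; the variable names are illustrative.

```python
import numpy as np

# Confusion matrix as printed above: rows = actual, columns = predicted
conf_matrix = np.array([[102.0, 98.0],
                        [53.0, 147.0]])

TN = conf_matrix[0, 0]  # actual false, predicted false
FP = conf_matrix[0, 1]  # actual false, predicted true
FN = conf_matrix[1, 0]  # actual true, predicted false
TP = conf_matrix[1, 1]  # actual true, predicted true

accuracy = (TN + TP) / (TN + FP + FN + TP)  # share of correct predictions
sensitivity = TP / (TP + FN)                # share of actual trues found
specificity = TN / (TN + FP)                # share of actual falses found

print(accuracy, sensitivity, specificity)
```

For these counts, accuracy is 249/400 = 0.6225, sensitivity is 147/200 = 0.735, and specificity is 102/200 = 0.51.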
from itertools import product
explanatory1 = some_values
explanatory2 = some_values
p = product(explanatory1, explanatory2)
explanatory_data = pd.DataFrame(p, columns=["explanatory1", "explanatory2"])
prediction_data = explanatory_data.assign(has_churned=mdl_logit.predict(explanatory_data))
prediction_data["most_likely_outcome"] = np.round(prediction_data["has_churned"])
sns.scatterplot(... data=churn, hue="has_churned", ...)
sns.scatterplot(... data=prediction_data, hue="most_likely_outcome", ...)
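The prediction flow can be sketched end to end, leaving out the plotting step. This assumes a synthetic stand-in for the churn data; the generated values, the seed, and the grid ranges are illustrative choices, not values from the course.

```python
from itertools import product

import numpy as np
import pandas as pd
from statsmodels.formula.api import logit

# Synthetic stand-in for the churn data (assumption: NOT the course dataset)
rng = np.random.default_rng(0)
n = 400
churn = pd.DataFrame({
    "time_since_first_purchase": rng.normal(size=n),
    "time_since_last_purchase": rng.normal(size=n),
})
log_odds = (-0.7 * churn["time_since_first_purchase"]
            + 1.0 * churn["time_since_last_purchase"])
churn["has_churned"] = (rng.random(n) < 1 / (1 + np.exp(-log_odds))).astype(int)

mdl_logit = logit(
    "has_churned ~ time_since_first_purchase * time_since_last_purchase",
    data=churn).fit(disp=0)

# Grid of explanatory values (ranges are arbitrary illustrative choices)
time_since_first_purchase = np.linspace(-2, 4, 13)
time_since_last_purchase = np.linspace(-1, 3, 9)

# Every combination of the two explanatory variables
p = product(time_since_first_purchase, time_since_last_purchase)
explanatory_data = pd.DataFrame(
    list(p),
    columns=["time_since_first_purchase", "time_since_last_purchase"])

# Predicted probability of churning, stored under the response name
prediction_data = explanatory_data.assign(
    has_churned=mdl_logit.predict(explanatory_data))

# Round the probability to get the most likely outcome (0 or 1)
prediction_data["most_likely_outcome"] = np.round(prediction_data["has_churned"])

print(prediction_data.head())
```

Here `prediction_data` has one row per grid point, so it can be passed to the second `sns.scatterplot()` call with `hue="most_likely_outcome"`.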