Multiple logistic regression

Intermediate Regression with statsmodels in Python

Maarten Van den Broeck

Content Developer at DataCamp

Bank churn dataset

has_churned   time_since_first_purchase   time_since_last_purchase
0             0.3993247                   -0.5158691
1             -0.4297957                  0.6780654
0             3.7383122                   0.4082544
0             0.6032289                   -0.6990435
...           ...                         ...
response      length of relationship      recency of activity
1 https://www.rdocumentation.org/packages/bayesQR/topics/Churn

logit()

from statsmodels.formula.api import logit

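# One explanatory variable
logit("response ~ explanatory", data=dataset).fit()
# Two explanatory variables, no interaction
logit("response ~ explanatory1 + explanatory2", data=dataset).fit()
# Two explanatory variables, with interaction (main effects plus their product term)
logit("response ~ explanatory1 * explanatory2", data=dataset).fit()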
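
As a minimal sketch of the interaction form, the following fits a model on synthetic data. The DataFrame churn_demo, the simulated coefficients, and the name mdl_demo are assumptions for illustration, not the real churn dataset; only the column names are borrowed from the slide above.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import logit

# Synthetic stand-in for the churn data (assumption: NOT the real dataset)
rng = np.random.default_rng(0)
n = 400
churn_demo = pd.DataFrame({
    "time_since_first_purchase": rng.normal(size=n),
    "time_since_last_purchase": rng.normal(size=n),
})

# Simulate the response from made-up log-odds coefficients
log_odds = (-0.3 * churn_demo["time_since_first_purchase"]
            + 0.6 * churn_demo["time_since_last_purchase"])
churn_demo["has_churned"] = rng.binomial(1, 1 / (1 + np.exp(-log_odds)))

# "*" expands to both main effects plus their interaction term
mdl_demo = logit(
    "has_churned ~ time_since_first_purchase * time_since_last_purchase",
    data=churn_demo,
).fit(disp=0)  # disp=0 silences the optimizer printout

print(mdl_demo.params)
```

The fitted parameters should contain an intercept, one coefficient per explanatory variable, and one coefficient for the interaction term.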

The four outcomes

               predicted false   predicted true
actual false   correct           false positive
actual true    false negative    correct
conf_matrix = mdl_logit.pred_table()
print(conf_matrix)
[[102.  98.]
 [ 53. 147.]]
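
To connect the printed matrix to the four outcomes, here is a small sketch that reads each cell and derives three common summary metrics. It hard-codes the matrix printed above rather than calling a fitted model; the variable names TN, FP, FN, and TP are illustrative choices.

```python
import numpy as np

# Confusion matrix as printed above: rows are actual, columns are predicted
conf_matrix = np.array([[102.0, 98.0],
                        [53.0, 147.0]])

TN = conf_matrix[0, 0]  # actual false, predicted false
FP = conf_matrix[0, 1]  # actual false, predicted true
FN = conf_matrix[1, 0]  # actual true, predicted false
TP = conf_matrix[1, 1]  # actual true, predicted true

accuracy = (TN + TP) / conf_matrix.sum()  # share of correct predictions
sensitivity = TP / (TP + FN)              # share of actual trues found
specificity = TN / (TN + FP)              # share of actual falses found

print(accuracy, sensitivity, specificity)
```

For this matrix the accuracy is 249/400 = 0.6225, the sensitivity is 147/200 = 0.735, and the specificity is 102/200 = 0.51.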

Prediction flow

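from itertools import product
import pandas as pd

# Values to predict over for each explanatory variable (placeholders)
explanatory1 = some_values
explanatory2 = some_values

# All combinations of the two sets of values
p = product(explanatory1, explanatory2)

explanatory_data = pd.DataFrame(p, columns=["explanatory1", "explanatory2"])
# Name the prediction column after the response, has_churned
prediction_data = explanatory_data.assign(has_churned=mdl_logit.predict(explanatory_data))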
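
The grid-building step can be sketched with concrete values. The ranges and step size below are assumptions chosen only to roughly span the standardized values shown in the dataset slide; any sequences of values would work the same way.

```python
from itertools import product

import numpy as np
import pandas as pd

# Hypothetical ranges for each explanatory variable (not from the course)
time_since_first_purchase = np.arange(-2, 4.5, 0.5)
time_since_last_purchase = np.arange(-1, 6.5, 0.5)

# product() yields every (first, last) pair, one tuple per combination
p = product(time_since_first_purchase, time_since_last_purchase)

explanatory_data = pd.DataFrame(
    p, columns=["time_since_first_purchase", "time_since_last_purchase"]
)

# One row per combination of the two sets of values
print(explanatory_data.shape)
```

Passing a DataFrame like this to the predict() method of a fitted model then gives one predicted probability per row, provided the column names match the names used in the model formula.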

Visualization

prediction_data["most_likely_outcome"] = np.round(prediction_data["has_churned"])

sns.scatterplot(... data=churn, hue="has_churned", ...)
sns.scatterplot(... data=prediction_data, hue="most_likely_outcome", ...)
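
The rounding step turns each predicted probability into a most likely outcome of 0 or 1. A minimal sketch with made-up probabilities (the DataFrame prediction_demo and its values are hypothetical):

```python
import numpy as np
import pandas as pd

# Made-up predicted probabilities of churning
prediction_demo = pd.DataFrame({"has_churned": [0.12, 0.49, 0.50, 0.51, 0.93]})

# Probabilities below 0.5 round to 0 (no churn), above 0.5 round to 1 (churn)
prediction_demo["most_likely_outcome"] = np.round(prediction_demo["has_churned"])

print(prediction_demo)
```

Note that np.round rounds exact halves to the nearest even number, so a probability of exactly 0.5 maps to 0 here.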

Let's practice!
