Why you need logistic regression

Introduction to Regression with statsmodels in Python

Maarten Van den Broeck

Content Developer at DataCamp

Bank churn dataset

has_churned  time_since_first_purchase  time_since_last_purchase
          0                  0.3993247                -0.5158691
          1                 -0.4297957                 0.6780654
          0                  3.7383122                 0.4082544
          0                  0.6032289                -0.6990435
        ...                        ...                       ...
   response     length of relationship       recency of activity
1 https://www.rdocumentation.org/packages/bayesQR/topics/Churn

Churn vs. recency: a linear model

from statsmodels.formula.api import ols
mdl_churn_vs_recency_lm = ols("has_churned ~ time_since_last_purchase",
                              data=churn).fit()

print(mdl_churn_vs_recency_lm.params)
Intercept                   0.490780
time_since_last_purchase    0.063783
dtype: float64
intercept, slope = mdl_churn_vs_recency_lm.params

Visualizing the linear model

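import matplotlib.pyplot as plt
import seaborn as sns

sns.scatterplot(x="time_since_last_purchase",
                y="has_churned",
                data=churn)

# Overlay the fitted line from the linear model
plt.axline(xy1=(0, intercept), slope=slope)
plt.show()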

A scatter plot of whether or not the customer churned versus time since last purchase. All the points are at the line y equals 0 or y equals 1. A linear trend line shows the probability of churning increasing as time since last purchase increases.


Zooming out

sns.scatterplot(x="time_since_last_purchase",
                y="has_churned",
                data=churn)

plt.axline(xy1=(0,intercept),
           slope=slope)

plt.xlim(-10, 10)
plt.ylim(-0.2, 1.2)
plt.show()

The scatter plot of whether or not the customer churned versus time since last purchase. The axes are zoomed out compared to last time, showing that the trend line extends below y equals 0 and above y equals 1, which ought to be impossible.
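The out-of-range behaviour can be checked by hand from the coefficients printed earlier. The snippet below is a minimal sketch: it copies the intercept and slope from the `mdl_churn_vs_recency_lm.params` output instead of refitting the model, and the helper name `linear_prediction` is made up for illustration.

```python
# Coefficients copied from the printed params of mdl_churn_vs_recency_lm.
intercept = 0.490780
slope = 0.063783


def linear_prediction(time_since_last_purchase):
    """Value of the fitted straight line at a given recency (hypothetical helper)."""
    return intercept + slope * time_since_last_purchase


# Recency values at which the line crosses the edges of the valid 0-to-1 range.
x_at_zero = (0 - intercept) / slope
x_at_one = (1 - intercept) / slope

# Evaluate the line at the zoomed-out axis limits and at the center.
for x in (-10, 0, 10):
    print(x, linear_prediction(x))

print("line drops below 0 for x <", x_at_zero)
print("line rises above 1 for x >", x_at_one)
```

Both crossing points fall inside the plotted range of -10 to 10, which is why the zoomed-out plot shows the line leaving the band between 0 and 1.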


What is logistic regression?

  • Another type of generalized linear model.
  • Used when the response variable is logical (True/False, coded here as 1/0).
  • The responses follow a logistic (S-shaped) curve.
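For a single explanatory variable, the S-shaped curve mentioned above is usually written as follows. This is a sketch in standard textbook notation; the symbols $p$, $x$, $\beta_0$ and $\beta_1$ do not appear on the slides, with $p(x)$ standing for the probability of churning at recency $x$.

```latex
p(x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}
\qquad\Longleftrightarrow\qquad
\log\frac{p(x)}{1 - p(x)} = \beta_0 + \beta_1 x
```

Because the exponential term is always positive, $p(x)$ stays strictly between 0 and 1 for every $x$, unlike a straight line.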

Logistic regression using logit()

from statsmodels.formula.api import logit
mdl_churn_vs_recency_logit = logit("has_churned ~ time_since_last_purchase",
                                   data=churn).fit()

print(mdl_churn_vs_recency_logit.params)
Intercept                  -0.035019
time_since_last_purchase    0.269215
dtype: float64
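To see what these coefficients imply, the fitted curve can be evaluated by hand. The snippet below is a minimal sketch that copies the two numbers from the printed `params` instead of refitting the model; the names `logit_intercept`, `logit_slope` and `churn_probability` are made up for illustration, chosen so they do not clash with the `intercept` and `slope` of the linear model.

```python
import math

# Coefficients copied from the printed params of mdl_churn_vs_recency_logit.
logit_intercept = -0.035019
logit_slope = 0.269215


def churn_probability(time_since_last_purchase):
    """Logistic curve evaluated at a given recency (hypothetical helper)."""
    log_odds = logit_intercept + logit_slope * time_since_last_purchase
    return 1 / (1 + math.exp(-log_odds))


# Evaluate the curve at a few recency values, including the zoomed-out limits.
probabilities = {x: churn_probability(x) for x in (-10, -1, 0, 1, 10)}
for x, p in probabilities.items():
    print(f"{x:>4}: {p:.3f}")
```

The predicted probability rises with time since last purchase, as the positive slope suggests, and it stays between 0 and 1 even at the extreme values.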

Visualizing the logistic model

sns.regplot(x="time_since_last_purchase",
            y="has_churned",
            data=churn,
            ci=None,
            logistic=True)
plt.axline(xy1=(0,intercept),
           slope=slope,
           color="black")

plt.show()

A scatter plot of whether or not the customer churned versus time since last purchase. Linear and logistic trend lines are shown, and both show increasing churn probabilities as time since last purchase increases. The two trend lines track each other quite closely except at high time since last purchase.


Zooming out

The scatter plot of whether or not the customer churned versus time since last purchase, with both trend lines. The axes are zoomed out compared to last time, showing that the logistic trend line never goes outside the zero to one churn range.


Let's practice!
