Predict churn with logistic regression

Machine Learning for Marketing in Python

Karolis Urbonas

Head of Analytics & Science, Amazon

Introduction to logistic regression

  • Statistical classification model for binary responses
  • Models log-odds of the probability of the target
  • Assumes a linear relationship between the log-odds of the target and the predictors
  • Returns coefficients and prediction probability
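
To make the log-odds idea concrete, here is a minimal sketch of how a linear score becomes a probability. The intercept, coefficients, and feature values are made up for illustration and do not come from the course data.

```python
import numpy as np

def sigmoid(z):
    """Map a log-odds value z to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical intercept and coefficients for two predictors
intercept = -1.0
coefs = np.array([0.8, -0.5])

# One customer with two made-up feature values
x = np.array([2.0, 1.0])

log_odds = intercept + coefs @ x        # linear in the predictors
probability = sigmoid(log_odds)         # probability of the positive class
odds = probability / (1 - probability)  # taking the log of this recovers log_odds

print('Log-odds:', round(log_odds, 4))
print('Probability:', round(probability, 4))
```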

Logistic Regression Model

Modeling steps

  1. Split the data into training and testing sets
  2. Initialize the model
  3. Fit the model on the training data
  4. Predict values on the testing data
  5. Measure model performance on testing data
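
The five steps above can be sketched end to end as follows. The data here is a synthetic stand-in generated with make_classification (an assumption, not the course's churn dataset), and the 25% test size is likewise only illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the churn features and labels (assumption)
X, Y = make_classification(n_samples=1000, n_features=10,
                           n_informative=5, random_state=42)

# 1. Split the data into training and testing sets
train_X, test_X, train_Y, test_Y = train_test_split(
    X, Y, test_size=0.25, random_state=42)

# 2. Initialize the model
logreg = LogisticRegression()

# 3. Fit the model on the training data
logreg.fit(train_X, train_Y)

# 4. Predict values on the testing data
pred_test_Y = logreg.predict(test_X)

# 5. Measure model performance on the testing data
test_accuracy = accuracy_score(test_Y, pred_test_Y)
print('Test accuracy:', round(test_accuracy, 4))
```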

Fitting the model

Import the Logistic Regression classifier

from sklearn.linear_model import LogisticRegression

Initialize Logistic Regression instance

logreg = LogisticRegression()

Fit the model on the training data

logreg.fit(train_X, train_Y)
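
Once fitted, the model exposes its coefficients and can return prediction probabilities. The sketch below uses synthetic data (an assumption) and checks that the probability of the positive class can be rebuilt by hand from the log-odds.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the training data (assumption)
train_X, train_Y = make_classification(n_samples=500, n_features=4,
                                       n_informative=2, n_redundant=0,
                                       random_state=0)

logreg = LogisticRegression()
logreg.fit(train_X, train_Y)

coefficients = logreg.coef_[0]    # one coefficient per feature
intercept = logreg.intercept_[0]

# Columns are P(class 0) and P(class 1) for the first five samples
proba = logreg.predict_proba(train_X[:5])

# Rebuild P(class 1) by hand: linear log-odds passed through the sigmoid
log_odds = intercept + train_X[:5] @ coefficients
manual_proba = 1.0 / (1.0 + np.exp(-log_odds))

print('Coefficients:', np.round(coefficients, 4))
print('Predicted P(class 1):', np.round(proba[:, 1], 4))
```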

Model performance metrics

Key metrics:

  • Accuracy - The % of all labels (both Churn and non-churn) that were predicted correctly
  • Precision - The % of the model's positive class predictions (here, customers predicted as Churn) that were actually positive
  • Recall - The % of all positive class samples (all churned customers) that the model correctly identified
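
The three definitions above can be checked by hand on a tiny example. The labels below are made up for illustration (1 = Churn, 0 = non-churn); the hand-computed values are compared against the sklearn.metrics functions.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Made-up true and predicted labels: 1 = Churn, 0 = non-churn
true_Y = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
pred_Y = np.array([1, 1, 0, 0, 1, 0, 0, 0, 0, 0])

tp = np.sum((true_Y == 1) & (pred_Y == 1))  # churners predicted as Churn
fp = np.sum((true_Y == 0) & (pred_Y == 1))  # non-churners predicted as Churn
fn = np.sum((true_Y == 1) & (pred_Y == 0))  # churners the model missed
tn = np.sum((true_Y == 0) & (pred_Y == 0))  # non-churners predicted correctly

accuracy = (tp + tn) / len(true_Y)  # correct predictions / all predictions
precision = tp / (tp + fp)          # correct Churn predictions / all Churn predictions
recall = tp / (tp + fn)             # correct Churn predictions / all actual churners

print('Accuracy:', round(accuracy, 4))
print('Precision:', round(precision, 4))
print('Recall:', round(recall, 4))
```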

Measuring model accuracy

from sklearn.metrics import accuracy_score

pred_train_Y = logreg.predict(train_X)
pred_test_Y = logreg.predict(test_X)

train_accuracy = accuracy_score(train_Y, pred_train_Y)
test_accuracy = accuracy_score(test_Y, pred_test_Y)

print('Training accuracy:', round(train_accuracy, 4))
print('Test accuracy:', round(test_accuracy, 4))
Training accuracy: 0.8108
Test accuracy: 0.8009

Measuring precision and recall

from sklearn.metrics import precision_score, recall_score

train_precision = round(precision_score(train_Y, pred_train_Y), 4)
test_precision = round(precision_score(test_Y, pred_test_Y), 4)

train_recall = round(recall_score(train_Y, pred_train_Y), 4)
test_recall = round(recall_score(test_Y, pred_test_Y), 4)

print('Training precision: {}, Training recall: {}'.format(train_precision, train_recall))
print('Test precision: {}, Test recall: {}'.format(test_precision, test_recall))
Training precision: 0.6725, Training recall: 0.5736
Test precision: 0.5736, Test recall: 0.4835

Regularization

  • Introduces a penalty on the size of the coefficients during model fitting
  • Addresses over-fitting (when the model "memorizes" patterns in the training data)
  • Some regularization techniques, such as L1, also perform feature selection
  • Makes the model more generalizable to unseen samples
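
As a rough illustration of the penalty at work, the sketch below fits the default L2-regularized model twice on synthetic data (an assumption) and compares the overall size of the coefficients. The two C values are arbitrary; in scikit-learn, a smaller C means a stronger penalty.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic data with many uninformative features (assumption)
X, Y = make_classification(n_samples=300, n_features=20,
                           n_informative=5, random_state=1)

# Large C = weak penalty, small C = strong penalty
weak_penalty = LogisticRegression(C=100, max_iter=1000).fit(X, Y)
strong_penalty = LogisticRegression(C=0.01, max_iter=1000).fit(X, Y)

weak_norm = np.linalg.norm(weak_penalty.coef_)
strong_norm = np.linalg.norm(strong_penalty.coef_)

print('Coefficient norm with C=100: ', round(weak_norm, 4))
print('Coefficient norm with C=0.01:', round(strong_norm, 4))
```

The stronger penalty pulls the coefficients toward zero, which limits how closely the model can follow noise in the training data.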

L1 regularization and feature selection

  • LogisticRegression from sklearn performs L2 regularization by default
  • L1 regularization, also called LASSO, can be requested explicitly; it performs feature selection by shrinking some of the model coefficients to zero
from sklearn.linear_model import LogisticRegression
logreg = LogisticRegression(penalty='l1', C=0.1, solver='liblinear')
logreg.fit(train_X, train_Y)
  • The C parameter (the inverse of regularization strength, so smaller values mean a stronger penalty) needs to be tuned to find the optimal value
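
To see which features L1 regularization keeps, the non-zero coefficients can be matched back to feature names. This is a sketch on synthetic data; the feature names and the value C=0.05 are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic data: 3 informative features among 20 (assumption)
X, Y = make_classification(n_samples=500, n_features=20, n_informative=3,
                           n_redundant=0, n_clusters_per_class=1,
                           random_state=0)
# Hypothetical feature names for illustration
feature_names = np.array(['feature_{}'.format(i) for i in range(X.shape[1])])

logreg = LogisticRegression(penalty='l1', C=0.05, solver='liblinear')
logreg.fit(X, Y)

# Features with a non-zero coefficient are kept; the rest are dropped
kept = feature_names[logreg.coef_[0] != 0]
dropped = feature_names[logreg.coef_[0] == 0]

print('Kept features:', list(kept))
print('Number of dropped features:', len(dropped))
```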

Tuning L1 regularization

import numpy as np
import pandas as pd

# Candidate C values, from weakest to strongest penalty
C = [1, .5, .25, .1, .05, .025, .01, .005, .0025]
l1_metrics = np.zeros((len(C), 5))
l1_metrics[:,0] = C

for index in range(0, len(C)):
    logreg = LogisticRegression(penalty='l1', C=C[index], solver='liblinear')
    logreg.fit(train_X, train_Y)
    pred_test_Y = logreg.predict(test_X)
    # Record the number of surviving coefficients and the test metrics
    l1_metrics[index,1] = np.count_nonzero(logreg.coef_)
    l1_metrics[index,2] = accuracy_score(test_Y, pred_test_Y)
    l1_metrics[index,3] = precision_score(test_Y, pred_test_Y)
    l1_metrics[index,4] = recall_score(test_Y, pred_test_Y)

col_names = ['C','Non-Zero Coeffs','Accuracy','Precision','Recall']
print(pd.DataFrame(l1_metrics, columns=col_names))

Choosing optimal C value

L1 regularization C parameter tuning
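
One possible way to pick C from such a tuning table is to take the model with the fewest non-zero coefficients whose test accuracy stays close to the best one. This selection rule, the 0.02 tolerance, and the synthetic data are all assumptions made for this sketch, not a rule stated in the course.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the churn data (assumption)
X, Y = make_classification(n_samples=1000, n_features=20, n_informative=4,
                           n_clusters_per_class=1, random_state=7)
train_X, test_X, train_Y, test_Y = train_test_split(
    X, Y, test_size=0.25, random_state=7)

C = [1, .5, .25, .1, .05, .025, .01, .005, .0025]
rows = []
for c in C:
    logreg = LogisticRegression(penalty='l1', C=c, solver='liblinear')
    logreg.fit(train_X, train_Y)
    rows.append({'C': c,
                 'Non-Zero Coeffs': np.count_nonzero(logreg.coef_),
                 'Accuracy': accuracy_score(test_Y, logreg.predict(test_X))})
results = pd.DataFrame(rows)

# Hypothetical rule: keep rows within `tolerance` of the best accuracy,
# then prefer the one with the fewest non-zero coefficients
tolerance = 0.02
good_enough = results[results['Accuracy'] >= results['Accuracy'].max() - tolerance]
best = good_enough.sort_values(['Non-Zero Coeffs', 'C']).iloc[0]

print(results)
print('Chosen C:', best['C'])
```

In practice the trade-off would be judged on the metric that matters for the business case, such as recall for churn, rather than on accuracy alone.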

Let's run some logistic regression models!
