Predict churn with decision trees

Machine Learning for Marketing in Python

Karolis Urbonas

Head of Analytics & Science, Amazon

Introduction to decision trees

Decision Tree rules on Titanic Survival dataset

Modeling steps

  1. Split data into training and testing sets
  2. Initialize the model
  3. Fit the model on the training data
  4. Predict values on the testing data
  5. Measure model performance on testing data
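
The five steps above can be sketched end to end as follows. This is a minimal illustration, not the course's own code: the data comes from scikit-learn's make_classification as a synthetic stand-in for the churn dataset, and the split size and random_state values are assumptions chosen only to make the run repeatable.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the churn data (assumption: not the course dataset)
X, Y = make_classification(n_samples=1000, n_features=10, random_state=0)

# 1. Split data into training and testing sets
train_X, test_X, train_Y, test_Y = train_test_split(
    X, Y, test_size=0.25, random_state=0
)

# 2. Initialize the model
mytree = DecisionTreeClassifier(random_state=0)

# 3. Fit the model on the training data
mytree.fit(train_X, train_Y)

# 4. Predict values on the testing data
pred_test_Y = mytree.predict(test_X)

# 5. Measure model performance on the testing data
test_accuracy = accuracy_score(test_Y, pred_test_Y)
print('Test accuracy:', round(test_accuracy, 4))
```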

Fitting the model

Import the decision tree module

from sklearn.tree import DecisionTreeClassifier

Initialize the Decision Tree model

mytree = DecisionTreeClassifier()

Fit the model on the training data

treemodel = mytree.fit(train_X, train_Y)
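
One detail worth knowing about the line above: in scikit-learn, fit returns the estimator itself, so treemodel and mytree refer to the same fitted object. The sketch below checks this and inspects the size of the fitted tree. The training data is synthetic and stands in for the course's train_X and train_Y (an assumption).

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for train_X / train_Y (assumption: not the course data)
train_X, train_Y = make_classification(n_samples=500, n_features=8, random_state=1)

mytree = DecisionTreeClassifier(random_state=1)
treemodel = mytree.fit(train_X, train_Y)

# fit() returns the same estimator object, now fitted
print(treemodel is mytree)

# With no max_depth set, the tree grows until its leaves are pure
print('Depth:', mytree.get_depth())
print('Leaves:', mytree.get_n_leaves())
```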

Measuring model accuracy

from sklearn.metrics import accuracy_score

pred_train_Y = mytree.predict(train_X)
pred_test_Y = mytree.predict(test_X)
train_accuracy = accuracy_score(train_Y, pred_train_Y)
test_accuracy = accuracy_score(test_Y, pred_test_Y)
print('Training accuracy:', round(train_accuracy, 4))
print('Test accuracy:', round(test_accuracy, 4))
Training accuracy: 0.9973
Test accuracy: 0.7196

Measuring precision and recall

from sklearn.metrics import precision_score, recall_score

train_precision = round(precision_score(train_Y, pred_train_Y), 4)
test_precision = round(precision_score(test_Y, pred_test_Y), 4)
train_recall = round(recall_score(train_Y, pred_train_Y), 4)
test_recall = round(recall_score(test_Y, pred_test_Y), 4)
print('Training precision: {}, Training recall: {}'.format(train_precision, train_recall))
# Pass test_precision here. Passing train_recall instead would repeat the
# training recall (0.9906) in the test precision slot.
print('Test precision: {}, Test recall: {}'.format(test_precision, test_recall))
Training precision: 0.9993, Training recall: 0.9906
Test precision: 0.9906, Test recall: 0.4878
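
As a reminder of what these two scores measure, precision is TP / (TP + FP) and recall is TP / (TP + FN), where TP, FP and FN are the true positives, false positives and false negatives. The sketch below recomputes both by hand from a confusion matrix and compares them with the scikit-learn functions. The label arrays are made up for illustration and have nothing to do with the churn data.

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hypothetical labels, for illustration only: 1 = churned, 0 = stayed
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

# For binary labels, ravel() yields the counts in the order TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

manual_precision = tp / (tp + fp)  # share of predicted churners who did churn
manual_recall = tp / (tp + fn)     # share of actual churners the model caught

print('Precision:', round(manual_precision, 4), round(precision_score(y_true, y_pred), 4))
print('Recall:', round(manual_recall, 4), round(recall_score(y_true, y_pred), 4))
```

A low test recall, as in the output above, means the model misses a large share of the customers who actually churn.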

Tree depth parameter tuning

import numpy as np
import pandas as pd

# One row per candidate depth: max depth, accuracy, precision, recall
depth_list = list(range(2, 15))
depth_tuning = np.zeros((len(depth_list), 4))
depth_tuning[:, 0] = depth_list

for index in range(len(depth_list)):
    # Fit a tree limited to the current depth and score it on the test data
    mytree = DecisionTreeClassifier(max_depth=depth_list[index])
    mytree.fit(train_X, train_Y)
    pred_test_Y = mytree.predict(test_X)
    depth_tuning[index, 1] = accuracy_score(test_Y, pred_test_Y)
    depth_tuning[index, 2] = precision_score(test_Y, pred_test_Y)
    depth_tuning[index, 3] = recall_score(test_Y, pred_test_Y)

col_names = ['Max_Depth', 'Accuracy', 'Precision', 'Recall']
print(pd.DataFrame(depth_tuning, columns=col_names))

Choosing optimal depth

Max Depth tuning
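
One way to pick a depth from such a tuning table is to select the row that maximizes the metric you care about most. The sketch below rebuilds the table on synthetic data and picks the depth with the highest test recall. Both the data and the choice of recall as the selection criterion are assumptions made for illustration; in practice the choice usually also weighs precision and how stable the scores are across neighbouring depths.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the churn data (assumption: not the course dataset)
X, Y = make_classification(n_samples=1000, n_features=10, random_state=0)
train_X, test_X, train_Y, test_Y = train_test_split(
    X, Y, test_size=0.25, random_state=0
)

# Score one tree per candidate depth on the test data
rows = []
for depth in range(2, 15):
    mytree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    mytree.fit(train_X, train_Y)
    pred_test_Y = mytree.predict(test_X)
    rows.append({
        'Max_Depth': depth,
        'Accuracy': accuracy_score(test_Y, pred_test_Y),
        'Precision': precision_score(test_Y, pred_test_Y),
        'Recall': recall_score(test_Y, pred_test_Y),
    })
results = pd.DataFrame(rows)

# Pick the depth whose tree has the highest test recall (illustrative criterion)
best_row = results.loc[results['Recall'].idxmax()]
best_depth = int(best_row['Max_Depth'])
print(results)
print('Depth with highest test recall:', best_depth)
```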

Let's build a decision tree!
