CTR prediction using decision trees

Predicting CTR with Machine Learning in Python

Kevin Huo

Instructor

Decision trees

Decision tree example with deciding credit loans based on age and student status

Create via: clf = DecisionTreeClassifier()
As with logistic regression, call clf.fit(X_train, y_train) to train on the training data and clf.predict(X_test) to predict labels for the test data:
```
array([0, 1, 1, ..., 1, 0, 1])
```

clf.predict_proba(X_test) for probability scores:

array([[0.2, 0.8], [0.4, 0.6], ..., [0.1, 0.9], [0.3, 0.7]])
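
The fit / predict / predict_proba calls above can be sketched end to end as follows. This is a minimal illustration only: the data comes from scikit-learn's make_classification as a stand-in for click data, and the max_depth and random_state values are assumptions, not settings from the course.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for click data (assumption: not the course dataset)
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, y_train = X[:150], y[:150]
X_test = X[150:]

# max_depth=3 is an arbitrary illustrative choice
clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)         # hard 0/1 labels, one per test row
y_proba = clf.predict_proba(X_test)  # one row per sample, one column per class

print(y_pred[:5])
print(y_proba[:5])
```

The columns of y_proba follow the order of clf.classes_, so with labels 0 and 1 the second column holds the predicted probability of a click.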

Example of randomly splitting data into training and testing sets, where the testing set is 30% of the total sample size: X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = .3, random_state = 0)
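
As a small sketch of what the split returns, the toy arrays below (made up for illustration) have 10 samples, so a 30% test size leaves 7 rows for training and 3 for testing.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (hypothetical): 10 samples with 2 features each
X = np.arange(20).reshape(10, 2)
y = np.array([0, 1] * 5)

# random_state fixes the shuffle so the split is reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

print(X_train.shape, X_test.shape)  # (7, 2) (3, 2)
```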

Example of area under the ROC curve for a classifier

True positive rate (Y-axis) = #(classifier predicts positive, actually positive) / #(positives)
False positive rate (X-axis) = #(classifier predicts positive, actually negative) / #(negatives)
Dotted blue line: baseline (random guessing) with AUC of 0.5
Orange line: the classifier's ROC curve; want the area under it (AUC) to be as close to 1 as possible
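
The two rate definitions above can be checked by hand on a tiny example. The labels and predictions below are invented for illustration.

```python
import numpy as np

# Hypothetical ground truth and hard predictions for 8 impressions
y_true = np.array([1, 1, 1, 0, 0, 0, 0, 1])
y_pred = np.array([1, 0, 1, 0, 1, 0, 0, 1])

tp = np.sum((y_pred == 1) & (y_true == 1))  # predicted positive, actually positive
fp = np.sum((y_pred == 1) & (y_true == 0))  # predicted positive, actually negative

tpr = tp / np.sum(y_true == 1)  # 3 / 4 = 0.75
fpr = fp / np.sum(y_true == 0)  # 1 / 4 = 0.25

print(tpr, fpr)
```

Each threshold on the predicted probability gives one such (fpr, tpr) pair; the ROC curve traces these pairs as the threshold varies.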

Y_score = clf.predict_proba(X_test)

fpr, tpr, thresholds = roc_curve(Y_test, Y_score[:, 1])

roc_auc = auc(fpr, tpr)

auc() input: arrays of false positive rates and true positive rates
If the model is accurate and CTR is low, you may want to reassess how the ad message is relayed and which audience it targets
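
The roc_curve and auc steps above can be put together in one runnable sketch. The dataset is synthetic (make_classification), and max_depth=4 and the lowercase variable names are illustrative assumptions rather than the course's setup.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import auc, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for click data (assumption: not the course dataset)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

clf = DecisionTreeClassifier(max_depth=4, random_state=0)
clf.fit(X_train, y_train)

# Column 1 holds the predicted probability of the positive class
y_score = clf.predict_proba(X_test)
fpr, tpr, thresholds = roc_curve(y_test, y_score[:, 1])
roc_auc = auc(fpr, tpr)

# roc_auc_score computes the same area directly from labels and scores
print(roc_auc, roc_auc_score(y_test, y_score[:, 1]))
```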
