CTR prediction using decision trees

Predicting CTR with Machine Learning in Python

Kevin Huo

Instructor

Decision trees

Decision tree example: deciding on credit loans based on age and student status

  • Nodes represent the features
  • Branches represent the decisions based on features
  • First split is based on the age of the applicant
  • For the youth group, the second split is based on student status
  • Model provides human-readable heuristics for understanding its decisions
  • Sample outcomes are shown in the table below:

    age          is_student  loan
    middle_aged              1
    youth        no          0
    youth        yes         1
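
The two splits described above can be sketched as a plain Python function. This is only an illustration of how a tree turns features into a decision, not a trained model; the function name approve_loan and the string encodings of the features are assumptions made for this example.

```python
# Hand-written version of the two splits in the loan example.
# approve_loan and the string values are hypothetical, chosen for illustration.
def approve_loan(age_group, is_student=None):
    """Return 1 (loan granted) or 0 (loan denied)."""
    # First split: age of the applicant.
    if age_group == "middle_aged":
        return 1
    # Second split (youth group only): student status.
    if age_group == "youth":
        return 1 if is_student == "yes" else 0
    raise ValueError(f"unknown age group: {age_group!r}")


# The three sample rows from the table.
rows = [("middle_aged", None), ("youth", "no"), ("youth", "yes")]
outcomes = [approve_loan(age, student) for age, student in rows]
print(outcomes)  # [1, 0, 1]
```

A fitted DecisionTreeClassifier learns rules of this shape from data instead of having them written by hand.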

Training and testing the model

  • Create via: clf = DecisionTreeClassifier()
  • As with logistic regression, call clf.fit(X_train, y_train) to train on the training data and clf.predict(X_test) to predict labels for the testing data:

    array([0, 1, 1, ..., 1, 0, 1])
    
  • clf.predict_proba(X_test) for probability scores:

    array([[0.2, 0.8], [0.4, 0.6], ..., [0.1, 0.9], [0.3, 0.7]])
    
  • Example of randomly splitting the data into training and testing sets, where the testing set is 30% of the total sample size: X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = .3, random_state = 0)
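
The steps above can be put together in one short script. This is a minimal sketch that assumes synthetic data from make_classification as a stand-in for real click data, and max_depth=3 is an arbitrary choice for the example rather than a recommended setting.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary-classification data (assumption: stands in for click data).
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

# Hold out 30% of the samples for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Create and train the tree; max_depth=3 is an illustrative choice.
clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(X_train, y_train)

# Predicted labels: one 0/1 value per test sample.
y_pred = clf.predict(X_test)

# Probability scores: one row per test sample, one column per class.
y_score = clf.predict_proba(X_test)

print(y_pred.shape, y_score.shape)
```

Column 1 of the predict_proba output holds the estimated probability of the positive class, which here would correspond to a click.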

Evaluation with ROC curve

Example of area under the ROC curve for a classifier

  • True positive rate (Y-axis) = #(classifier predicts positive, actually positive) / #(positives)
  • False positive rate (X-axis) = #(classifier predicts positive, actually negative) / #(negatives)
  • Dotted blue line: baseline (random guessing) with an AUC of 0.5
  • Want the area under the orange line (the classifier's ROC curve) to be as close to 1 as possible
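
As a worked example of the two rate definitions above, the counts can be computed by hand for a single set of predictions. The labels and predictions below are made up for illustration only.

```python
# Hypothetical true labels and predicted labels for eight impressions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

# Predicted positive and actually positive.
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
# Predicted positive but actually negative.
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)

positives = sum(y_true)
negatives = len(y_true) - positives

tpr = tp / positives  # true positive rate (Y-axis)
fpr = fp / negatives  # false positive rate (X-axis)

print(tpr, fpr)  # 0.75 0.25
```

Each point on a ROC curve is one such (FPR, TPR) pair, obtained at a different probability threshold.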

AUC of ROC curve

Y_score = clf.predict_proba(X_test)
fpr, tpr, thresholds = roc_curve(y_test, Y_score[:, 1])
  • roc_curve() inputs: true test labels and predicted scores for the positive class
roc_auc = auc(fpr, tpr)
  • auc() inputs: false positive rate and true positive rate arrays

  • If the model is accurate and CTR is still low, you may want to reassess how the ad message is conveyed and which audience it targets
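
The roc_curve() and auc() calls can be run end to end as follows. This is a sketch under stated assumptions: the data is synthetic (make_classification), the tree depth is arbitrary, and roc_auc_score is included only as a cross-check on the two-step calculation.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import auc, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data standing in for click data (assumption).
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(X_train, y_train)

# Scores for the positive class are in column 1.
y_score = clf.predict_proba(X_test)
fpr, tpr, thresholds = roc_curve(y_test, y_score[:, 1])

# Area under the ROC curve from the FPR and TPR arrays.
roc_auc = auc(fpr, tpr)

# Cross-check: roc_auc_score computes the same quantity in one call.
roc_auc_direct = roc_auc_score(y_test, y_score[:, 1])

print(round(roc_auc, 3), round(roc_auc_direct, 3))
```

An AUC near 0.5 would indicate the model does little better than random guessing, while values closer to 1 indicate better separation of clicks from non-clicks.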

Let's practice!
