Predicting CTR with Machine Learning in Python
Kevin Huo
Instructor
age | is_student | loan |
---|---|---|
middle_aged | | 1 |
youth | no | 0 |
youth | yes | 1 |
clf = DecisionTreeClassifier()
As with logistic regression, train a decision tree with clf.fit(X_train, y_train) and predict labels for the testing data with clf.predict(X_test):
array([0, 1, 1, ..., 1, 0, 1])
Use clf.predict_proba(X_test) for probability scores:
array([[0.2, 0.8], [0.4, 0.6], ..., [0.1, 0.9], [0.3, 0.7]])
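The fit / predict / predict_proba workflow can be sketched end to end as follows. This is a minimal illustration, not the course's code: the data is synthetic (generated with make_classification as a stand-in for click data), the split is a simple positional slice, and max_depth=3 is an arbitrary choice made here because a fully grown tree typically returns probabilities of exactly 0 or 1.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for click data (assumption: not the course dataset)
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Simple positional split: first 140 rows for training, last 60 for testing
X_train, X_test = X[:140], X[140:]
y_train, y_test = y[:140], y[140:]

# max_depth=3 is an illustrative setting, not a recommendation
clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)         # hard 0/1 labels, shape (60,)
y_proba = clf.predict_proba(X_test)  # one column per class, shape (60, 2)

print(y_pred[:5])
print(y_proba[:5])
```

Column 1 of y_proba holds the estimated probability of the positive class (a click), and each row sums to 1.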
Example of randomly splitting the data into training and testing sets, where the testing set is 30% of the total sample size:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = .3, random_state = 0)
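To make the 30% split concrete, here is a small sketch on toy arrays (the ten-row X and alternating y are made up for illustration). It shows that train_test_split returns four arrays in the order X_train, X_test, y_train, y_test.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (hypothetical): 10 samples with 2 features each
X = np.arange(20).reshape(10, 2)
y = np.array([0, 1] * 5)

# test_size=0.3 holds out 30% of the rows; random_state fixes the shuffle
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

print(X_train.shape, X_test.shape)  # 7 training rows, 3 testing rows
```

Fixing random_state makes the split reproducible, so repeated runs evaluate the model on the same held-out rows.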
y_score = clf.predict_proba(X_test)
fpr, tpr, thresholds = roc_curve(y_test, y_score[:, 1])
roc_curve() inputs: true test labels and predicted scores for the positive class
roc_auc = auc(fpr, tpr)
auc() inputs: false-positive-rate and true-positive-rate arrays
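A tiny worked example may help show what roc_curve() and auc() compute. The four labels and scores below are invented for illustration; in practice the scores would be the positive-class column of clf.predict_proba(X_test).

```python
import numpy as np
from sklearn.metrics import roc_curve, auc

# Hypothetical true labels and positive-class scores for four impressions
y_test = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

# roc_curve sweeps the decision threshold and returns the rates at each step
fpr, tpr, thresholds = roc_curve(y_test, y_score)

# auc integrates the (fpr, tpr) curve with the trapezoidal rule
roc_auc = auc(fpr, tpr)
print(roc_auc)  # 0.75
```

The value 0.75 matches the ranking interpretation of AUC: of the four positive/negative pairs, three have the positive example scored higher (only 0.35 versus 0.4 is misordered).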
If the model is accurate but CTR is low, consider reassessing how the ad message is delivered and which audience it targets