Model evaluation

Predicting CTR with Machine Learning in Python

Kevin Huo

Instructor

Precision and recall

  • Precision: ROI on ad spend through clicks

    • Low precision means very little tangible ROI on clicks
  • Recall: targeting relevant audience

    • Low recall means missed opportunities for ROI
  • It may be sensible to weight the two differently

    • Companies are likely to care more about avoiding low precision than about avoiding low recall
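As a quick illustration of the two metrics, here is a minimal sketch using scikit-learn's `precision_score` and `recall_score` on hypothetical labels (1 = clicked, 0 = not clicked):

```python
from sklearn.metrics import precision_score, recall_score

# Hypothetical labels: 1 = clicked, 0 = not clicked
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

# TP = 3, FP = 1, FN = 1
precision = precision_score(y_true, y_pred)  # 3 / (3 + 1) = 0.75
recall = recall_score(y_true, y_pred)        # 3 / (3 + 1) = 0.75
print(precision, recall)
```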

F-beta score

$$F_\beta = (1+\beta^2)\cdot\frac{\text{precision}\cdot\text{recall}}{(\beta^2 \cdot \text{precision}) + \text{recall}}$$

  • Beta coefficient: represents relative weighting of two metrics

    • Beta between 0 and 1 weights precision more heavily, whereas beta > 1 weights recall more heavily (beta = 1 recovers the standard F1 score)
  • Implementation available in sklearn via: fbeta_score(y_true, y_pred, beta)

    • y_true is true targets and y_pred the predicted targets
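The effect of beta can be sketched with `fbeta_score` on hypothetical labels chosen so that precision and recall differ:

```python
from sklearn.metrics import fbeta_score

# Hypothetical labels: TP = 2, FP = 1, FN = 2
# -> precision = 2/3, recall = 1/2
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]

# beta < 1 weights precision more; beta > 1 weights recall more
f_half = fbeta_score(y_true, y_pred, beta=0.5)  # 0.625
f_two = fbeta_score(y_true, y_pred, beta=2.0)   # ~0.526
print(f_half, f_two)
```

Because precision exceeds recall here, the precision-leaning F0.5 comes out higher than the recall-leaning F2.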

AUC of ROC curve versus precision

roc_auc = roc_auc_score(y_test, y_score[:, 1])

fpr = 1 - tn / (tn + fp)
precision = tp / (tp + fp)
  • Imbalanced dataset: fpr can be low when precision is also low.
  • Suppose we have 100 TN, 10 TP, and 10 FP.
fpr = 1 - 100 / (100 + 10) = 0.091
precision = 10 / (10 + 10) = 0.5
  • A low FPR can lead to a high AUC of the ROC curve despite precision being low! It is therefore important to look at both metrics, along with the F-beta score.
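The arithmetic above can be reproduced directly from the confusion counts on the slide:

```python
# Confusion counts from the slide: 100 TN, 10 TP, 10 FP
tn, tp, fp = 100, 10, 10

fpr = fp / (fp + tn)           # same as 1 - tn / (tn + fp), ~0.091
precision = tp / (tp + fp)     # 0.5

print(round(fpr, 3), precision)
```

Even though only half of the predicted positives are true clicks, the FPR stays small because the negatives dominate the dataset.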

ROI on ad spend

  • Same idea as before: each targeted impression incurs some cost, and each true positive (click) yields a return r
total_return = tp * r
total_spent = (tp + fp) * cost
roi = total_return / total_spent 
    = (tp) / (tp + fp) * (r / cost) 
    = precision * (r / cost)
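The factorization above can be checked numerically; the values for `r` and `cost` below are hypothetical:

```python
# Hypothetical counts and economics: return r per click, cost per targeted impression
tp, fp = 10, 10
r, cost = 2.0, 0.5

total_return = tp * r
total_spent = (tp + fp) * cost
roi = total_return / total_spent

# roi factors into precision * (r / cost)
precision = tp / (tp + fp)
print(roi, precision * (r / cost))
```

Since ROI scales linearly with precision for fixed `r` and `cost`, improving precision directly improves the return on ad spend.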

Let's practice!

