Tuning models

Predicting CTR with Machine Learning in Python

Kevin Huo

Instructor

Regularization

[Figure: regularization example with a blue and a green line]

Regularization: addressing overfitting by shrinking the magnitude of a model's coefficients, which limits model complexity
By reducing overfitting, regularization can improve performance metrics on unseen data and hence ROI on ad spend

Examples of regularization

Logistic Regression: the C parameter is the inverse of the regularization strength, so a smaller C means stronger regularization.
From least to most complex: C=0.05 < C=0.5 < C=1
Decision Tree: the max_depth parameter controls how many layers deep the tree can grow.
From least to most complex: max_depth=3 < max_depth=5 < max_depth=10
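The effect of both parameters can be checked directly. The following is a minimal sketch, assuming a synthetic dataset from make_classification as a stand-in for CTR data (not the course dataset); the names coef_norms and tree_depths are illustrative only. It fits each model at the settings listed above and records the size of the logistic regression coefficients and the depth each tree actually reached.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for CTR data (an assumption, not the course dataset).
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5, random_state=0)

# Logistic regression: smaller C = stronger regularization = smaller coefficients.
coef_norms = {}
for C in [0.05, 0.5, 1]:
    clf = LogisticRegression(C=C, max_iter=1000).fit(X, y)
    coef_norms[C] = float(np.linalg.norm(clf.coef_))  # L2 norm of the coefficients

# Decision tree: max_depth caps how deep the fitted tree can grow.
tree_depths = {}
for max_depth in [3, 5, 10]:
    tree = DecisionTreeClassifier(max_depth=max_depth, random_state=0).fit(X, y)
    tree_depths[max_depth] = tree.get_depth()  # depth actually reached

print(coef_norms)
print(tree_depths)
```

With the default L2 penalty, the coefficient norm should not decrease as C grows, and no tree should exceed its max_depth.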

Cross validation

K-fold cross validation

Each of the k folds is used once as the validation set while the other k-1 folds are used for training.
This gives k evaluations of model performance.
Note that you still keep a separate test set for the final evaluation.
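The splitting described above can be sketched by hand. This is an illustration under stated assumptions: the 100-row array is hypothetical placeholder data, and the 80/20 hold-out ratio is chosen only for the example. It first sets aside a test set, then runs 4-fold cross validation on the remaining training data and records the size of each split.

```python
import numpy as np
from sklearn.model_selection import KFold, train_test_split

# Hypothetical placeholder data: 100 samples, 2 features.
X = np.arange(200).reshape(100, 2)
y = np.arange(100) % 2

# Hold out a separate test set; it is never touched during cross validation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

k_fold = KFold(n_splits=4, shuffle=True, random_state=0)

fold_sizes = []   # (training size, validation size) for each fold
validated = []    # positions (within X_train) used for validation
for train_idx, val_idx in k_fold.split(X_train):
    fold_sizes.append((len(train_idx), len(val_idx)))
    validated.extend(val_idx.tolist())

print(fold_sizes)
```

Every training row lands in a validation fold exactly once, so the k scores together cover the whole training set.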

Examples of cross validation

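from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# 4 shuffled folds; X_train and y_train are assumed to be defined already
k_fold = KFold(n_splits = 4, random_state = 0, shuffle = True)

for i in [3, 5, 10]:
  clf = DecisionTreeClassifier(max_depth = i)
  # One precision score per fold
  cv_precision = cross_val_score(
    clf, X_train, y_train, cv = k_fold,
    scoring = 'precision_weighted')
  print(i, cv_precision.mean())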

Scoring strings: precision_weighted, recall_weighted, roc_auc
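Each scoring string can be passed to cross_val_score in the same way. The sketch below assumes a synthetic binary dataset from make_classification in place of the CTR data, and the name mean_scores is illustrative only. It loops over the three strings and stores the mean score across folds.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary data standing in for the CTR training set (an assumption).
X_train, y_train = make_classification(
    n_samples=400, n_features=8, random_state=0)

k_fold = KFold(n_splits=4, random_state=0, shuffle=True)
clf = DecisionTreeClassifier(max_depth=5, random_state=0)

mean_scores = {}
for scoring in ['precision_weighted', 'recall_weighted', 'roc_auc']:
    # cross_val_score returns one score per fold for the chosen metric
    scores = cross_val_score(clf, X_train, y_train, cv=k_fold, scoring=scoring)
    mean_scores[scoring] = float(scores.mean())

print(mean_scores)
```

Note that roc_auc is computed from predicted probabilities rather than hard class labels, so it needs a classifier that can output scores, which DecisionTreeClassifier does.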

Let's practice!
