Hyperparameter tuning

Supervised Learning with scikit-learn

George Boorman

Core Curriculum Manager

Hyperparameter tuning

  • Ridge/lasso regression: Choosing alpha

  • KNN: Choosing n_neighbors

  • Hyperparameters: Parameters we specify before fitting the model

    • Like alpha and n_neighbors

Choosing the correct hyperparameters

  1. Try lots of different hyperparameter values

  2. Fit all of them separately

  3. See how well they perform

  4. Choose the best performing values

 

  • This is called hyperparameter tuning

  • It is essential to use cross-validation to avoid overfitting to the test set

  • We can still split the data and perform cross-validation on the training set

  • We withhold the test set for final evaluation
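
In code, this workflow might look like the following minimal sketch, assuming a feature array X and a target array y have already been loaded:

from sklearn.model_selection import train_test_split, KFold

# Withhold a test set for final evaluation only
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Cross-validation for tuning is then performed on the training set only
kf = KFold(n_splits=5, shuffle=True, random_state=42)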


Grid search cross-validation

[Figure: a grid of candidate hyperparameter values: n_neighbors from 2 to 11 in increments of 3, and the metric, euclidean or manhattan]
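
As a dictionary, this grid might be written like the following sketch:

import numpy as np

# n_neighbors takes the values 2, 5, 8, 11
param_grid = {"n_neighbors": np.arange(2, 12, 3),
              "metric": ["euclidean", "manhattan"]}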


[Figure: the k-fold cross-validation score for each combination of hyperparameters in the grid]


[Figure: the combination of 5 neighbors and the euclidean metric is highlighted as the best, with a score of 0.8748]
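
A sketch of how the pictured search could be run in code, assuming X_train and y_train come from an earlier split on a classification dataset:

from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import GridSearchCV, KFold
import numpy as np

kf = KFold(n_splits=5, shuffle=True, random_state=42)
param_grid = {"n_neighbors": np.arange(2, 12, 3),
              "metric": ["euclidean", "manhattan"]}
knn_cv = GridSearchCV(KNeighborsClassifier(), param_grid, cv=kf)
knn_cv.fit(X_train, y_train)

# Mean k-fold score for every combination in the grid
print(knn_cv.cv_results_["mean_test_score"])
# Best combination and its score
print(knn_cv.best_params_, knn_cv.best_score_)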


GridSearchCV in scikit-learn

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, KFold

kf = KFold(n_splits=5, shuffle=True, random_state=42)
# Ten evenly spaced alpha values between 0.0001 and 1, and two solver options
param_grid = {"alpha": np.linspace(0.0001, 1, 10),
              "solver": ["sag", "lsqr"]}
ridge = Ridge()
ridge_cv = GridSearchCV(ridge, param_grid, cv=kf)
ridge_cv.fit(X_train, y_train)
print(ridge_cv.best_params_, ridge_cv.best_score_)
{'alpha': 0.0001, 'solver': 'sag'}
0.7529912278705785
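
To inspect every combination that was evaluated, not just the winner, the fitted object's cv_results_ attribute can be viewed as a DataFrame (a sketch, assuming pandas is installed):

import pandas as pd

# One row per hyperparameter combination, with its mean k-fold score
results = pd.DataFrame(ridge_cv.cv_results_)
print(results[["param_alpha", "param_solver", "mean_test_score"]])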

Limitations and an alternative approach

  • Number of fits = number of folds × number of hyperparameter combinations, and combinations multiply across hyperparameters (see the sketch below)
  • 3-fold cross-validation, 1 hyperparameter, 10 values = 3 × 10 = 30 fits
  • 10-fold cross-validation, 3 hyperparameters, 10 values each = 10 × 1,000 = 10,000 fits
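
The count comes from multiplying the folds by the product of the grid sizes, as in this sketch:

from math import prod

# Total fits = folds x product of the number of values per hyperparameter
folds = 10
values_per_hyperparameter = [10, 10, 10]  # 3 hyperparameters, 10 values each
print(folds * prod(values_per_hyperparameter))  # 10000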

RandomizedSearchCV

from sklearn.model_selection import RandomizedSearchCV

kf = KFold(n_splits=5, shuffle=True, random_state=42)
param_grid = {"alpha": np.linspace(0.0001, 1, 10),
              "solver": ["sag", "lsqr"]}
ridge = Ridge()
# n_iter sets how many hyperparameter combinations are sampled and evaluated
ridge_cv = RandomizedSearchCV(ridge, param_grid, cv=kf, n_iter=2)
ridge_cv.fit(X_train, y_train)
print(ridge_cv.best_params_, ridge_cv.best_score_)
{'solver': 'sag', 'alpha': 0.0001}
0.7529912278705785
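
Rather than a fixed list of values, RandomizedSearchCV can also sample hyperparameters from continuous distributions, as in this sketch using scipy.stats with the same kf, ridge, and training data as above:

from scipy.stats import uniform

# Sample alpha uniformly from [0.0001, 1.0001)
param_dist = {"alpha": uniform(loc=0.0001, scale=1),
              "solver": ["sag", "lsqr"]}
ridge_cv = RandomizedSearchCV(ridge, param_dist, cv=kf, n_iter=10,
                              random_state=42)
ridge_cv.fit(X_train, y_train)
print(ridge_cv.best_params_, ridge_cv.best_score_)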

Evaluating on the test set

test_score = ridge_cv.score(X_test, y_test)

print(test_score)
0.7564731534089224
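
By default the search object refits the best estimator on the whole training set (refit=True), so it can also be used directly for prediction:

# The fitted search behaves like the refitted best model
y_pred = ridge_cv.predict(X_test)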

Let's practice!
