Hyperparameter Tuning in Python
Alex Scriven
Data Scientist
Introducing a GridSearchCV object:
sklearn.model_selection.GridSearchCV(
    estimator, param_grid, scoring=None,
    fit_params=None, n_jobs=None, refit=True,
    cv='warn', verbose=0, pre_dispatch='2*n_jobs',
    error_score='raise-deprecating',
    return_train_score='warn')
Steps in a Grid Search:
The important inputs are:
estimator
param_grid
cv
scoring
refit
n_jobs
return_train_score
The estimator input:
Remember, this is a single estimator object (the algorithm whose hyperparameters you are tuning), not a list of algorithms.
The param_grid input:
Rather than a list:
max_depth_list = [2, 4, 6, 8]
min_samples_leaf_list = [1, 2, 4, 6]
This would be:
param_grid = {'max_depth': [2, 4, 6, 8],
              'min_samples_leaf': [1, 2, 4, 6]}
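The grid defines the Cartesian product of all the listed values. As a quick sanity check on how many candidate models that creates, here is a sketch using scikit-learn's ParameterGrid helper (which GridSearchCV uses internally):

```python
from sklearn.model_selection import ParameterGrid

param_grid = {'max_depth': [2, 4, 6, 8],
              'min_samples_leaf': [1, 2, 4, 6]}

# Each combination of the two lists becomes one candidate model
candidates = list(ParameterGrid(param_grid))
print(len(candidates))  # 4 x 4 = 16 combinations
```

With 10-fold cross-validation, those 16 candidates would mean 160 model fits.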
The param_grid input:
Remember: the keys in your param_grid dictionary must be valid hyperparameters.
For example, for a logistic regression estimator:
# Incorrect
param_grid = {'C': [0.1, 0.2, 0.5],
              'best_choice': [10, 20, 50]}

ValueError: Invalid parameter best_choice for estimator LogisticRegression
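One way to avoid this error is to list the valid hyperparameter names up front with get_params(), a method every scikit-learn estimator provides:

```python
from sklearn.linear_model import LogisticRegression

# get_params() returns a dict of every settable parameter and its current value
valid_params = sorted(LogisticRegression().get_params().keys())

print('C' in valid_params)            # True
print('best_choice' in valid_params)  # False
```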
The cv input:
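As a sketch of what an integer cv means: cv=5 splits the data into five folds, so each candidate is trained on four folds and scored on the held-out fifth. Plain KFold is shown here just to illustrate the fold sizes; for classifiers, GridSearchCV actually uses stratified folds when cv is an integer.

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(20).reshape(10, 2)  # 10 toy rows

# cv=5 behaves like 5-fold splitting: 8 train rows, 2 test rows per fold
fold_sizes = [(len(train), len(test))
              for train, test in KFold(n_splits=5).split(X)]
print(fold_sizes)  # [(8, 2), (8, 2), (8, 2), (8, 2), (8, 2)]
```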
The scoring input:
Scoring functions are found in scikit-learn's metrics module. You can check all the built-in scoring functions this way:
from sklearn import metrics
sorted(metrics.SCORERS.keys())
The refit input:
With refit=True, the fitted GridSearchCV object can be used as an estimator (for prediction).
The n_jobs input:
Some handy code:
import os
print(os.cpu_count())
Careful using all your cores for modelling if you want to do other work!
The return_train_score input:
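Setting return_train_score=True adds mean_train_score (and related) columns to cv_results_, which is useful for spotting overfitting, at the cost of extra computation. A minimal sketch on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=100, random_state=42)

grid = GridSearchCV(DecisionTreeClassifier(random_state=42),
                    param_grid={'max_depth': [2, 4]},
                    cv=3,
                    return_train_score=True)
grid.fit(X, y)

# Train-score columns only appear because return_train_score=True
print('mean_train_score' in grid.cv_results_)  # True
```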
Building our own GridSearchCV object:
from sklearn.ensemble import RandomForestClassifier

# Create the grid
param_grid = {'max_depth': [2, 4, 6, 8],
              'min_samples_leaf': [1, 2, 4, 6]}

# Get a base classifier with some set parameters
rf_class = RandomForestClassifier(criterion='entropy', max_features='auto')
Putting the pieces together:
from sklearn.model_selection import GridSearchCV

grid_rf_class = GridSearchCV(
    estimator=rf_class,
    param_grid=param_grid,
    scoring='accuracy',
    n_jobs=4,
    cv=10,
    refit=True,
    return_train_score=True)
Because we set refit to True, we can directly use the object:
# Fit the object to our data
grid_rf_class.fit(X_train, y_train)
# Make predictions
grid_rf_class.predict(X_test)
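Beyond predict(), the refitted object exposes the winning results through standard attributes (best_params_, best_score_, best_estimator_). A minimal sketch on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=120, random_state=0)

grid = GridSearchCV(RandomForestClassifier(n_estimators=10, random_state=0),
                    param_grid={'max_depth': [2, 4]},
                    cv=3,
                    refit=True)
grid.fit(X, y)

print(grid.best_params_)     # the winning hyperparameter combination
print(grid.best_score_)      # its mean cross-validated test score
print(grid.best_estimator_)  # the model refitted on all the training data
```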