RandomizedSearchCV

Model Validation in Python

Kasey Jones

Data Scientist

Grid searching hyperparameters

When you choose candidate values for several hyperparameters, every combination of those values forms a grid. This grid is called the hyperparameter space.
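To make the idea of a grid concrete, here is a minimal sketch that enumerates every combination of two hyperparameters. The candidate values are made up for illustration and are not taken from the course.

```python
from itertools import product

# Hypothetical candidate values for two hyperparameters (illustrative only).
max_depth_values = [4, 6, 8]
min_samples_split_values = [2, 5, 10]

# Each pairing of candidate values is one point in the grid.
grid = list(product(max_depth_values, min_samples_split_values))

for max_depth, min_samples_split in grid:
    print(f"max_depth={max_depth}, min_samples_split={min_samples_split}")

print(len(grid), "combinations")  # 3 values x 3 values = 9
```

A grid search would train and validate one model for each of these points.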

Grid searching continued

Benefits:

Tests every possible combination

Drawbacks:

Each additional hyperparameter multiplies the number of combinations, so training time grows exponentially
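The growth can be sketched with simple arithmetic. The example assumes, purely for illustration, that every hyperparameter has 5 candidate values; `grid_size` is a hypothetical helper, not part of any library.

```python
# Assumption for illustration: every hyperparameter has 5 candidate values.
VALUES_PER_HYPERPARAMETER = 5


def grid_size(n_hyperparameters, n_values=VALUES_PER_HYPERPARAMETER):
    """Number of combinations a full grid search has to evaluate."""
    return n_values ** n_hyperparameters


sizes = {k: grid_size(k) for k in range(1, 6)}
for k, size in sizes.items():
    print(f"{k} hyperparameters -> {size} combinations")
```

Going from 1 to 5 hyperparameters takes the grid from 5 to 3125 combinations, and each combination is trained once per cross-validation fold.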

Better methods

Random search

from sklearn.model_selection import RandomizedSearchCV

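# Placeholder: estimator and param_distributions are required arguments
random_search = RandomizedSearchCV(estimator, param_distributions)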

Parameter Distribution:

param_dist = {"max_depth": [4, 6, 8, None],
              "max_features": range(2, 11),
              "min_samples_split": range(2, 11)}
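One way to see what random search draws from this dictionary is scikit-learn's `ParameterSampler`, which produces random combinations from the same kind of input. This is a sketch for inspection only; the choice of 5 draws and the seed are arbitrary.

```python
from sklearn.model_selection import ParameterSampler

param_dist = {"max_depth": [4, 6, 8, None],
              "max_features": range(2, 11),
              "min_samples_split": range(2, 11)}

# Draw 5 random combinations; random_state is an arbitrary seed
# so that the draws are repeatable.
samples = list(ParameterSampler(param_dist, n_iter=5, random_state=1111))

for params in samples:
    print(params)
```

Each printed dictionary is one candidate model configuration that a random search could evaluate.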

Random search parameters

Parameters:

estimator: the model to use
param_distributions: dictionary mapping hyperparameter names to the possible values (or distributions) to sample from
n_iter: number of hyperparameter combinations to sample
scoring: scoring method to use

Setting RandomizedSearchCV parameters

param_dist = {"max_depth": [4, 6, 8, None],
              "max_features": range(2, 11),
              "min_samples_split": range(2, 11)}

from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import make_scorer, mean_absolute_error

rfr = RandomForestRegressor(n_estimators=20, random_state=1111)
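# MAE is an error metric, so lower is better: tell the scorer not to maximize it
scorer = make_scorer(mean_absolute_error, greater_is_better=False)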
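Scikit-learn's search tools treat higher scores as better, so an error metric such as MAE is usually wrapped with `greater_is_better=False`, which flips its sign. The sketch below shows the effect on a tiny made-up dataset, with a `DummyRegressor` standing in for a real model.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.metrics import make_scorer, mean_absolute_error

# Tiny made-up dataset, purely for illustration.
X = np.arange(6).reshape(-1, 1)
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

# Baseline model that always predicts the mean of y (3.5).
model = DummyRegressor(strategy="mean").fit(X, y)

mae = mean_absolute_error(y, model.predict(X))

# greater_is_better=False makes the scorer return the negated error.
scorer = make_scorer(mean_absolute_error, greater_is_better=False)
score = scorer(model, X, y)

print("MAE:", mae)
print("Scorer output:", score)
```

Because the scorer returns the negated MAE, the model with the highest score is the one with the lowest error.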

RandomizedSearchCV implemented

Setting up the random search:

random_search = RandomizedSearchCV(estimator=rfr,
                                   param_distributions=param_dist,
                                   n_iter=40,
                                   cv=5)
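A quick back-of-the-envelope comparison shows what these settings save relative to a full grid search over the same values. The counts cover cross-validation fits only and leave out the final refit on the full data.

```python
from math import prod

param_dist = {"max_depth": [4, 6, 8, None],
              "max_features": range(2, 11),
              "min_samples_split": range(2, 11)}
n_iter = 40
cv = 5

# A full grid would try every combination: 4 * 9 * 9.
grid_combinations = prod(len(values) for values in param_dist.values())

grid_fits = grid_combinations * cv   # models a grid search would train
random_fits = n_iter * cv            # models this random search trains

print("Grid combinations:", grid_combinations)
print("Grid search fits:", grid_fits)
print("Random search fits:", random_fits)
```

With these numbers, the random search trains 200 models where a full grid search would train 1620.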

We cannot do hyperparameter tuning without understanding model validation
Model validation allows us to compare multiple models and parameter sets

Complete the random search:

random_search.fit(X, y)
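Once the search has been fit, the results can be read from the fitted object. The sketch below is an end-to-end run under stated assumptions: `X` and `y` are synthetic stand-ins generated with `make_regression`, `n_iter` and `cv` are reduced so it runs quickly, and scikit-learn's built-in `"neg_mean_absolute_error"` scoring string is used.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in for the real X and y (assumption for illustration).
X, y = make_regression(n_samples=200, n_features=10, noise=10.0,
                       random_state=1111)

param_dist = {"max_depth": [4, 6, 8, None],
              "max_features": range(2, 11),
              "min_samples_split": range(2, 11)}

rfr = RandomForestRegressor(n_estimators=20, random_state=1111)

# Smaller n_iter and cv than in the slides, to keep the example fast.
random_search = RandomizedSearchCV(estimator=rfr,
                                   param_distributions=param_dist,
                                   n_iter=5,
                                   cv=3,
                                   scoring="neg_mean_absolute_error",
                                   random_state=1111)
random_search.fit(X, y)

# Best sampled combination and its mean cross-validated score (negated MAE).
print(random_search.best_params_)
print(random_search.best_score_)

# With the default refit=True, the best model is retrained on all the data.
best_model = random_search.best_estimator_
predictions = best_model.predict(X[:5])
```

The full table of sampled combinations and their scores is available in `random_search.cv_results_`.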

Let's explore some examples!
