Hyperparameter Tuning in Python
Alex Scriven
Data Scientist
Very similar to grid search:
BUT we instead randomly select grid squares.
Bergstra & Bengio (2012):
This paper shows empirically and theoretically that randomly chosen trials are more efficient for hyper-parameter optimization than trials on a grid.
Two main reasons:
A grid search:
How many models must we run to have a 95% chance of getting one of the green squares?
Our best models:
If we randomly select hyperparameter combinations uniformly, consider the chance that every single trial MISSES that desired region, to show how unlikely that is:
Trial 1 has a 0.05 chance of success and a (1 - 0.05) chance of missing.
In fact, with n trials we have (1-0.05)^n chance that every single trial misses that desired spot.
So how many trials do we need to have a high (95%) chance of landing in that region?
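A quick sketch of that calculation (a minimal example, assuming each random trial independently has a 0.05 chance of landing in the desired region):
import numpy as np

# Smallest n with at least a 95% chance of one success:
# 1 - (1 - 0.05)**n >= 0.95  =>  n >= log(0.05) / log(0.95)
n_trials = int(np.ceil(np.log(0.05) / np.log(0.95)))
print(n_trials)              # 59
print(1 - 0.95 ** n_trials)  # ~0.9514, just over 95%
So under that assumption, roughly 59 random trials are enough, no matter how large the full grid is.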
What does that all mean?
Remember:
The maximum is still only as good as the grid you set!
To fairly compare this to grid search, you need to use the same modeling 'budget'.
We can create our own random sample of hyperparameter combinations:
import numpy as np
from itertools import product

# Set some hyperparameter lists
learn_rate_list = np.linspace(0.001, 2, 150)
min_samples_leaf_list = list(range(1, 51))

# Create list of combinations
combinations_list = [list(x) for x in
    product(learn_rate_list, min_samples_leaf_list)]

# Select 100 models from our larger set
random_combinations_index = np.random.choice(
    range(0, len(combinations_list)), 100, replace=False)
combinations_random_chosen = [combinations_list[x] for x in
    random_combinations_index]
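As a quick sanity check (assuming the variables defined above), each element of combinations_random_chosen is a [learn_rate, min_samples_leaf] pair:
print(len(combinations_random_chosen))   # 100 sampled combinations
print(combinations_random_chosen[:3])    # peek at a few sampled pairs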
We can also visualize the random search coverage by plotting the hyperparameter choices on the x and y axes.
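A minimal sketch of such a plot, assuming matplotlib and the combinations_random_chosen list created above:
import matplotlib.pyplot as plt

# Unpack the sampled pairs into x (learn_rate) and y (min_samples_leaf)
x, y = zip(*combinations_random_chosen)

plt.scatter(x, y, c='blue')
plt.gca().set(xlabel='learn_rate', ylabel='min_samples_leaf',
              title='Random Search Hyperparameters')
plt.show()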
Notice how the scatter covers a wide range of values, but without deep coverage of any one area?