Hyperparameter Tuning in Python
Alex Scriven
Data Scientist
In genetic evolution in the real world, we have the following process:

- There are many creatures ("offspring")
- The strongest creatures survive and pair off
- Their offspring resemble the parents, with some random mutation

We can apply the same idea to hyperparameter tuning:

- Create some models (with different hyperparameter settings)
- Pick the best of them (by our scoring function)
- Create new models similar to the best ones, adding in some randomness so we don't get stuck at a local optimum

This is an informed search that has a number of advantages:

- It learns from previous iterations, like other informed search methods
- The added randomness helps it explore the search space
- It can take care of many tedious aspects of machine learning for us
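The loop above can be sketched in plain Python. This is a minimal, hypothetical illustration, not TPOT's implementation: the `fitness` function is a toy stand-in for cross-validated model scoring, and the hyperparameter names and ranges are made up for the example.

```python
import random

def fitness(params):
    # Toy stand-in for model training: in practice this would be the
    # cross-validated score of a model fit with these hyperparameters.
    # Here the "best" settings are learning_rate=0.1, n_estimators=100.
    return -((params["learning_rate"] - 0.1) ** 2
             + ((params["n_estimators"] - 100) / 100) ** 2)

def random_candidate():
    # A random set of hyperparameters (ranges are illustrative only)
    return {"learning_rate": random.uniform(0.001, 1.0),
            "n_estimators": random.randint(10, 300)}

def mutate(params):
    # Copy a strong candidate and apply a small random change (mutation)
    child = dict(params)
    if random.random() < 0.5:
        child["learning_rate"] *= random.uniform(0.5, 1.5)
    else:
        child["n_estimators"] = max(10, child["n_estimators"]
                                    + random.randint(-30, 30))
    return child

random.seed(0)
population = [random_candidate() for _ in range(10)]  # initial "creatures"

for generation in range(20):
    population.sort(key=fitness, reverse=True)  # the strongest survive
    survivors = population[:3]
    offspring = [mutate(random.choice(survivors)) for _ in range(7)]
    population = survivors + offspring          # next generation

best = max(population, key=fitness)
print(best)
```

Because the top survivors are carried over unchanged each generation, the best score never gets worse from one iteration to the next.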
A useful library for genetic hyperparameter tuning is TPOT:
Consider TPOT your Data Science Assistant. TPOT is a Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.
Pipelines not only include the model (or multiple models) but also work on features and other aspects of the process. Plus it returns the Python code of the pipeline for you!
The key arguments to a TPOT classifier are:
- generations: Iterations to run training for.
- population_size: The number of models to keep after each iteration.
- offspring_size: Number of models to produce in each iteration.
- mutation_rate: The proportion of pipelines to apply randomness to.
- crossover_rate: The proportion of pipelines to breed each iteration.
- scoring: The function to determine the best models.
- cv: Cross-validation strategy to use.

A simple example:
from tpot import TPOTClassifier

tpot = TPOTClassifier(generations=3, population_size=5, verbosity=2,
                      offspring_size=10, scoring='accuracy', cv=5)
tpot.fit(X_train, y_train)
print(tpot.score(X_test, y_test))
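When budgeting run time, it helps to estimate how much work a run involves. Assuming the formula from the TPOT documentation, the total number of pipelines evaluated is population_size + generations × offspring_size; with cross-validation, each pipeline is fit once per fold. For the settings above:

```python
# Settings from the example TPOTClassifier above
generations, population_size, offspring_size = 3, 5, 10
cv = 5

# Assumed formula from the TPOT docs:
# total pipelines = population_size + generations * offspring_size
total_pipelines = population_size + generations * offspring_size
total_model_fits = total_pipelines * cv  # each pipeline is cross-validated

print(total_pipelines)   # 35
print(total_model_fits)  # 175
```

Small values of generations and population_size, as here, keep the run short; real searches typically use much larger values.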
We will keep the default values for mutation_rate and crossover_rate, as these are best left at their defaults without deeper knowledge of genetic programming.
Notice: no algorithm-specific hyperparameters?