Informed Methods: Genetic Algorithms

Hyperparameter Tuning in Python

Alex Scriven

Data Scientist

A lesson on genetics

 

In genetic evolution in the real world, we have the following process:

  1. Many creatures exist (the 'offspring')
  2. The strongest creatures survive and pair off
  3. There is some 'crossover' as they form offspring
  4. There are random mutations to some of the offspring
    • These mutations sometimes help give some offspring an advantage
  5. Go back to (1)!

Genetics in Machine Learning

 

We can apply the same idea to hyperparameter tuning:

  1. We can create some models (that have hyperparameter settings)
  2. We can pick the best (by our scoring function)
    • These are the ones that 'survive'
  3. We can create new models that are similar to the best ones
  4. We add in some randomness so we don't get stuck in a local optimum
  5. Repeat until we are happy (see the sketch below)!
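As a rough illustration, the loop above could look like the following sketch in Python. All of the helper names, parameter ranges, and the random-forest fitness function are hypothetical choices for this example, not part of any library:

import random
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def fitness(params, X, y):
    # Score one hyperparameter setting ('creature') with cross-validation
    model = RandomForestClassifier(**params, random_state=0)
    return cross_val_score(model, X, y, cv=3).mean()

def random_params():
    # (1) Create a random hyperparameter setting
    return {'n_estimators': random.randint(10, 200),
            'max_depth': random.randint(2, 20)}

def crossover(mum, dad):
    # (3) A child takes each hyperparameter from one parent at random
    return {key: random.choice([mum[key], dad[key]]) for key in mum}

def mutate(params):
    # (4) Randomly perturb one hyperparameter
    key = random.choice(list(params))
    params[key] = max(2, params[key] + random.randint(-3, 3))
    return params

def genetic_search(X, y, generations=3, population_size=6):
    population = [random_params() for _ in range(population_size)]
    for _ in range(generations):
        # (2) The strongest half survive
        ranked = sorted(population, key=lambda p: fitness(p, X, y),
                        reverse=True)
        survivors = ranked[:population_size // 2]
        # (3)+(4) Survivors breed, with random mutation
        children = [mutate(crossover(*random.sample(survivors, 2)))
                    for _ in range(population_size - len(survivors))]
        population = survivors + children  # (5) go back to (1)!
    return max(population, key=lambda p: fitness(p, X, y))

The library we will use, TPOT, implements a far more sophisticated version of this loop.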

Why does this work well?

 

This is an informed search that has a number of advantages:

  • It learns from previous iterations, just like Bayesian hyperparameter tuning.
  • Its built-in randomness helps it avoid getting stuck in local optima.
  • The package we'll use takes care of many tedious aspects of machine learning.

Introducing TPOT

 

A useful library for genetic hyperparameter tuning is TPOT:

Consider TPOT your Data Science Assistant. TPOT is a Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.

Pipelines include not only the model (or multiple models) but also feature preprocessing and other steps of the process. Plus, TPOT returns the Python code of the best pipeline for you!


TPOT components

The key arguments to a TPOT classifier are:

  • generations: Number of iterations to run the optimization for.
  • population_size: The number of models to keep after each iteration.
  • offspring_size: Number of models to produce in each iteration.
  • mutation_rate: The proportion of pipelines to apply randomness to.
  • crossover_rate: The proportion of pipelines to breed each iteration.
  • scoring: The function to determine the best models.
  • cv: Cross-validation strategy to use.
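As a sketch of how these fit together (the values here are purely illustrative; 0.9 and 0.1 match the library's usual defaults, and the two rates cannot sum to more than 1):

from tpot import TPOTClassifier

# Illustrative values only; mutation_rate + crossover_rate must not exceed 1.0
tpot = TPOTClassifier(generations=5, population_size=20,
                      offspring_size=10,
                      mutation_rate=0.9, crossover_rate=0.1,
                      scoring='accuracy', cv=5)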

A simple example

A minimal TPOT run:

from tpot import TPOTClassifier

# Create the TPOT classifier
tpot = TPOTClassifier(generations=3, population_size=5,
                      verbosity=2, offspring_size=10,
                      scoring='accuracy', cv=5)

# Fit on the training data, then score on the test set
tpot.fit(X_train, y_train)
print(tpot.score(X_test, y_test))

We will keep the default values for mutation_rate and crossover_rate; they are best left at their defaults unless you have a deeper knowledge of genetic programming.

Notice: No algorithm-specific hyperparameters?
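Once fitted, TPOT can also write the winning pipeline out as a standalone Python script using its export() method (the filename here is arbitrary):

# Save the best pipeline as runnable Python code
tpot.export('tpot_best_pipeline.py')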


Let's practice!
