Informed Methods: Genetic Algorithms

Hyperparameter Tuning in Python

Alex Scriven

Data Scientist

A lesson on genetics

 

In genetic evolution in the real world, we have the following process:

  1. Many creatures exist (the 'offspring')
  2. The strongest creatures survive and pair off
  3. There is some 'crossover' as they form offspring
  4. There are random mutations to some of the offspring
    • These mutations sometimes help give some offspring an advantage
  5. Go back to (1)!

Genetics in Machine Learning

 

We can apply the same idea to hyperparameter tuning:

  1. We can create some models (that have hyperparameter settings)
  2. We can pick the best (by our scoring function)
    • These are the ones that 'survive'
  3. We can create new models that are similar to the best ones
  4. We add in some randomness so we don't get stuck in a local optimum
  5. Repeat until we are happy (see the sketch below)!
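As a rough illustration, the loop above could look like the following sketch in Python. All of the helper names, parameter ranges, and the random-forest fitness function are hypothetical choices for this example, not part of any library:

import random
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def fitness(params, X, y):
    # Score one hyperparameter setting ('creature') with cross-validation
    model = RandomForestClassifier(**params, random_state=0)
    return cross_val_score(model, X, y, cv=3).mean()

def random_params():
    # (1) Create a random hyperparameter setting
    return {'n_estimators': random.randint(10, 200),
            'max_depth': random.randint(2, 20)}

def crossover(mum, dad):
    # (3) A child takes each hyperparameter from one parent at random
    return {key: random.choice([mum[key], dad[key]]) for key in mum}

def mutate(params):
    # (4) Randomly perturb one hyperparameter
    key = random.choice(list(params))
    params[key] = max(2, params[key] + random.randint(-3, 3))
    return params

def genetic_search(X, y, generations=3, population_size=6):
    population = [random_params() for _ in range(population_size)]
    for _ in range(generations):
        # (2) The strongest half survive
        ranked = sorted(population, key=lambda p: fitness(p, X, y),
                        reverse=True)
        survivors = ranked[:population_size // 2]
        # (3)+(4) Survivors breed, with random mutation
        children = [mutate(crossover(*random.sample(survivors, 2)))
                    for _ in range(population_size - len(survivors))]
        population = survivors + children  # (5) go back to (1)!
    return max(population, key=lambda p: fitness(p, X, y))

The library we will use, TPOT, implements a far more sophisticated version of this loop.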

Why does this work well?

 

This is an informed search that has a number of advantages:

  • It learns from previous iterations, just like Bayesian hyperparameter tuning.
  • Its built-in randomness helps it avoid getting stuck in local optima.
  • The package we'll use takes care of many tedious aspects of machine learning.

Introducing TPOT

 

A useful library for genetic hyperparameter tuning is TPOT:

Consider TPOT your Data Science Assistant. TPOT is a Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.

Pipelines include not only the model (or multiple models) but also feature preprocessing and other steps of the process. Plus, TPOT returns the Python code of the best pipeline for you!


TPOT components

The key arguments to a TPOT classifier are:

  • generations: Number of iterations to run the optimization for.
  • population_size: The number of models to keep after each iteration.
  • offspring_size: Number of models to produce in each iteration.
  • mutation_rate: The proportion of pipelines to apply randomness to.
  • crossover_rate: The proportion of pipelines to breed each iteration.
  • scoring: The function to determine the best models.
  • cv: Cross-validation strategy to use.
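As a sketch of how these fit together (the values here are purely illustrative; 0.9 and 0.1 match the library's usual defaults, and the two rates cannot sum to more than 1):

from tpot import TPOTClassifier

# Illustrative values only; mutation_rate + crossover_rate must not exceed 1.0
tpot = TPOTClassifier(generations=5, population_size=20,
                      offspring_size=10,
                      mutation_rate=0.9, crossover_rate=0.1,
                      scoring='accuracy', cv=5)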

A simple example

A minimal TPOT run:

from tpot import TPOTClassifier

# Create the TPOT classifier
tpot = TPOTClassifier(generations=3, population_size=5,
                      verbosity=2, offspring_size=10,
                      scoring='accuracy', cv=5)

# Fit on the training data, then score on the test set
tpot.fit(X_train, y_train)
print(tpot.score(X_test, y_test))

We will keep the default values for mutation_rate and crossover_rate; they are best left at their defaults unless you have a deeper knowledge of genetic programming.

Notice: No algorithm-specific hyperparameters?
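Once fitted, TPOT can also write the winning pipeline out as a standalone Python script using its export() method (the filename here is arbitrary):

# Save the best pipeline as runnable Python code
tpot.export('tpot_best_pipeline.py')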


Let's practice!
