Hyperparameter Tuning in Python
Alex Scriven
Data Scientist
Bayes Rule:
A statistical method of using new evidence to iteratively update our beliefs about some outcome
Bayes Rule has the form:
$$ P(A \mid B) = \frac{P(B \mid A) \, P(A)}{P(B)} $$
LHS: the probability of A, given that B (some new evidence) has occurred.
RHS: how we calculate this.
This may seem confusing, so let's demonstrate with a common example: a medical diagnosis.
A medical example:
What is the probability that any person has the disease?
$$ P(D) = 0.05 $$
This is simply our prior as we have no evidence.
What is the probability that a predisposed person has the disease, given that $P(Pre \mid D) = 0.2$ and $P(Pre) = 0.1$?
$$ P(D \mid Pre) = \frac{P(Pre \mid D) \, P(D)}{P(Pre)} $$
$$ P(D \mid Pre) = \frac{0.2 \times 0.05}{0.1} = 0.1 $$
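The arithmetic above can be checked directly (using the same probabilities as the example):

```python
# Worked check of the medical example, using the values given above.
p_disease = 0.05       # P(D): prior probability of having the disease
p_pre_given_d = 0.2    # P(Pre | D): probability of predisposition, given disease
p_pre = 0.1            # P(Pre): overall probability of predisposition

# Bayes Rule: P(D | Pre) = P(Pre | D) * P(D) / P(Pre)
p_d_given_pre = p_pre_given_d * p_disease / p_pre
print(round(p_d_given_pre, 4))  # 0.1
```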
We can apply this logic to hyperparameter tuning:
Bayesian hyperparameter tuning is relatively new but quite popular for larger, more complex tuning tasks, as it works well at finding optimal hyperparameter combinations in these situations.
Introducing the Hyperopt package.
To undertake Bayesian hyperparameter tuning we need to: set the domain (our grid, with a twist), choose the optimization algorithm (we keep the default, TPE), and define an objective function to minimize.
Many options to set the grid:
Hyperopt does not use point values on the grid; instead, each hyperparameter is described by a probability distribution over its possible values.
We will do a simple uniform distribution but there are many more if you check the documentation.
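To build intuition for what these distributions produce, here is a minimal stdlib-only stand-in (not Hyperopt itself) mimicking the documented behavior of a quantized-uniform draw, `round(uniform(low, high) / q) * q`:

```python
import random

def quniform_draw(low, high, q):
    """Simplified stand-in for hp.quniform: a uniform draw rounded
    to a multiple of q. Use hyperopt's hp.* functions in practice."""
    return round(random.uniform(low, high) / q) * q

random.seed(0)
draws = [quniform_draw(2, 10, 2) for _ in range(5)]
print(draws)  # every draw is an even value between 2 and 10
```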
Set up the grid:
from hyperopt import hp

space = {
    'max_depth': hp.quniform('max_depth', 2, 10, 2),
    'min_samples_leaf': hp.quniform('min_samples_leaf', 2, 8, 2),
    'learning_rate': hp.uniform('learning_rate', 0.01, 1),
}
The objective function runs the algorithm:
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

def objective(params):
    # Cast quniform draws back to ints before passing them to sklearn
    params = {'max_depth': int(params['max_depth']),
              'min_samples_leaf': int(params['min_samples_leaf']),
              'learning_rate': params['learning_rate']}
    gbm_clf = GradientBoostingClassifier(n_estimators=500, **params)
    best_score = cross_val_score(gbm_clf, X_train, y_train,
                                 scoring='accuracy', cv=10, n_jobs=4).mean()
    loss = 1 - best_score
    write_results(best_score, params, iteration)  # custom logging helper
    return loss
Run the algorithm:
import numpy as np
from hyperopt import fmin, tpe

best_result = fmin(
    fn=objective,
    space=space,
    max_evals=500,
    rstate=np.random.default_rng(42),
    algo=tpe.suggest)
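Conceptually, fmin repeatedly samples candidates from the space, evaluates the objective, and keeps the best (with TPE guiding where to sample next). A stdlib-only sketch of that loop, using plain random sampling instead of TPE and a toy objective whose optimum (max_depth=6, learning_rate=0.1) is hypothetical:

```python
import random

def objective(params):
    # Toy stand-in for the real cross-validated loss: we assume
    # (hypothetically) it is minimized at max_depth=6, learning_rate=0.1.
    return (params['max_depth'] - 6) ** 2 + (params['learning_rate'] - 0.1) ** 2

def sample_space():
    # One random draw from a space shaped like the grid defined earlier
    return {'max_depth': random.choice([2, 4, 6, 8, 10]),
            'learning_rate': random.uniform(0.01, 1)}

random.seed(42)
best_params, best_loss = None, float('inf')
for _ in range(500):  # max_evals=500, mirroring the fmin call
    candidate = sample_space()
    loss = objective(candidate)
    if loss < best_loss:
        best_params, best_loss = candidate, loss

print(best_params)
```

Unlike this sketch, TPE uses the history of evaluated points to propose promising candidates rather than sampling blindly.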