Introduction to hyperparameter tuning

Model Validation in Python

Kasey Jones

Data Scientist

Model parameters

Parameters are:

Learned or estimated from the data
The result of fitting a model
Used when making future predictions
Not manually set

Linear regression parameters

Parameters are created by fitting a model:

from sklearn.linear_model import LinearRegression
lr = LinearRegression()
lr.fit(X, y)
print(lr.coef_, lr.intercept_)

[[0.798, 0.452]] [1.786]

Linear regression parameters

Parameters do not exist before the model is fit:

lr = LinearRegression()
print(lr.coef_, lr.intercept_)

AttributeError: 'LinearRegression' object has no attribute 'coef_'
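
The two slides above can be checked end to end with a small sketch. The synthetic data, the seed, and the names `fitted_before` and `fitted_after` are assumptions made for illustration; they are not part of the course material.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic, noise-free data (an assumption): y = 2*x0 - 1*x1 + 0.5
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.5

lr = LinearRegression()
fitted_before = hasattr(lr, "coef_")  # no parameters yet, the model is unfit

lr.fit(X, y)
fitted_after = hasattr(lr, "coef_")   # parameters exist once fit() has run

print(fitted_before, fitted_after)
print(lr.coef_, lr.intercept_)
```

Because the data is noise-free, the learned coefficients should closely match the values used to generate `y`, which shows that parameters come from the data rather than from the user.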

Model hyperparameters

Hyperparameters:

Manually set before the training occurs
Specify how the training is supposed to happen

Random forest hyperparameters

Hyperparameter	Description	Possible Values (default)
n_estimators	Number of decision trees in the forest	2+ (10; 100 since scikit-learn 0.22)
max_depth	Maximum depth of the decision trees	2+ (None)
max_features	Number of features to consider when making a split	See documentation
min_samples_split	The minimum number of samples required to make a split	2+ (2)
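
As a minimal sketch of setting the hyperparameters in the table, they can be passed to the constructor and read back before any training happens. The specific values and `random_state=0` are arbitrary choices for illustration.

```python
from sklearn.ensemble import RandomForestRegressor

# Hyperparameters are chosen by hand, before fit() is ever called
rfr = RandomForestRegressor(
    n_estimators=50,
    max_depth=6,
    max_features=2,
    min_samples_split=4,
    random_state=0,
)

# get_params() works on an unfit model because these values are not learned
params = rfr.get_params()
table_names = ["n_estimators", "max_depth", "max_features", "min_samples_split"]
print({name: params[name] for name in table_names})
```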

What is hyperparameter tuning?

Hyperparameter tuning:

Select the hyperparameters to tune
Create ranges of possible values for each one
Run a single model type with different sets of values
Specify a single accuracy metric to compare the runs
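
The steps above might look like the following sketch. The synthetic dataset from `make_regression`, the three value sets, and the choice of mean absolute error as the single metric are all assumptions for illustration, not the course's own setup.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic data and a simple holdout split (assumptions for this sketch)
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Different value sets for the same model type
value_sets = [
    {"max_depth": 4, "min_samples_split": 2},
    {"max_depth": 8, "min_samples_split": 4},
    {"max_depth": 12, "min_samples_split": 8},
]

# Score every value set with one metric so the runs are comparable
scores = []
for params in value_sets:
    rfr = RandomForestRegressor(n_estimators=25, random_state=0, **params)
    rfr.fit(X_train, y_train)
    scores.append(mean_absolute_error(y_val, rfr.predict(X_val)))

best_index = scores.index(min(scores))  # lower error is better
print(value_sets[best_index], scores)
```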

Specifying ranges

depth = [4, 6, 8, 10, 12]
samples = [2, 4, 6, 8]
features = [2, 4, 6, 8, 10]

from sklearn.ensemble import RandomForestRegressor

# Specify hyperparameters by indexing into the ranges above
rfr = RandomForestRegressor(
    n_estimators=100, max_depth=depth[0],
    min_samples_split=samples[3], max_features=features[1])

rfr.get_params()

{'bootstrap': True,
 'criterion': 'mse',
 ...
}
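
Instead of indexing into the ranges by hand, one possible approach is to draw a value from each range at random. This is a sketch only; the seed and the `chosen` variable are assumptions, and `random.choice` is a swapped-in technique that the slide itself does not show.

```python
import random
from sklearn.ensemble import RandomForestRegressor

depth = [4, 6, 8, 10, 12]
samples = [2, 4, 6, 8]
features = [2, 4, 6, 8, 10]

random.seed(1)  # fixed seed so the draw is repeatable (an assumption)

# Draw one value from each range rather than picking indices manually
rfr = RandomForestRegressor(
    n_estimators=100,
    max_depth=random.choice(depth),
    min_samples_split=random.choice(samples),
    max_features=random.choice(features),
)

chosen = rfr.get_params()
print(chosen["max_depth"], chosen["min_samples_split"], chosen["max_features"])
```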

Too many hyperparameters!

rfr.get_params()

{'bootstrap': True,
 'criterion': 'mse',
 'max_depth': 4,
 'max_features': 4,
 'max_leaf_nodes': None,
 'min_impurity_decrease': 0.0,
 'min_impurity_split': None,
 'min_samples_leaf': 1,
 'min_samples_split': 8,
 ...
 }
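
To get a feel for how many hyperparameters there are, a short sketch can count them and then narrow the view to a handful. The `focus` list is a hypothetical selection based on the table of random forest hyperparameters; the exact count and the default values depend on the installed scikit-learn version.

```python
from sklearn.ensemble import RandomForestRegressor

rfr = RandomForestRegressor()
all_params = rfr.get_params()

# Hypothetical short list: the four hyperparameters from the earlier table
focus = ["n_estimators", "max_depth", "max_features", "min_samples_split"]

print(len(all_params))  # total number of hyperparameters, version dependent
print({name: all_params[name] for name in focus})
```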

General guidelines

Start with the basics
Read through the documentation
Test practical ranges

Let's practice!
