Hyperparameter Tuning in Python

Alex Scriven

Data Scientist

Introduction

 

Why study this course?

  • New, complex algorithms with many hyperparameters
  • Tuning can take a lot of time
  • Develops deeper understanding beyond the default settings

You may be surprised what you find under the hood!


The dataset

 

The dataset relates to credit card defaults.

It contains variables related to the financial history of some consumers in Taiwan. It has 30,000 users and 24 attributes.

Our modeling target is whether they defaulted on their loan.

It has already been preprocessed, and at times we will take smaller samples to demonstrate a concept.

Extra information about the dataset can be found here:

https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients


Parameters Overview

 

What is a parameter?

  • Components of the model learned during the modeling process
  • You do not set these manually (in fact, you can't!)
  • The algorithm will discover these for you
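A minimal sketch of the idea that parameters are learned rather than set. It uses synthetic data from `make_classification` as a stand-in (an assumption; this is not the credit card dataset) and checks whether the `coef_` attribute exists before and after fitting.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data (assumption: not the course's credit card dataset)
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

clf = LogisticRegression(max_iter=1000)

# Before fitting, the learned parameters do not exist yet
has_coef_before = hasattr(clf, "coef_")

clf.fit(X, y)

# After fitting, the algorithm has filled them in
has_coef_after = hasattr(clf, "coef_")

print(has_coef_before, has_coef_after)
print(clf.coef_.shape)  # one row of coefficients, one column per feature
```

The constructor accepts settings such as `max_iter`, but it has no argument for the coefficients themselves; they only appear once `fit` has run.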

Parameters in Logistic Regression

A simple logistic regression model:

from sklearn.linear_model import LogisticRegression

log_reg_clf = LogisticRegression()
log_reg_clf.fit(X_train, y_train)

print(log_reg_clf.coef_)
array([[-2.88651273e-06, -8.23168511e-03,  7.50857018e-04,
         3.94375060e-04,  3.79423562e-04,  4.34612046e-04,
         4.37561467e-04,  4.12107102e-04, -6.41089138e-06,
        -4.39364494e-06,  cont... ]])

Parameters in Logistic Regression

Tidy up the coefficients:

import pandas as pd

# Get the original variable names
original_variables = list(X_train.columns)

# Zip together the names and coefficients
zipped_together = list(zip(original_variables, log_reg_clf.coef_[0]))
coefs = [list(x) for x in zipped_together]

# Put into a DataFrame with column labels
coefs = pd.DataFrame(coefs, columns=["Variable", "Coefficient"])

Parameters in Logistic Regression

 

Now sort and print the top three coefficients:

coefs.sort_values(by=["Coefficient"], axis=0, inplace=True, ascending=False)
print(coefs.head(3))

[Image: coefficients table]
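The tidy-up and sorting steps can be sketched end to end as follows. The data is synthetic and the `feature_i` column names are hypothetical, so the values will not match the course's table. Note that sorting by the raw coefficient puts the most positive values first; ranking by absolute value is shown as an alternative in case the strongest effects in either direction are wanted.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data with hypothetical column names
X_arr, y = make_classification(n_samples=300, n_features=6, random_state=0)
feature_names = [f"feature_{i}" for i in range(X_arr.shape[1])]
X = pd.DataFrame(X_arr, columns=feature_names)

model = LogisticRegression(max_iter=1000).fit(X, y)

# Pair each variable name with its learned coefficient
coef_table = pd.DataFrame(
    {"Variable": X.columns, "Coefficient": model.coef_[0]}
)

# Most positive coefficients first (same ordering as in the slide)
top_positive = coef_table.sort_values(by="Coefficient", ascending=False).head(3)

# Alternative: largest magnitude first, regardless of sign
top_magnitude = coef_table.sort_values(
    by="Coefficient", key=lambda s: s.abs(), ascending=False
).head(3)

print(top_positive)
print(top_magnitude)
```

Building the DataFrame directly from a dictionary avoids the intermediate zip and list steps, and returning a sorted copy leaves `coef_table` in its original order.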


Where to find Parameters

 

To find parameters we need:

  1. To know a bit about the algorithm
  2. To consult the scikit-learn documentation

Parameters will be found under the 'Attributes' section, not the 'Parameters' section!
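The same distinction can be inspected in code. In this sketch (synthetic data, an assumption), `get_params()` returns the settings documented under 'Parameters', while the learned values documented under 'Attributes' follow the scikit-learn convention of ending in an underscore.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data (assumption: not the course's dataset)
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X, y)

# Settings you choose in the constructor ('Parameters' section of the docs)
hyperparameters = sorted(clf.get_params())

# Values filled in by fit ('Attributes' section of the docs);
# by scikit-learn convention their names end with an underscore
learned = sorted(
    name for name in vars(clf)
    if name.endswith("_") and not name.startswith("_")
)

print("Set by you:", hyperparameters)
print("Learned:   ", learned)
```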


Parameters in Random Forest

What about tree-based algorithms?

A random forest has no coefficients. Its parameters are the node decisions (which feature to split on and at what value).

from sklearn.ensemble import RandomForestClassifier

# A simple random forest estimator
rf_clf = RandomForestClassifier(max_depth=2)
rf_clf.fit(X_train, y_train)

# Pull out one tree from the forest
chosen_tree = rf_clf.estimators_[7]

For simplicity, we will show the final product (an image) of the decision tree. Feel free to explore the packages used for this (graphviz and pydotplus) yourself.


[Figure: flow diagram of the decisions in a single tree from the random forest]


Extracting Node Decisions

We can pull out details of the left, second-from-top node:

# Get the column it split on
split_column = chosen_tree.tree_.feature[1]
split_column_name = X_train.columns[split_column]

# Get the level it split on
split_value = chosen_tree.tree_.threshold[1]
print("This node split on feature {}, at a value of {}".format(split_column_name, split_value))

"This node split on feature PAY_0, at a value of 1.5"
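The lookup above can be extended to every node of the chosen tree. This is a sketch on synthetic data with hypothetical `feature_i` names, so the splits will differ from the course output. It relies on the `tree_` arrays, where a leaf is marked by `children_left` equal to -1 and therefore has no split feature or threshold worth reporting.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data with hypothetical feature names
X, y = make_classification(n_samples=500, n_features=6, random_state=0)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]

forest = RandomForestClassifier(n_estimators=10, max_depth=2, random_state=0)
forest.fit(X, y)

# Pull out one tree and its low-level structure
tree = forest.estimators_[7].tree_

decisions = []
for node_id in range(tree.node_count):
    if tree.children_left[node_id] == -1:
        # Leaf node: no decision is made here
        decisions.append((node_id, None, None))
        print(f"Node {node_id}: leaf")
    else:
        name = feature_names[tree.feature[node_id]]
        value = tree.threshold[node_id]
        decisions.append((node_id, name, value))
        print(f"Node {node_id}: split on {name} at a value of {value:.3f}")
```

Checking for a leaf first matters because indexing `feature` at a leaf returns a negative placeholder rather than a real column index.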


Let's practice!
