Regression models

Model Validation in Python

Kasey Jones

Data Scientist

Random forests in scikit-learn

from sklearn.ensemble import RandomForestRegressor
from sklearn.ensemble import RandomForestClassifier
rfr = RandomForestRegressor(random_state=1111)
rfc = RandomForestClassifier(random_state=1111)

Decision trees start with all data points at the root and divide them at each node based on a split on one feature. Each data point follows a path down the tree's branches, determined by its feature values, until it reaches a leaf.
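
To make the idea of following a path through a tree concrete, here is a minimal sketch using a single scikit-learn decision tree. The toy data and the names X_toy, y_toy, feature_a, and feature_b are made up for illustration and are not part of the course material.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

# Toy data (made up for illustration): two features, one target
rng = np.random.RandomState(1111)
X_toy = rng.uniform(0, 10, size=(40, 2))
y_toy = np.where(X_toy[:, 0] > 5, 20.0, 5.0) + X_toy[:, 1]

# A shallow tree keeps the printed structure readable
tree = DecisionTreeRegressor(max_depth=2, random_state=1111)
tree.fit(X_toy, y_toy)

# Text view of the splits, from the root down to the leaves
print(export_text(tree, feature_names=["feature_a", "feature_b"]))

# Node ids visited by the first data point, starting at the root (node 0)
path = tree.decision_path(X_toy[:1]).indices
leaf = tree.apply(X_toy[:1])[0]
print("nodes visited:", path, "-> leaf", leaf)
```

The last node in the path is the leaf whose value becomes the tree's prediction for that data point.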

Once a random forest regressor is trained, it predicts each data point by averaging the predictions of its individual decision trees for that data point.
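
The averaging step can be checked by hand. The sketch below assumes a small synthetic dataset (X_toy and y_toy are hypothetical names) and compares the mean of the individual trees' predictions with the forest's own prediction.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic regression data (made up for illustration)
rng = np.random.RandomState(1111)
X_toy = rng.normal(size=(100, 3))
y_toy = 2.0 * X_toy[:, 0] - X_toy[:, 1] + rng.normal(scale=0.1, size=100)

rfr = RandomForestRegressor(n_estimators=25, random_state=1111)
rfr.fit(X_toy, y_toy)

# Prediction of each individual tree: shape (n_trees, n_samples)
per_tree = np.array([t.predict(X_toy[:5]) for t in rfr.estimators_])

# Average across trees and compare with the forest's prediction
manual_average = per_tree.mean(axis=0)
forest_prediction = rfr.predict(X_toy[:5])
print(np.allclose(manual_average, forest_prediction))
```

The fitted trees are available through the estimators_ attribute, which is what makes this comparison possible.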

Random forest parameters

n_estimators: the number of trees in the forest

max_depth: the maximum depth of the trees

random_state: the random seed, which makes results reproducible

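from sklearn.ensemble import RandomForestRegressor
# Option 1: set parameters when creating the model
rfr = RandomForestRegressor(n_estimators=50, max_depth=10)
# Option 2: create the model first, then set parameters as attributes
rfr = RandomForestRegressor(random_state=1111)
rfr.n_estimators = 50
rfr.max_depth = 10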
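
As a rough check of what these parameters control, the sketch below fits a forest on synthetic data from make_regression (the data and the names X_toy and y_toy are assumptions for illustration) and inspects the fitted trees.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic data, used only to have something to fit
X_toy, y_toy = make_regression(
    n_samples=200, n_features=5, noise=10.0, random_state=1111
)

rfr = RandomForestRegressor(n_estimators=50, max_depth=10, random_state=1111)
rfr.fit(X_toy, y_toy)

# n_estimators sets how many trees are built
n_trees = len(rfr.estimators_)

# max_depth caps the depth of every tree in the forest
deepest = max(t.get_depth() for t in rfr.estimators_)
print(n_trees, deepest)
```

Fixing random_state means rerunning the script builds the same forest each time.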

Feature importance

Print how important each column is to the model

# rfr must already be fitted; X is the DataFrame of features it was fit on
for name, importance in zip(X.columns, rfr.feature_importances_):
    print("{0:s}: {1:.2f}".format(name, importance))
weight: 0.50
height: 0.39
left_handed: 0.72
union_preference: 0.05
eye_color: 0.03
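
For a version that runs end to end, here is a sketch on synthetic data where only one column drives the target. The column names signal, noise_a, and noise_b are hypothetical, as are X_toy and y_toy; the point is only to show that the informative column ranks first.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic features: only "signal" influences the target
rng = np.random.RandomState(1111)
X_toy = pd.DataFrame({
    "signal": rng.normal(size=300),
    "noise_a": rng.normal(size=300),
    "noise_b": rng.normal(size=300),
})
y_toy = 3.0 * X_toy["signal"] + rng.normal(scale=0.1, size=300)

rfr = RandomForestRegressor(n_estimators=50, random_state=1111)
rfr.fit(X_toy, y_toy)

# Pair each importance with its column name and sort, largest first
importances = pd.Series(rfr.feature_importances_, index=X_toy.columns)
ranked = importances.sort_values(ascending=False)
for name, importance in ranked.items():
    print(f"{name}: {importance:.2f}")
```

Sorting the importances makes it easier to spot which columns the model relies on when there are many features.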

Let's begin
