Regression models

Model Validation in Python

Kasey Jones

Data Scientist

Random forests in scikit-learn

from sklearn.ensemble import RandomForestRegressor
from sklearn.ensemble import RandomForestClassifier
rfr = RandomForestRegressor(random_state=1111)
rfc = RandomForestClassifier(random_state=1111)

A decision tree starts with all data points at the root and splits them from the top down, one split at a time. Picture each data point following its own path along the tree's branches, with every turn decided by that point's feature values.
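To make the idea of a path concrete, here is a minimal sketch that fits one shallow tree and shows which nodes a single data point passes through. The toy data and the names feature_a and feature_b are assumptions made up for illustration; they are not from the course.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

# Hypothetical toy data: the target depends mostly on the first feature
rng = np.random.RandomState(1111)
X = rng.uniform(0, 10, size=(100, 2))
y = 3 * X[:, 0] + rng.normal(0, 0.1, size=100)

tree = DecisionTreeRegressor(max_depth=2, random_state=1111)
tree.fit(X, y)

# Text view of the splits, read from the root downward
rules = export_text(tree, feature_names=["feature_a", "feature_b"])
print(rules)

# decision_path marks the nodes the first data point passes through
path = tree.decision_path(X[:1])
visited_nodes = path.indices.tolist()

# apply returns the id of the leaf where the data point ends up
leaf_id = int(tree.apply(X[:1])[0])
print("visited nodes:", visited_nodes, "leaf:", leaf_id)
```

With max_depth=2, a data point visits at most three nodes: the root, one internal node, and a leaf.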

Once a regression random forest is fitted, it averages the predictions of its individual decision trees for each data point to produce that point's final prediction.
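The averaging step can be checked by hand. The sketch below, on made-up toy data, compares the mean of the per-tree predictions with the forest's own prediction. It relies on the estimators_ attribute, which scikit-learn fills with the fitted trees.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical toy regression data, for illustration only
rng = np.random.RandomState(1111)
X = rng.uniform(0, 10, size=(200, 3))
y = 2 * X[:, 0] - X[:, 1] + rng.normal(0, 0.5, size=200)

rfr = RandomForestRegressor(n_estimators=25, random_state=1111)
rfr.fit(X, y)

# Predictions of every individual tree, shape (n_trees, n_samples)
per_tree = np.array([tree.predict(X[:5]) for tree in rfr.estimators_])

# Average across trees and compare with the forest's prediction
manual_average = per_tree.mean(axis=0)
forest_prediction = rfr.predict(X[:5])
matches = bool(np.allclose(manual_average, forest_prediction))
print(matches)
```

This applies to RandomForestRegressor. RandomForestClassifier combines its trees by averaging predicted class probabilities rather than numeric targets.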

Random forest parameters

n_estimators: the number of trees in the forest

max_depth: the maximum depth of the trees

random_state: the random seed, set to make results reproducible

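from sklearn.ensemble import RandomForestRegressor
# Option 1: set parameters when creating the model
rfr = RandomForestRegressor(n_estimators=50, max_depth=10)
# Option 2: create the model first, then set parameters as attributes
rfr = RandomForestRegressor(random_state=1111)
rfr.n_estimators = 50
rfr.max_depth = 10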
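As a third option, not shown on the slide, scikit-learn estimators also provide set_params and get_params. A minimal sketch, using the same values as above:

```python
from sklearn.ensemble import RandomForestRegressor

rfr = RandomForestRegressor(random_state=1111)

# set_params updates several parameters in one call and returns the estimator
rfr.set_params(n_estimators=50, max_depth=10)

# get_params returns the current parameters as a dictionary
params = rfr.get_params()
print(params["n_estimators"], params["max_depth"], params["random_state"])
```

Whichever way the parameters are set, they only take effect the next time fit is called.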

Feature importance

Print how important each column is to the model

# Assumes rfr is already fitted and X is the pandas DataFrame it was trained on
for name, importance in zip(X.columns, rfr.feature_importances_):
    print(f"{name}: {importance:.2f}")
weight: 0.50
height: 0.39
left_handed: 0.72
union_preference: 0.05
eye_color: 0.03
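For a version that runs end to end, here is a small sketch on made-up data. The feature_names list stands in for X.columns, and the data, the coefficients, and the noise column are all assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical data: the target depends strongly on weight, weakly on height
rng = np.random.RandomState(1111)
feature_names = ["weight", "height", "noise"]
X = np.column_stack([
    rng.normal(70, 10, 300),   # weight
    rng.normal(170, 8, 300),   # height
    rng.normal(0, 1, 300),     # unrelated noise column
])
y = 0.8 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(0, 1, 300)

rfr = RandomForestRegressor(n_estimators=50, random_state=1111)
rfr.fit(X, y)

# Pair each name with its importance and print from most to least important
importances = dict(zip(feature_names, rfr.feature_importances_))
for name, score in sorted(importances.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.2f}")

total = float(sum(importances.values()))
top_feature = max(importances, key=importances.get)
```

scikit-learn normalizes these impurity-based importances so that they sum to 1 across all features, which makes them easiest to read as relative shares.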

Let's begin
