Selecting your final model

Model Validation in Python

Kasey Jones

Data Scientist

# Best Score
rs.best_score_
5.45
# Best Parameters
rs.best_params_
{'max_depth': 4, 'max_features': 8, 'min_samples_split': 4}
# Best Estimator
rs.best_estimator_
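These three attributes exist only after a search has been fitted. Below is a minimal sketch that assumes a `RandomizedSearchCV` over a `RandomForestRegressor`. It uses synthetic data from `make_regression` and a made-up search space, because the course's dataset and grid are not shown. The scoring is `neg_mean_absolute_error`: scikit-learn always maximizes the score, so error metrics are negated, and `best_score_` comes out negative.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

# Synthetic data stands in for the course dataset (an assumption)
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=1111)

# Hypothetical search space; the course's actual grid is not shown
param_dist = {
    "max_depth": [2, 4, 6, 8, 10],
    "max_features": [4, 6, 8, 10],
    "min_samples_split": [2, 4, 8, 12],
}

rs = RandomizedSearchCV(
    estimator=RandomForestRegressor(n_estimators=20, random_state=1111),
    param_distributions=param_dist,
    n_iter=5,
    cv=3,
    scoring="neg_mean_absolute_error",  # negated error: higher (closer to 0) is better
    random_state=1111,
)
rs.fit(X, y)

print(rs.best_score_)      # mean cross-validated score of the best candidate
print(rs.best_params_)     # dict of the winning parameter values
print(rs.best_estimator_)  # estimator refit on all of X, y with those values
```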

Other attributes

rs.cv_results_

rs.cv_results_['mean_test_score']
array([5.45, 6.23, 5.87, 5.91, 5.67])
# Selected Parameters:
rs.cv_results_['params']
[{'max_depth': 10, 'min_samples_split': 8, 'n_estimators': 25},
 {'max_depth': 4, 'min_samples_split': 8, 'n_estimators': 50},
 ...]
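`cv_results_` is a dict of arrays with one entry per sampled candidate, so position `i` in `'params'` lines up with position `i` in `'mean_test_score'` and `'rank_test_score'`. The sketch below assumes synthetic data and a hypothetical two-parameter search space, and prints the candidates side by side. No `scoring` is set, so the regressor's default R² is used.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

# Synthetic data and search space are illustrative assumptions
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=1111)
rs = RandomizedSearchCV(
    RandomForestRegressor(n_estimators=20, random_state=1111),
    param_distributions={"max_depth": [2, 4, 6, 8, 10], "min_samples_split": [2, 4, 8, 12]},
    n_iter=5,
    cv=3,
    random_state=1111,
)
rs.fit(X, y)

results = rs.cv_results_
# Each key holds one value per candidate, aligned by index
for params, score, rank in zip(
    results["params"], results["mean_test_score"], results["rank_test_score"]
):
    print(rank, round(score, 3), params)
```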

Using .cv_results_

Group the mean test scores by max depth:

import pandas as pd

max_depth = [item['max_depth'] for item in rs.cv_results_['params']]
scores = list(rs.cv_results_['mean_test_score'])
d = pd.DataFrame([max_depth, scores]).T
d.columns = ['Max Depth', 'Score']
d.groupby(['Max Depth']).mean()
Max Depth  Score        
2.0        0.677928
4.0        0.753021
6.0        0.817219
8.0        0.879136
10.0       0.896821
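`cv_results_` also stores each tuned parameter in its own `param_<name>` column, so the whole dict can go straight into a DataFrame and you can skip the list comprehension. Here is a minimal sketch of the same grouping. The synthetic data and search space are assumptions.

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

# Synthetic data and search space are illustrative assumptions
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=1111)
rs = RandomizedSearchCV(
    RandomForestRegressor(n_estimators=20, random_state=1111),
    param_distributions={"max_depth": [2, 4, 6, 8, 10], "min_samples_split": [2, 4, 8, 12]},
    n_iter=10,
    cv=3,
    random_state=1111,
)
rs.fit(X, y)

# One row per candidate; 'param_max_depth' holds each candidate's max_depth
results = pd.DataFrame(rs.cv_results_)
by_depth = results.groupby("param_max_depth")["mean_test_score"].mean()
print(by_depth)
```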

Other attributes continued

Uses of the output:

  • Visualize the effect of each parameter on the score
  • Infer which parameters have the biggest impact on the results
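One rough way to spot influential parameters is to compare, for each parameter, the gap between its best and worst mean score across values. This is a heuristic sketch, not a method from the course. With random search, the other parameters vary at the same time, so the gaps are confounded and should be read as hints. The data and search space are assumptions.

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

# Synthetic data and search space are illustrative assumptions
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=1111)
rs = RandomizedSearchCV(
    RandomForestRegressor(n_estimators=20, random_state=1111),
    param_distributions={"max_depth": [2, 4, 6, 8, 10], "min_samples_split": [2, 4, 8, 12]},
    n_iter=10,
    cv=3,
    random_state=1111,
)
rs.fit(X, y)
results = pd.DataFrame(rs.cv_results_)

# Spread of mean score across each parameter's values (larger = more influence, roughly)
spread = {}
for name in ["max_depth", "min_samples_split"]:
    means = results.groupby(f"param_{name}")["mean_test_score"].mean()
    spread[name] = means.max() - means.min()

print(sorted(spread.items(), key=lambda kv: kv[1], reverse=True))
```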

Selecting the best model

rs.best_estimator_ holds the best model found by the search, already refit on the full training data (this requires refit=True, the default)

rs.best_estimator_
RandomForestRegressor(bootstrap=True, criterion='mse', max_depth=8,
           max_features=8, max_leaf_nodes=None, min_impurity_decrease=0.0,
           min_impurity_split=None, min_samples_leaf=1,
           min_samples_split=12, min_weight_fraction_leaf=0.0,
           n_estimators=20, n_jobs=1, oob_score=False, random_state=1111,
           verbose=0, warm_start=False)
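With the default `refit=True`, `best_estimator_` is a copy of the original estimator with `best_params_` applied, fitted on all of the data passed to `fit`. The sketch below checks this by rebuilding that model by hand with `sklearn.base.clone`. Because `random_state` is fixed, the predictions should match. The synthetic data and search space are assumptions.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

# Synthetic data and search space are illustrative assumptions
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=1111)
rs = RandomizedSearchCV(
    RandomForestRegressor(n_estimators=20, random_state=1111),
    param_distributions={"max_depth": [2, 4, 6, 8, 10], "min_samples_split": [2, 4, 8, 12]},
    n_iter=5,
    cv=3,
    random_state=1111,
)
rs.fit(X, y)

# Rebuild the refit step manually: unfitted copy + best params, fit on all data
manual = clone(rs.estimator).set_params(**rs.best_params_).fit(X, y)
same = np.allclose(manual.predict(X), rs.best_estimator_.predict(X))
print(same)
```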

Comparing types of models

Random forest:

rfr.score(X_test, y_test)
6.39

Gradient Boosting:

gb.score(X_test, y_test)
6.23
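To compare model types fairly, fit each one on the same training split and score it on the same held-out test set. The sketch below assumes synthetic data and mostly default hyperparameters. For scikit-learn regressors, `.score()` returns R², where higher is better. If you compare with an error metric such as MAE instead, the lower value wins.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic data stands in for the course dataset (an assumption)
X, y = make_regression(n_samples=300, n_features=10, noise=10.0, random_state=1111)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=1111)

rfr = RandomForestRegressor(n_estimators=50, random_state=1111).fit(X_train, y_train)
gb = GradientBoostingRegressor(random_state=1111).fit(X_train, y_train)

# .score() on a regressor is R^2 on the given data: higher is better
scores = {
    "Random forest": rfr.score(X_test, y_test),
    "Gradient Boosting": gb.score(X_test, y_test),
}
best_name = max(scores, key=scores.get)
print(scores, best_name)
```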

Using .best_estimator_

Predict new data:

rs.best_estimator_.predict(<new_data>)

Check the parameters:

rs.best_estimator_.get_params()
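`get_params()` returns every hyperparameter of the model, tuned or not. The sketch below filters that dict down to the tuned ones and predicts on a few rows. Those rows are a hypothetical stand-in for `<new_data>`; new data only needs the same feature columns as the training data. The synthetic data and search space are assumptions.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

# Synthetic data and search space are illustrative assumptions
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=1111)
rs = RandomizedSearchCV(
    RandomForestRegressor(n_estimators=20, random_state=1111),
    param_distributions={"max_depth": [2, 4, 6, 8, 10], "min_samples_split": [2, 4, 8, 12]},
    n_iter=5,
    cv=3,
    random_state=1111,
)
rs.fit(X, y)

# Hypothetical "new data": three rows with the same 10 features
new_data = X[:3]
preds = rs.best_estimator_.predict(new_data)

# Keep only the hyperparameters the search actually tuned
params = rs.best_estimator_.get_params()
tuned = {name: params[name] for name in rs.best_params_}
print(preds, tuned)
```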

Save model for use later:

# sklearn.externals.joblib was removed in scikit-learn 0.23; import joblib directly
import joblib

joblib.dump(rfr, 'rfr_best_<date>.pkl')
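A saved model is only useful if it loads back and behaves the same. The sketch below does a `joblib.dump` / `joblib.load` round trip in a temporary directory and compares predictions. The data and the file name are illustrative assumptions. Load pickles only from trusted sources, and preferably with the same scikit-learn version that saved them.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic data stands in for the course dataset (an assumption)
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=1111)
rfr = RandomForestRegressor(n_estimators=20, random_state=1111).fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "rfr_best_example.pkl")  # hypothetical file name
    joblib.dump(rfr, path)      # serialize the fitted model to disk
    loaded = joblib.load(path)  # read it back as a ready-to-use estimator

same = np.array_equal(rfr.predict(X), loaded.predict(X))
print(same)
```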

Let's practice!
