Selecting your final model

Model Validation in Python

Kasey Jones

Data Scientist

# Best Score
rs.best_score_
5.45
# Best Parameters
rs.best_params_
{'max_depth': 4, 'max_features': 8, 'min_samples_split': 4}
# Best Estimator
rs.best_estimator_
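The slides assume an already fitted search object named `rs`. Here is a minimal sketch of how such an object might be created. It assumes a `RandomizedSearchCV` over a `RandomForestRegressor` on synthetic data from `make_regression`. The dataset, search space, and scoring choice are illustrative assumptions, not the course's setup.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

# Synthetic data stands in for the course dataset (assumption)
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=1111)

# Illustrative search space; these values are hypothetical
param_dist = {
    "max_depth": [2, 4, 6, 8, 10],
    "max_features": [2, 4, 6, 8],
    "min_samples_split": [2, 4, 8, 12],
}

rs = RandomizedSearchCV(
    RandomForestRegressor(n_estimators=20, random_state=1111),
    param_distributions=param_dist,
    n_iter=5,                          # try 5 random combinations
    cv=3,                              # 3-fold cross-validation per combination
    scoring="neg_mean_squared_error",  # higher (closer to 0) is better
    random_state=1111,
)
rs.fit(X, y)

print(rs.best_score_)      # mean CV score of the best combination
print(rs.best_params_)     # dict of the winning parameter values
print(rs.best_estimator_)  # best model, refit on all of X and y
```

With `scoring="neg_mean_squared_error"` the scores are negated MSEs, so `-rs.best_score_` is the best mean cross-validated MSE.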

Other attributes

rs.cv_results_

rs.cv_results_['mean_test_score']
array([5.45, 6.23, 5.87, 5.91, 5.67])
# Selected Parameters:
rs.cv_results_['params']
[{'max_depth': 10, 'min_samples_split': 8, 'n_estimators': 25},
 {'max_depth': 4, 'min_samples_split': 8, 'n_estimators': 50},
 ...]
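`cv_results_` is a dict of arrays with one entry per candidate, so `params[i]`, `mean_test_score[i]` and `rank_test_score[i]` all describe the same combination. The following sketch pairs them up. It uses hypothetical data and a hypothetical search space, and leaves scoring at the regressor default (R²).

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

# Hypothetical data and search space, for illustration only
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=1111)
rs = RandomizedSearchCV(
    RandomForestRegressor(n_estimators=20, random_state=1111),
    param_distributions={"max_depth": [2, 4, 6, 8, 10],
                         "min_samples_split": [2, 4, 8, 12]},
    n_iter=5, cv=3, random_state=1111,
)
rs.fit(X, y)

results = rs.cv_results_
print(sorted(results))  # keys such as 'mean_test_score', 'params', 'rank_test_score'

# Element i of every array refers to the same candidate
for params, score, rank in zip(results["params"],
                               results["mean_test_score"],
                               results["rank_test_score"]):
    print(rank, round(score, 3), params)
```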

Using .cv_results_

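Group the scores by max depth:

import pandas as pd

# Pull max_depth out of each candidate's parameter dict
max_depth = [item['max_depth'] for item in rs.cv_results_['params']]
scores = list(rs.cv_results_['mean_test_score'])
# Two rows (depths, scores) transposed into two columns
d = pd.DataFrame([max_depth, scores]).T
d.columns = ['Max Depth', 'Score']
# Average score for each max_depth value
d.groupby(['Max Depth']).mean()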
Max Depth  Score        
2.0        0.677928
4.0        0.753021
6.0        0.817219
8.0        0.879136
10.0       0.896821
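`cv_results_` also stores each tuned parameter in its own `param_<name>` entry. A shorter alternative is therefore to load the whole dict into a DataFrame and group on that column. The sketch below does this with a hypothetical setup. Because the data differ, its numbers will not match the table above.

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

# Hypothetical data and search space, for illustration only
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=1111)
rs = RandomizedSearchCV(
    RandomForestRegressor(n_estimators=20, random_state=1111),
    param_distributions={"max_depth": [2, 4, 6, 8, 10],
                         "min_samples_split": [2, 4, 8, 12]},
    n_iter=10, cv=3, random_state=1111,
)
rs.fit(X, y)

# One row per candidate; 'param_max_depth' holds each candidate's max_depth
df = pd.DataFrame(rs.cv_results_)
by_depth = df.groupby("param_max_depth")["mean_test_score"].mean()
print(by_depth)
```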

Other attributes continued

Uses of the output:

  • Visualize the effect of each parameter on the score
  • Infer which parameters have the biggest impact on the results
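One rough way to act on the second bullet is to group the scores by each parameter in turn and measure how far the group means spread. Treat this as a crude heuristic, not a scikit-learn feature. In a random search the parameters are sampled jointly, so the spreads are confounded with one another and are noisy when `n_iter` is small. The data and search space are again hypothetical.

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

# Hypothetical data and search space, for illustration only
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=1111)
rs = RandomizedSearchCV(
    RandomForestRegressor(n_estimators=20, random_state=1111),
    param_distributions={"max_depth": [2, 4, 6, 8, 10],
                         "min_samples_split": [2, 4, 8, 12]},
    n_iter=10, cv=3, random_state=1111,
)
rs.fit(X, y)

df = pd.DataFrame(rs.cv_results_)
spread = {}
for col in [c for c in df.columns if c.startswith("param_")]:
    # Mean CV score for each value of this parameter
    group_means = df.groupby(col)["mean_test_score"].mean()
    # Gap between the best and worst value: a rough "impact" signal
    spread[col] = group_means.max() - group_means.min()

# Larger spread suggests the parameter matters more (heuristic only)
impact = pd.Series(spread).sort_values(ascending=False)
print(impact)
```

For the first bullet, a grouped Series can be plotted directly if matplotlib is installed. For example, use `df.groupby("param_max_depth")["mean_test_score"].mean().plot()`.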

Selecting the best model

rs.best_estimator_ holds the best model found by the search, refit on the full training data (with the default refit=True)

rs.best_estimator_
RandomForestRegressor(bootstrap=True, criterion='mse', max_depth=8,
           max_features=8, max_leaf_nodes=None, min_impurity_decrease=0.0,
           min_impurity_split=None, min_samples_leaf=1,
           min_samples_split=12, min_weight_fraction_leaf=0.0,
           n_estimators=20, n_jobs=1, oob_score=False, random_state=1111,
           verbose=0, warm_start=False)
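With the default `refit=True`, the search refits the winning combination on the entire dataset passed to `fit`, and `best_estimator_` holds that refit model. The sketch below checks this on hypothetical data. With `refit=False`, the attribute is not created at all.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

# Hypothetical data and search space, for illustration only
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=1111)
param_dist = {"max_depth": [2, 4, 6, 8, 10], "min_samples_split": [2, 4, 8, 12]}

def make_search(refit):
    # Same search each time; only the refit flag changes
    return RandomizedSearchCV(
        RandomForestRegressor(n_estimators=20, random_state=1111),
        param_distributions=param_dist, n_iter=5, cv=3,
        random_state=1111, refit=refit,
    )

rs = make_search(refit=True).fit(X, y)
best = rs.best_estimator_
print(best.max_depth, best.min_samples_split)  # match rs.best_params_
print(best.n_features_in_)                     # fitted model, 10 input features

rs_no_refit = make_search(refit=False).fit(X, y)
print(hasattr(rs_no_refit, "best_estimator_"))  # False
```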

Comparing types of models

Random forest:

rfr.score(X_test, y_test)
6.39

Gradient Boosting:

gb.score(X_test, y_test)
6.23
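A fair comparison scores both models on the same held-out data. The sketch below assumes synthetic data, a `train_test_split` split, and untuned settings. For scikit-learn regressors, `.score` returns R² (higher is better), so the sketch also reports mean squared error (lower is better) to keep the direction of each metric explicit.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Hypothetical data, for illustration only
X, y = make_regression(n_samples=300, n_features=10, noise=10.0, random_state=1111)
# Same split for both models so the comparison is like-for-like
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=1111)

rfr = RandomForestRegressor(n_estimators=50, random_state=1111).fit(X_train, y_train)
gb = GradientBoostingRegressor(random_state=1111).fit(X_train, y_train)

for name, model in [("Random forest", rfr), ("Gradient boosting", gb)]:
    r2 = model.score(X_test, y_test)  # R^2 on the test set
    mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"{name}: R^2={r2:.3f}, MSE={mse:.1f}")
```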

Using .best_estimator_

Predict on new data:

rs.best_estimator_.predict(<new_data>)
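`best_estimator_` is an ordinary fitted estimator, so `predict` works on any array with the same columns as the training data. In the sketch below, `X_new` is hypothetical new data: random rows with the right number of features. The search setup is also illustrative.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

# Hypothetical data and search space, for illustration only
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=1111)
rs = RandomizedSearchCV(
    RandomForestRegressor(n_estimators=20, random_state=1111),
    param_distributions={"max_depth": [2, 4, 6, 8, 10]},
    n_iter=3, cv=3, random_state=1111,
).fit(X, y)

# Hypothetical "new" observations: 5 rows, same 10 features as X
X_new = np.random.default_rng(0).normal(size=(5, 10))
preds = rs.best_estimator_.predict(X_new)
print(preds)

# The search object delegates predict to best_estimator_, so this matches
print(np.allclose(preds, rs.predict(X_new)))
```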

Check the parameters:

rs.best_estimator_.get_params()
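`get_params()` returns every constructor argument of the best model as a dict, including arguments that were not tuned. The sketch below separates the tuned values from the rest. It uses hypothetical data and a hypothetical search space.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

# Hypothetical data and search space, for illustration only
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=1111)
rs = RandomizedSearchCV(
    RandomForestRegressor(n_estimators=20, random_state=1111),
    param_distributions={"max_depth": [2, 4, 6, 8, 10],
                         "min_samples_split": [2, 4, 8, 12]},
    n_iter=5, cv=3, random_state=1111,
).fit(X, y)

params = rs.best_estimator_.get_params()

# Values chosen by the search
tuned = {k: params[k] for k in rs.best_params_}
print(tuned)

# Set on the base estimator, not tuned, so carried over unchanged
print(params["n_estimators"])  # 20
```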

Save the model for later use:

# sklearn.externals.joblib was removed in scikit-learn 0.23; import joblib directly
import joblib

joblib.dump(rfr, 'rfr_best_<date>.pkl')
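The following round-trip sketch dumps a fitted model, loads it back, and confirms that it makes the same predictions. The temporary directory and the `rfr_best.pkl` filename are illustrative. Only load files from trusted sources, because unpickling can execute arbitrary code.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Hypothetical data, for illustration only
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=1111)
rfr = RandomForestRegressor(n_estimators=20, random_state=1111).fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "rfr_best.pkl")  # hypothetical filename
    joblib.dump(rfr, path)        # serialize the fitted model to disk
    loaded = joblib.load(path)    # restore it, e.g. in a later session

# The restored model reproduces the original predictions exactly
same = np.array_equal(rfr.predict(X), loaded.predict(X))
print(same)  # True
```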

Let's practice!
