Selecting your final model

Model Validation in Python

Kasey Jones

Data Scientist

# Best Score
rs.best_score_

5.45

# Best Parameters
rs.best_params_

{'max_depth': 4, 'max_features': 8, 'min_samples_split': 4}

# Best Estimator
rs.best_estimator_
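These three attributes only exist after the search has been fit. The sketch below shows where they come from. It assumes synthetic data from `make_regression` and a small, hypothetical parameter grid; none of these values come from the slides. The scorer `neg_mean_absolute_error` is also an assumption. scikit-learn always maximizes the score, so error metrics are negated and values closer to zero are better.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

# Hypothetical synthetic data standing in for the course dataset
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=1111)

# Hypothetical search space; the values are illustrative only
param_dist = {
    "max_depth": [2, 4, 6, 8, 10],
    "max_features": [2, 4, 6, 8],
    "min_samples_split": [2, 4, 8, 12],
}

rs = RandomizedSearchCV(
    estimator=RandomForestRegressor(n_estimators=20, random_state=1111),
    param_distributions=param_dist,
    n_iter=5,   # number of sampled parameter combinations
    cv=3,       # 3-fold cross-validation for each combination
    scoring="neg_mean_absolute_error",  # assumed metric (negated MAE)
    random_state=1111,
)
rs.fit(X, y)

print(rs.best_score_)      # mean cross-validated score of the best candidate
print(rs.best_params_)     # parameter combination that achieved it
print(rs.best_estimator_)  # model refit on all of X, y with those parameters
```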

Other attributes

rs.cv_results_

rs.cv_results_['mean_test_score']

array([5.45, 6.23, 5.87, 5.91, 5.67])

# Selected Parameters:
rs.cv_results_['params']

[{'max_depth': 10, 'min_samples_split': 8, 'n_estimators': 25},
 {'max_depth': 4, 'min_samples_split': 8, 'n_estimators': 50},
 ...]
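The `mean_test_score` array and the `params` list are aligned. Entry `i` of each describes the same sampled candidate, and `best_index_` points at the winner. The minimal sketch below checks this. It assumes synthetic data and a hypothetical grid, and it uses the default regressor scoring (R²).

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

# Hypothetical data and search space, for illustration only
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)
rs = RandomizedSearchCV(
    RandomForestRegressor(n_estimators=20, random_state=0),
    param_distributions={"max_depth": [2, 4, 6, 8, 10], "min_samples_split": [2, 8, 12]},
    n_iter=5, cv=3, random_state=0,
)
rs.fit(X, y)

scores = rs.cv_results_["mean_test_score"]  # one mean CV score per candidate
params = rs.cv_results_["params"]           # the matching parameter dicts

# Walk both sequences together: each pair describes one candidate
for p, s in zip(params, scores):
    print(p, round(s, 3))

# best_index_ locates the winning candidate inside cv_results_
assert params[rs.best_index_] == rs.best_params_
assert scores[rs.best_index_] == rs.best_score_
```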

Using .cv_results_

Group the scores by max depth:

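import pandas as pd
# Pair each candidate's max_depth with its mean cross-validated score
max_depth = [item['max_depth'] for item in rs.cv_results_['params']]
scores = list(rs.cv_results_['mean_test_score'])
d = pd.DataFrame([max_depth, scores]).T
d.columns = ['Max Depth', 'Score']
# Average the scores of all candidates that share the same max_depth
d.groupby(['Max Depth']).mean()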

Max Depth  Score        
2.0        0.677928
4.0        0.753021
6.0        0.817219
8.0        0.879136
10.0       0.896821
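A shorter route to the same grouping is sketched below. Because `cv_results_` is a dict of equal-length arrays, it converts directly to a DataFrame, and each searched parameter gets a `param_<name>` column. The data and grid are hypothetical assumptions, so the numbers will not match the table above.

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

# Hypothetical data and search space, for illustration only
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)
rs = RandomizedSearchCV(
    RandomForestRegressor(n_estimators=20, random_state=0),
    param_distributions={"max_depth": [2, 4, 6, 8, 10], "min_samples_split": [2, 8, 12]},
    n_iter=10, cv=3, random_state=0,
)
rs.fit(X, y)

# One row per sampled candidate, one column per recorded result
results = pd.DataFrame(rs.cv_results_)

# Mean test score for each max_depth value that was sampled
by_depth = results.groupby("param_max_depth")["mean_test_score"].mean()
print(by_depth)
```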

Other attributes continued

Uses of the output:

Visualize the effect of each parameter
Infer which parameters have a large impact on the results
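One rough way to act on the second point is to group the scores by each parameter in turn and measure how far the group means spread. A wide spread hints at a strong effect. This is a heuristic, not a formal test: random search samples parameters jointly, so each marginal average is mixed with the other parameters' effects. The data, grid, and `impact` name are hypothetical.

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

# Hypothetical data and search space, for illustration only
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)
param_dist = {"max_depth": [2, 4, 6, 8, 10], "min_samples_split": [2, 8, 12]}
rs = RandomizedSearchCV(
    RandomForestRegressor(n_estimators=20, random_state=0),
    param_distributions=param_dist, n_iter=10, cv=3, random_state=0,
)
rs.fit(X, y)
results = pd.DataFrame(rs.cv_results_)

# For each parameter: range between its best and worst group-mean score
impact = {}
for name in param_dist:
    group_means = results.groupby(f"param_{name}")["mean_test_score"].mean()
    impact[name] = group_means.max() - group_means.min()

# List parameters from largest to smallest spread
for name, spread in sorted(impact.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: score range {spread:.3f}")
```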


Selecting the best model

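rs.best_estimator_ contains the best model: a copy of the base estimator set to rs.best_params_ and refit on the full data passed to .fit() (available when refit=True, the default)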

rs.best_estimator_

RandomForestRegressor(bootstrap=True, criterion='mse', max_depth=8,
           max_features=8, max_leaf_nodes=None, min_impurity_decrease=0.0,
           min_impurity_split=None, min_samples_leaf=1,
           min_samples_split=12, min_weight_fraction_leaf=0.0,
           n_estimators=20, n_jobs=1, oob_score=False, random_state=1111,
           verbose=0, warm_start=False)
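The sketch below checks that the refit model carries the winning parameters. Settings fixed on the base estimator, such as `n_estimators` and `random_state`, are kept unchanged. The data and grid are hypothetical assumptions.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

# Hypothetical data and search space, for illustration only
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)
rs = RandomizedSearchCV(
    RandomForestRegressor(n_estimators=20, random_state=1111),
    param_distributions={"max_depth": [2, 4, 8], "min_samples_split": [2, 12]},
    n_iter=4, cv=3, random_state=0,
)
rs.fit(X, y)

best = rs.best_estimator_
params = best.get_params()

# The searched parameters are set to the winning values...
for name, value in rs.best_params_.items():
    assert params[name] == value
# ...while fixed settings from the base estimator are carried over
assert params["n_estimators"] == 20
assert params["random_state"] == 1111
print(best)
```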

Comparing types of models

Random forest:

rfr.score(X_test, y_test)

6.39

Gradient Boosting:

gb.score(X_test, y_test)

6.23
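A fair comparison between model types scores both models on the same held-out split. For scikit-learn regressors, `.score()` returns the coefficient of determination R², which is at most 1 and where higher is better. The sketch below also reports mean absolute error, where lower is better. The synthetic data, split sizes, and model settings are assumptions chosen for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Hypothetical data; both models share the same train/test split
X, y = make_regression(n_samples=300, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

models = {
    "Random forest": RandomForestRegressor(n_estimators=50, random_state=0),
    "Gradient boosting": GradientBoostingRegressor(random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    r2 = model.score(X_test, y_test)  # R^2 on the test set
    mae = mean_absolute_error(y_test, model.predict(X_test))
    results[name] = (r2, mae)
    print(f"{name}: R^2 = {r2:.3f}, MAE = {mae:.2f}")
```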

Using .best_estimator_

Predict new data:

rs.best_estimator_.predict(<new_data>)

Check the parameters:

rs.best_estimator_.get_params()
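The sketch below combines both steps. A held-out slice of synthetic data plays the role of new data; the split, grid, and `X_new` name are hypothetical. With `refit=True`, calling `rs.predict` delegates to `rs.best_estimator_`, so both calls give the same predictions.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Hypothetical data; X_new stands in for rows the model has never seen
X, y = make_regression(n_samples=300, n_features=10, noise=10.0, random_state=0)
X_train, X_new, y_train, y_new = train_test_split(X, y, test_size=0.2, random_state=0)

rs = RandomizedSearchCV(
    RandomForestRegressor(n_estimators=20, random_state=0),
    param_distributions={"max_depth": [4, 8, 10], "min_samples_split": [2, 8]},
    n_iter=4, cv=3, random_state=0,
)
rs.fit(X_train, y_train)

preds = rs.best_estimator_.predict(X_new)  # one prediction per new row
params = rs.best_estimator_.get_params()   # every parameter, not only the searched ones
print(preds[:5])
print({k: params[k] for k in rs.best_params_})
```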

Save model for use later:

# sklearn.externals.joblib has been removed from recent scikit-learn releases; import joblib directly
import joblib

joblib.dump(rfr, 'rfr_best_<date>.pkl')
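A round trip shows that the saved file restores a model with identical predictions. The sketch below writes to a temporary directory and uses a fixed, hypothetical file name in place of the slide's `<date>` placeholder. joblib files are pickles, so only load files from trusted sources, ideally with the same scikit-learn version that saved them.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Hypothetical data and model, for illustration only
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)
rfr = RandomForestRegressor(n_estimators=20, random_state=1111).fit(X, y)

# A temporary directory keeps the example from leaving files behind
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "rfr_best.pkl")  # hypothetical file name
    joblib.dump(rfr, path)      # serialize the fitted model to disk
    loaded = joblib.load(path)  # restore it, e.g. in a later session

# The restored model reproduces the original predictions exactly
same = (loaded.predict(X) == rfr.predict(X)).all()
print(same)
```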

