Model Validation in Python
Kasey Jones
Data Scientist
# Best Score
rs.best_score_
5.45
# Best Parameters
rs.best_params_
{'max_depth': 4, 'max_features': 8, 'min_samples_split': 4}
# Best Estimator
rs.best_estimator_
rs.cv_results_
rs.cv_results_['mean_test_score']
array([5.45, 6.23, 5.87, 5.91, 5.67])
# Selected Parameters:
rs.cv_results_['params']
[{'max_depth': 10, 'min_samples_split': 8, 'n_estimators': 25},
{'max_depth': 4, 'min_samples_split': 8, 'n_estimators': 50},
...]
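The attributes above are only available on a fitted search object. The following is a minimal sketch of how such an `rs` object might be produced, not the course's actual setup: the synthetic data from `make_regression`, the parameter values, and the choice of `n_iter=5` and `cv=3` are all assumptions made for illustration. With the default scoring for a regressor, the scores are R^2 values, so they will not match the numbers shown on the slides.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

# Synthetic data stands in for the course dataset (assumption).
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=1111)

# Hypothetical search space; the values are illustrative only.
param_dist = {
    "max_depth": [2, 4, 6, 8, 10],
    "min_samples_split": [2, 4, 8, 12],
    "n_estimators": [10, 25, 50],
}

rs = RandomizedSearchCV(
    estimator=RandomForestRegressor(random_state=1111),
    param_distributions=param_dist,
    n_iter=5,          # number of parameter combinations sampled
    cv=3,              # 3-fold cross-validation
    random_state=1111,
)
rs.fit(X, y)

print(rs.best_score_)                      # mean CV score of the best combination
print(rs.best_params_)                     # the parameters that produced it
print(rs.cv_results_["mean_test_score"])   # one mean score per sampled combination
```

Each entry of `rs.cv_results_['params']` lines up with the entry at the same position in `rs.cv_results_['mean_test_score']`.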
Group the max depths:
import pandas as pd
max_depth = [item['max_depth'] for item in rs.cv_results_['params']]
scores = list(rs.cv_results_['mean_test_score'])
d = pd.DataFrame([max_depth, scores]).T
d.columns = ['Max Depth', 'Score']
d.groupby(['Max Depth']).mean()
Max Depth Score
2.0 0.677928
4.0 0.753021
6.0 0.817219
8.0 0.879136
10.0 0.896821
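The grouping step can be tried without running a search. The sketch below uses a small hand-written dictionary, `cv_results`, as a hypothetical stand-in for `rs.cv_results_`; its scores are made up for illustration. Building the DataFrame from a dictionary of columns is one alternative to the transpose approach, and it should keep the depths as integers rather than floats.

```python
import pandas as pd

# Hypothetical stand-in for rs.cv_results_; the scores are invented.
cv_results = {
    "params": [
        {"max_depth": 2, "n_estimators": 25},
        {"max_depth": 2, "n_estimators": 50},
        {"max_depth": 4, "n_estimators": 25},
        {"max_depth": 4, "n_estimators": 50},
    ],
    "mean_test_score": [0.60, 0.70, 0.75, 0.85],
}

# One row per sampled parameter combination.
d = pd.DataFrame({
    "Max Depth": [p["max_depth"] for p in cv_results["params"]],
    "Score": cv_results["mean_test_score"],
})

# Average the score over every combination that shares a max depth.
by_depth = d.groupby("Max Depth")["Score"].mean()
print(by_depth)
```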
Uses of the output:
rs.best_estimator_ contains the information of the best model:
rs.best_estimator_
RandomForestRegressor(bootstrap=True, criterion='mse', max_depth=8,
max_features=8, max_leaf_nodes=None, min_impurity_decrease=0.0,
min_impurity_split=None, min_samples_leaf=1,
min_samples_split=12, min_weight_fraction_leaf=0.0,
n_estimators=20, n_jobs=1, oob_score=False, random_state=1111,
verbose=0, warm_start=False)
Random forest:
rfr.score(X_test, y_test)
6.39
Gradient Boosting:
gb.score(X_test, y_test)
6.23
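For scikit-learn regressors, `.score` returns R^2, which cannot exceed 1, so the values above presumably come from an error metric. The sketch below compares the two model types on a held-out test set using both R^2 and mean absolute error. The synthetic data, the split, and the model settings are assumptions for illustration, so the printed numbers will differ from those on the slides.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic data stands in for the course dataset (assumption).
X, y = make_regression(n_samples=300, n_features=8, noise=15.0, random_state=1111)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1111
)

# Illustrative settings, not the tuned models from the slides.
rfr = RandomForestRegressor(n_estimators=20, max_depth=8, random_state=1111)
gb = GradientBoostingRegressor(random_state=1111)

results = {}
for name, model in [("Random forest", rfr), ("Gradient Boosting", gb)]:
    model.fit(X_train, y_train)
    results[name] = {
        "r2": model.score(X_test, y_test),  # R^2, higher is better
        "mae": mean_absolute_error(y_test, model.predict(X_test)),  # lower is better
    }
    print(name, results[name])
```

Whichever metric is used, both models should be scored on the same test set so the comparison is fair.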
Predict new data:
rs.best_estimator_.predict(<new_data>)
Check the parameters:
rs.best_estimator_.get_params()
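`get_params()` returns a plain dictionary, so individual settings can be looked up by name. In this small sketch, `model` is a hypothetical stand-in for the best estimator, constructed by hand rather than taken from a search.

```python
from sklearn.ensemble import RandomForestRegressor

# Hand-built stand-in for rs.best_estimator_ (assumption).
model = RandomForestRegressor(max_depth=8, n_estimators=20, random_state=1111)

params = model.get_params()  # dict mapping parameter name to value
print(params["max_depth"], params["n_estimators"])
```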
Save model for use later:
import joblib
joblib.dump(rfr, 'rfr_best_<date>.pkl')
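A saved model is only useful if it can be loaded again and gives the same predictions. The sketch below shows one possible round trip with `joblib.dump` and `joblib.load`. The synthetic data, the small forest, and the temporary file name `rfr_best_example.pkl` are assumptions for illustration.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Small synthetic problem and model, for illustration only.
X, y = make_regression(n_samples=100, n_features=5, random_state=1111)
rfr = RandomForestRegressor(n_estimators=10, random_state=1111).fit(X, y)

# Write the fitted model to a temporary directory (hypothetical file name).
path = os.path.join(tempfile.mkdtemp(), "rfr_best_example.pkl")
joblib.dump(rfr, path)

# Load it back and confirm the predictions are unchanged.
loaded = joblib.load(path)
same = bool((loaded.predict(X) == rfr.predict(X)).all())
print(same)
```

Models saved this way are generally best loaded with the same scikit-learn version that created them.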