Model Validation in Python
Kasey Jones
Data Scientist
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier
rfc = RandomForestClassifier()
estimator: the model to use
X: the predictor dataset
y: the response array
cv: the number of cross-validation splits
cross_val_score(estimator=rfc, X=X, y=y, cv=5)
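The call above assumes X and y already exist. A minimal runnable sketch follows, using synthetic data from make_classification as a stand-in (the dataset, n_estimators, and random_state values are illustrative assumptions, not part of the original slides):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data (an assumption; substitute your own X and y)
X, y = make_classification(n_samples=200, n_features=8, random_state=1111)

rfc = RandomForestClassifier(n_estimators=20, random_state=1111)

# With no scoring argument, a classifier is scored by its default metric, accuracy
scores = cross_val_score(estimator=rfc, X=X, y=y, cv=5)
print(scores)  # one accuracy value per cross-validation split
```

The result is a NumPy array with one score per split, so cv=5 yields five values.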
The cross_val_score scoring parameter:
# Load the Methods
from sklearn.metrics import mean_absolute_error, make_scorer
# Create a scorer
mae_scorer = make_scorer(mean_absolute_error)
# Use the scorer
cross_val_score(<estimator>, <X>, <y>, cv=5, scoring=mae_scorer)
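One detail worth knowing: make_scorer() treats higher values as better by default, so a scorer built from an error metric as above returns positive errors. That is fine for reporting, but for model selection scikit-learn's convention is to negate errors. The sketch below, on synthetic data with a LinearRegression stand-in model (both are assumptions for illustration), compares a scorer built with greater_is_better=False against the built-in "neg_mean_absolute_error" string:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, make_scorer
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data and model (assumptions for illustration)
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=1111)
model = LinearRegression()

# greater_is_better=False flips the sign, so larger (less negative) is better
neg_mae_scorer = make_scorer(mean_absolute_error, greater_is_better=False)

custom = cross_val_score(model, X, y, cv=5, scoring=neg_mae_scorer)
builtin = cross_val_score(model, X, y, cv=5, scoring="neg_mean_absolute_error")

print(custom)                        # negated MAE for each split
print(np.allclose(custom, builtin))  # compares the two approaches
```

Either form can be passed to scoring; multiply by -1 when reporting the error itself.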
Load all of the sklearn methods
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score
from sklearn.metrics import mean_squared_error, make_scorer
Create a model and a scorer
rfr = RandomForestRegressor(n_estimators=20, max_depth=5, random_state=1111)
mse = make_scorer(mean_squared_error)
Run cross_val_score()
cv_results = cross_val_score(rfr, X, y, cv=5, scoring=mse)
print(cv_results)
[196.765, 108.563, 85.963, 222.594, 140.942]
Report the mean and standard deviation:
print('The mean: {}'.format(cv_results.mean()))
print('The std: {}'.format(cv_results.std()))
The mean: 150.965
The std: 51.676
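The summary figures can be checked directly from the five scores printed earlier. This sketch re-enters those values by hand (rather than re-running the model) and formats to three decimals; note that NumPy's std() defaults to the population standard deviation (ddof=0):

```python
import numpy as np

# The five per-split MSE values reported above, entered by hand
cv_results = np.array([196.765, 108.563, 85.963, 222.594, 140.942])

mean_mse = cv_results.mean()
std_mse = cv_results.std()  # population std (ddof=0), NumPy's default

print('The mean: {:.3f}'.format(mean_mse))
print('The std: {:.3f}'.format(std_mse))
```

A standard deviation this large relative to the mean suggests the error varies considerably from split to split.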