sklearn's cross_val_score()

Model Validation in Python

Kasey Jones

Data Scientist

cross_val_score()

from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier
rfc = RandomForestClassifier()

estimator: the model to use

X: the predictor dataset

y: the response array

cv: the number of cross-validation splits

cross_val_score(estimator=rfc, X=X, y=y, cv=5)
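
The call above assumes X and y already exist. A minimal end-to-end sketch, using synthetic data from make_classification as a stand-in (the dataset and its sizes are assumptions, not part of the slides), might look like this:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic data stands in for X and y (an assumption; the slides do not define them)
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

rfc = RandomForestClassifier(n_estimators=20, random_state=0)

# With no scoring argument, a classifier is scored with its default metric (accuracy)
scores = cross_val_score(estimator=rfc, X=X, y=y, cv=5)

# One score per cross-validation split
print(scores.shape)
```

Because cv=5, the returned array holds five scores, one per held-out split.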

Using scoring and make_scorer

The scoring parameter of cross_val_score accepts a scorer, such as one built with make_scorer:

# Load the functions
from sklearn.metrics import mean_absolute_error, make_scorer

# Create a scorer
mae_scorer = make_scorer(mean_absolute_error)

# Use the scorer
cross_val_score(<estimator>, <X>, <y>, cv=5, scoring=mae_scorer)
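
One detail worth knowing: make_scorer defaults to greater_is_better=True, so a scorer built this way returns the raw error values. Passing greater_is_better=False returns the same values with the sign flipped, which is the convention sklearn's model-selection tools expect when they maximize a score. The sketch below compares the two; the synthetic data and the LinearRegression model are assumptions chosen only to keep the example deterministic.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, make_scorer
from sklearn.model_selection import cross_val_score

# Synthetic regression data (an assumption for illustration)
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
model = LinearRegression()

# Default: scores are the raw mean absolute errors (non-negative)
mae_scorer = make_scorer(mean_absolute_error)

# greater_is_better=False: scores are negated errors (non-positive)
neg_mae_scorer = make_scorer(mean_absolute_error, greater_is_better=False)

mae = cross_val_score(model, X, y, cv=5, scoring=mae_scorer)
neg_mae = cross_val_score(model, X, y, cv=5, scoring=neg_mae_scorer)

print(mae)
print(neg_mae)
```

For simply reporting an error metric, as in these slides, the default scorer is enough.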

Load the sklearn classes and functions

from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score
from sklearn.metrics import mean_squared_error, make_scorer

Create a model and a scorer

# rfr: a random forest regressor (not a classifier)
rfr = RandomForestRegressor(n_estimators=20, max_depth=5, random_state=1111)
mse = make_scorer(mean_squared_error)

Run cross_val_score()

cv_results = cross_val_score(rfr, X, y, cv=5, scoring=mse)

Accessing the results

print(cv_results)

[196.765, 108.563, 85.963, 222.594, 140.942]

Report the mean and standard deviation:

print('The mean: {:.3f}'.format(cv_results.mean()))
print('The std: {:.3f}'.format(cv_results.std()))

The mean: 150.965

The std: 51.676
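
The regression steps above can be put together in one runnable sketch. The data here comes from make_regression (an assumption, since the slides do not show how X and y were loaded), so the printed numbers will differ from the ones on the slide.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, make_scorer
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the course dataset (an assumption)
X, y = make_regression(n_samples=300, n_features=6, noise=15.0, random_state=1111)

# Same model settings and scorer as on the slides
rfr = RandomForestRegressor(n_estimators=20, max_depth=5, random_state=1111)
mse = make_scorer(mean_squared_error)

# Five-fold cross-validation: one mean squared error per held-out fold
cv_results = cross_val_score(rfr, X, y, cv=5, scoring=mse)

print('Per-fold MSE: {}'.format(cv_results.round(3)))
print('The mean: {:.3f}'.format(cv_results.mean()))
print('The std: {:.3f}'.format(cv_results.std()))
```

cv_results is a NumPy array, so .std() computes the population standard deviation (ddof=0) unless told otherwise. A large standard deviation relative to the mean suggests the error estimate varies a lot from fold to fold.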

Let's practice!
