Model Validation in Python
Kasey Jones
Data Scientist
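from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error as mae
from sklearn.model_selection import train_test_split
# Hold out 20% of the data for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
rf = RandomForestRegressor()
rf.fit(X_train, y_train)
out_of_sample = rf.predict(X_test)
print(mae(y_test, out_of_sample))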
10.24
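A minimal runnable sketch of the holdout check above. The data here is synthetic and purely illustrative (the arrays `X` and `y`, the coefficients, and the seed are assumptions, not the course data), so the printed errors will not match the slide.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error as mae
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (hypothetical; not the course dataset).
rng = np.random.default_rng(1111)
X = rng.normal(size=(200, 5))
y = X @ np.array([3.0, -2.0, 1.0, 0.0, 0.5]) + rng.normal(scale=0.5, size=200)

# Hold out 20% of the rows; fixing random_state makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1111
)

rf = RandomForestRegressor(n_estimators=25, random_state=1111)
rf.fit(X_train, y_train)

# In-sample error uses rows the model has seen; out-of-sample error does not.
train_error = mae(y_train, rf.predict(X_train))
test_error = mae(y_test, rf.predict(X_test))
print('Training error: {0:.2f}'.format(train_error))
print('Testing error: {0:.2f}'.format(test_error))
```

Only the testing error estimates how the model might behave on new data, which is why it is the number reported on the slide.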
cd = pd.read_csv("candy-data.csv")
s1 = cd.sample(60, random_state=1111)
s2 = cd.sample(60, random_state=1112)
Overlapping candies:
print(len([i for i in s1.index if i in s2.index]))
39
Chocolate Candies:
print(s1.chocolate.value_counts()[0])
print(s2.chocolate.value_counts()[0])
34
30
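The sampling comparison above can be reproduced without the CSV file. This sketch assumes a hypothetical 85-row stand-in DataFrame with a 0/1 `chocolate` column; the counts it prints are not the slide's numbers. Note that with a 0/1 column, `value_counts()[0]` looks up the label 0, so the sketch counts both classes explicitly.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for candy-data.csv (row count and values are assumptions).
rng = np.random.default_rng(0)
cd = pd.DataFrame({'chocolate': rng.integers(0, 2, size=85)})

# Two samples of 60 rows drawn with different seeds.
s1 = cd.sample(60, random_state=1111)
s2 = cd.sample(60, random_state=1112)

# Rows that appear in both samples (same result as the list comprehension).
overlap = s1.index.intersection(s2.index)
print(len(overlap))

# Count each class explicitly rather than relying on label lookup.
n_choc_1 = int((s1.chocolate == 1).sum())
n_choc_2 = int((s2.chocolate == 1).sum())
print(n_choc_1, 60 - n_choc_1)
print(n_choc_2, 60 - n_choc_2)
```

The two samples share many rows but not all, and their class balance differs, so a model fit on each sees a slightly different problem.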
Sample 1 Testing Error
print('Testing error: {0:.2f}'.format(mae(s1_y_test, rfr.predict(s1_X_test))))
10.32
Sample 2 Testing Error
print('Testing error: {0:.2f}'.format(mae(s2_y_test, rfr.predict(s2_X_test))))
11.56
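A sketch of how the two testing errors above might be produced. The slides do not show how `s1_X_test` or `s2_X_test` were built, so the helper `holdout_error`, the column names, and the synthetic data are all assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error as mae
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: four features and a numeric target.
rng = np.random.default_rng(0)
cd = pd.DataFrame(rng.normal(size=(85, 4)), columns=['a', 'b', 'c', 'd'])
cd['target'] = 50 + 5 * cd['a'] - 3 * cd['b'] + rng.normal(scale=2, size=85)

def holdout_error(sample, seed=1111):
    """Fit a forest on 80% of `sample` and return the MAE on the other 20%."""
    X = sample.drop(columns='target')
    y = sample['target']
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    rfr = RandomForestRegressor(n_estimators=25, random_state=seed)
    rfr.fit(X_train, y_train)
    return mae(y_test, rfr.predict(X_test))

# Same procedure, two different 60-row samples of the same data.
s1_error = holdout_error(cd.sample(60, random_state=1111))
s2_error = holdout_error(cd.sample(60, random_state=1112))
print('Sample 1 testing error: {0:.2f}'.format(s1_error))
print('Sample 2 testing error: {0:.2f}'.format(s2_error))
```

Everything except the sampled rows is held fixed, so any gap between the two errors comes from which rows were drawn.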
X_temp, X_val, y_temp, y_val = train_test_split(..., random_state=1111)
X_train, X_test, y_train, y_test = train_test_split(..., random_state=1111)
rfr = RandomForestRegressor(n_estimators=25, random_state=1111, max_features=4)
rfr.fit(X_train, y_train)
print('Validation error: {0:.2f}'.format(mae(y_test, rfr.predict(X_test))))
9.18
print('Testing error: {0:.2f}'.format(mae(y_val, rfr.predict(X_val))))
8.98
X_temp, X_val, y_temp, y_val = train_test_split(..., random_state=1171)
X_train, X_test, y_train, y_test = train_test_split(..., random_state=1171)
rfr = RandomForestRegressor(n_estimators=25, random_state=1111, max_features=4)
rfr.fit(X_train, y_train)
print('Validation error: {0:.2f}'.format(mae(y_test, rfr.predict(X_test))))
8.73
print('Testing error: {0:.2f}'.format(mae(y_val, rfr.predict(X_val))))
10.91
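The effect of the split seed can be sketched end to end. The slides elide the `train_test_split` arguments, so the split sizes here (20% test, then 25% of the remainder for validation) and the synthetic data are assumptions. Variable naming follows the common convention that the first split holds out the test set and the second holds out the validation set.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error as mae
from sklearn.model_selection import train_test_split

# Synthetic stand-in data with six features (hypothetical).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = X @ np.array([4.0, -3.0, 2.0, 1.0, 0.0, 0.5]) + rng.normal(scale=1.0, size=200)

def split_and_score(split_seed):
    """Return (validation MAE, testing MAE) for one choice of split seed."""
    # First split: hold out the final test set.
    X_temp, X_test, y_temp, y_test = train_test_split(
        X, y, test_size=0.2, random_state=split_seed
    )
    # Second split: carve a validation set out of the remaining rows.
    X_train, X_val, y_train, y_val = train_test_split(
        X_temp, y_temp, test_size=0.25, random_state=split_seed
    )
    # The model seed stays fixed so only the split changes between runs.
    rfr = RandomForestRegressor(n_estimators=25, random_state=1111, max_features=4)
    rfr.fit(X_train, y_train)
    return mae(y_val, rfr.predict(X_val)), mae(y_test, rfr.predict(X_test))

results = {seed: split_and_score(seed) for seed in (1111, 1171)}
for seed, (val_error, test_error) in results.items():
    print('seed {0}: validation {1:.2f}, testing {2:.2f}'.format(seed, val_error, test_error))
```

With the model held fixed, differences between the rows of output reflect only how the data happened to be split, which is the weakness of a single holdout set that these slides illustrate.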