The problems with holdout sets

Model Validation in Python

Kasey Jones

Data Scientist

Transitioning to validation

The traditional train-test split uses most of the data for training and holds out a small portion for testing only.

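# Imports assumed here; mae is taken to be an alias for mean_absolute_error
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error as mae
from sklearn.model_selection import train_test_split

# Hold out 20% of the data for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2)

rf = RandomForestRegressor()
rf.fit(X_train, y_train)

# Mean absolute error on the held-out data
out_of_sample = rf.predict(X_test)
print(mae(y_test, out_of_sample))
10.24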

Traditional training splits

cd = pd.read_csv("candy-data.csv")
s1 = cd.sample(60, random_state=1111)
s2 = cd.sample(60, random_state=1112)

Overlapping candies:

print(len([i for i in s1.index if i in s2.index]))
39

Traditional training splits

Chocolate candies:

print(s1.chocolate.value_counts()[0])
print(s2.chocolate.value_counts()[0])
34
30
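
The two checks above (row overlap and class balance) can be reproduced on any DataFrame. The following is a minimal sketch on a synthetic stand-in for the candy data; the 85-row size, the random `chocolate` column, and the name `df` are assumptions made for illustration, so the printed numbers will not match the slide.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the candy data (assumption: 85 rows, one binary column).
rng = np.random.default_rng(0)
df = pd.DataFrame({"chocolate": rng.integers(0, 2, size=85)})

# Two samples of 60 rows that differ only in their seed.
s1 = df.sample(60, random_state=1111)
s2 = df.sample(60, random_state=1112)

# Rows that appear in both samples, found by intersecting the indexes.
overlap = s1.index.intersection(s2.index)
print(len(overlap))

# Class balance of each sample; the two can differ even though
# both samples were drawn from the same data.
print(s1["chocolate"].value_counts().to_dict())
print(s2["chocolate"].value_counts().to_dict())
```

`Index.intersection` gives the same count as the list comprehension on the slide without scanning one index for every row of the other.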

The split matters

Sample 1 testing error

print('Testing error: {0:.2f}'.format(mae(s1_y_test, rfr.predict(s1_X_test))))
10.32

Sample 2 testing error

print('Testing error: {0:.2f}'.format(mae(s2_y_test, rfr.predict(s2_X_test))))
11.56
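
To see how much the testing error can move with the split alone, one option is to hold the model settings fixed and change only the split seed. This is a rough sketch on synthetic data from `make_regression`, not the candy dataset; the sample size, feature count, and seeds are assumptions chosen for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error as mae
from sklearn.model_selection import train_test_split

# Synthetic regression data (an assumption; not the candy dataset).
X, y = make_regression(n_samples=85, n_features=8, noise=10.0, random_state=0)

errors = {}
for seed in (1111, 1112, 1113):
    # Only the split seed changes; the model settings stay fixed.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed)
    rfr = RandomForestRegressor(n_estimators=25, random_state=1111)
    rfr.fit(X_train, y_train)
    errors[seed] = mae(y_test, rfr.predict(X_test))

for seed, err in errors.items():
    print('Seed {0}: testing error {1:.2f}'.format(seed, err))
```

Any spread between the printed errors comes from which rows landed in each test set, since the model configuration is identical in every run.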

Train, validation, test

X_temp, X_val, y_temp, y_val = train_test_split(..., random_state=1111)
X_train, X_test, y_train, y_test = train_test_split(..., random_state=1111)

rfr = RandomForestRegressor(n_estimators=25, random_state=1111, max_features=4)
rfr.fit(X_train, y_train)

print('Validation error: {0:.2f}'.format(mae(y_test, rfr.predict(X_test))))
9.18
print('Testing error: {0:.2f}'.format(mae(y_val, rfr.predict(X_val))))
8.98

Round 2

X_temp, X_val, y_temp, y_val = train_test_split(..., random_state=1171)
X_train, X_test, y_train, y_test = train_test_split(..., random_state=1171)

rfr = RandomForestRegressor(n_estimators=25, random_state=1111, max_features=4)
rfr.fit(X_train, y_train)

print('Validation error: {0:.2f}'.format(mae(y_test, rfr.predict(X_test))))
8.73
print('Testing error: {0:.2f}'.format(mae(y_val, rfr.predict(X_val))))
10.91
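
The arguments elided as `...` above are not shown on the slides. As a hedged sketch only, the two-stage split could look like the following on synthetic data. The `test_size` values (0.2, then 0.25 of the remainder), the 100-row dataset, and the choice to label each error after the variable it is computed on are all assumptions, so the results will not match the numbers above.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error as mae
from sklearn.model_selection import train_test_split

# Synthetic data (assumption): 100 rows, 8 features.
X, y = make_regression(n_samples=100, n_features=8, noise=10.0, random_state=0)

results = {}
for seed in (1111, 1171):
    # First split: set aside 20% of all rows as one holdout set.
    X_temp, X_val, y_temp, y_val = train_test_split(
        X, y, test_size=0.2, random_state=seed)
    # Second split: 25% of the remaining 80% (20% overall) as the other.
    X_train, X_test, y_train, y_test = train_test_split(
        X_temp, y_temp, test_size=0.25, random_state=seed)

    rfr = RandomForestRegressor(n_estimators=25, random_state=1111, max_features=4)
    rfr.fit(X_train, y_train)

    results[seed] = {
        'sizes': (len(X_train), len(X_val), len(X_test)),
        'val_error': mae(y_val, rfr.predict(X_val)),
        'test_error': mae(y_test, rfr.predict(X_test)),
    }

for seed, r in results.items():
    print('Seed {0}: validation error {1:.2f}, testing error {2:.2f}'.format(
        seed, r['val_error'], r['test_error']))
```

With these assumed sizes each run trains on 60 rows and evaluates on two holdout sets of 20 rows, which is small enough that the errors can shift noticeably from one seed to the next.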

Holdout set exercises
