Accuracy metrics: regression models

Model Validation in Python

Kasey Jones

Data Scientist

Regression models

Regression models predict continuous variables, such as a number of points, gallons, or puppies.

Mean absolute error (MAE)

 

$$ MAE = \frac{\sum_{i=1}^{n} |y_i - \hat{y}_i|}{n} $$

  • The simplest and most intuitive metric
  • Treats all points equally
  • Relatively insensitive to outliers
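
The MAE formula above can be sketched in a few lines of NumPy. The arrays here are made-up toy values, not data from the course, and the variable names are only illustrative.

```python
import numpy as np

# Hypothetical toy values, chosen only to illustrate the formula
y_true = np.array([10.0, 20.0, 30.0, 40.0])
y_pred = np.array([12.0, 18.0, 33.0, 40.0])

# Absolute error of each prediction: |y_i - y_hat_i|
abs_errors = np.abs(y_true - y_pred)

# MAE is the plain average of those absolute errors
mae = abs_errors.sum() / len(y_true)
print(mae)  # 1.75
```

Every point contributes its error at face value, which is why no single observation dominates the result.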

Mean squared error (MSE)

 

$$ MSE = \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i) ^2}{n} $$

  • The most widely used regression metric
  • Lets outlier errors contribute more to the total error
  • Random family road trips could lead to large prediction errors
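
To see how squaring lets one large miss dominate, the sketch below compares MAE and MSE with and without a single outlying prediction. The values and the helper names `mae` and `mse` are hypothetical and exist only for this illustration.

```python
import numpy as np

def mae(y, y_hat):
    """Mean absolute error."""
    return np.mean(np.abs(y - y_hat))

def mse(y, y_hat):
    """Mean squared error."""
    return np.mean((y - y_hat) ** 2)

# Hypothetical toy values
y_true = np.array([10.0, 20.0, 30.0, 40.0])
y_pred = np.array([12.0, 18.0, 33.0, 40.0])
# Same predictions, except the last one now misses by 20
y_pred_outlier = np.array([12.0, 18.0, 33.0, 60.0])

print(mae(y_true, y_pred), mse(y_true, y_pred))                  # 1.75 4.25
print(mae(y_true, y_pred_outlier), mse(y_true, y_pred_outlier))  # 6.75 104.25
```

With these numbers the single outlier raises MAE by a factor of about 4 and MSE by a factor of about 25.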

MAE vs. MSE

  • Accuracy metrics are always application specific
  • MAE and MSE are in different units (MSE is in squared units of the target), so do not compare them directly
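
One way to see the difference in units is to rescale the target. In the sketch below, the same hypothetical measurements are expressed in meters and then in centimeters; the values are invented for illustration.

```python
import numpy as np

# Hypothetical targets and predictions, in meters
y_true_m = np.array([1.0, 2.0, 3.0])
y_pred_m = np.array([1.5, 2.0, 2.0])

# The same quantities expressed in centimeters
y_true_cm = y_true_m * 100
y_pred_cm = y_pred_m * 100

mae_m = np.mean(np.abs(y_true_m - y_pred_m))
mae_cm = np.mean(np.abs(y_true_cm - y_pred_cm))
mse_m = np.mean((y_true_m - y_pred_m) ** 2)
mse_cm = np.mean((y_true_cm - y_pred_cm) ** 2)

# MAE scales with the unit; MSE scales with the unit squared
print(mae_cm / mae_m)
print(mse_cm / mse_m)
```

The MAE ratio is 100 while the MSE ratio is 10,000, which is why the two numbers cannot be placed on the same scale.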

Mean absolute error

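from sklearn.ensemble import RandomForestRegressor

# X_train, y_train, X_test, y_test are assumed to come from an earlier train/test split
rfr = RandomForestRegressor(n_estimators=500, random_state=1111)
rfr.fit(X_train, y_train)
test_predictions = rfr.predict(X_test)

# Manual calculation: mean of the absolute errors
sum(abs(y_test - test_predictions))/len(test_predictions)
9.99

# Using scikit-learn
from sklearn.metrics import mean_absolute_error
mean_absolute_error(y_test, test_predictions)
9.99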

Mean squared error

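# Manual calculation: mean of the squared errors (abs() is unnecessary once the errors are squared)
sum((y_test - test_predictions)**2)/len(test_predictions)
141.4

# Using scikit-learn
from sklearn.metrics import mean_squared_error
mean_squared_error(y_test, test_predictions)
141.4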

Accuracy for a subset of data

# Column 1 of X_test is assumed to be the chocolate indicator (1 = chocolate)
chocolate_preds = rfr.predict(X_test[X_test[:, 1] == 1])
mean_absolute_error(y_test[X_test[:, 1] == 1], chocolate_preds)
8.79

# Rows where the chocolate indicator is 0
nonchocolate_preds = rfr.predict(X_test[X_test[:, 1] == 0])
mean_absolute_error(y_test[X_test[:, 1] == 0], nonchocolate_preds)
10.99
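
The boolean-mask pattern used above can be tried without a fitted model. In this sketch the arrays are invented, `test_predictions` is a stand-in for the output of `rfr.predict(X_test)`, and the mask is applied to the predictions directly, which gives the same rows as predicting on the filtered `X_test`.

```python
import numpy as np

def mae(y, y_hat):
    """Mean absolute error."""
    return np.mean(np.abs(y - y_hat))

# Hypothetical test set: column 0 is an arbitrary feature, column 1 is a 0/1 flag
X_test = np.array([[0.2, 1], [0.5, 0], [0.9, 1], [0.4, 0], [0.7, 0]])
y_test = np.array([50.0, 30.0, 70.0, 40.0, 20.0])
# Stand-in for model predictions; made-up values
test_predictions = np.array([55.0, 20.0, 66.0, 45.0, 35.0])

# True for rows where the flag in column 1 equals 1
mask = X_test[:, 1] == 1

flag_mae = mae(y_test[mask], test_predictions[mask])
other_mae = mae(y_test[~mask], test_predictions[~mask])
overall_mae = mae(y_test, test_predictions)

print(flag_mae, other_mae)  # 4.5 10.0
```

The overall MAE is the average of the subset MAEs weighted by subset size, so a gap between subsets shows where the model's error is concentrated.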

Let's practice!
