Accuracy metrics: regression models

Model Validation in Python

Kasey Jones

Data Scientist

Regression models

Regression models predict continuous variables, such as number of points, number of gallons, or number of puppies!

Mean absolute error (MAE)

 

$$ MAE = \frac{\sum_{i=1}^{n} |y_i - \hat{y}_i|}{n} $$

  • Simplest and most intuitive metric
  • Treats all points equally
  • Less sensitive to outliers than squared-error metrics
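
As a rough illustration of the formula above, the following sketch computes MAE by hand in plain Python. The values in `y_true` and `y_pred` and the helper name `mean_absolute_error_manual` are made up for this example and do not come from the course data.

```python
# Hypothetical actual and predicted values (illustrative only)
y_true = [10.0, 20.0, 30.0, 40.0, 50.0]
y_pred = [12.0, 18.0, 32.0, 38.0, 40.0]

def mean_absolute_error_manual(actual, predicted):
    """Average of the absolute differences between actual and predicted values."""
    abs_errors = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(abs_errors) / len(abs_errors)

# Absolute errors are 2, 2, 2, 2 and 10, so MAE = 18 / 5
mae = mean_absolute_error_manual(y_true, y_pred)
print(mae)  # 3.6
```

Every point contributes its absolute error directly, so the one large miss (10) counts five times as much as a small miss (2), and no more than that.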

Mean squared error (MSE)

 

$$ MSE = \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i) ^2}{n} $$

  • Most widely used regression metric
  • Allows outlier errors to contribute more to the overall error
  • Rare events, such as a random family road trip, could lead to large errors in predictions, and MSE weights those errors heavily
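
To make the outlier effect concrete, here is a minimal sketch in plain Python. The data and the helper name `mean_squared_error_manual` are hypothetical. It measures how much of the total error comes from one large miss under absolute error versus squared error.

```python
# Hypothetical actual and predicted values; the last prediction is a large miss
y_true = [10.0, 20.0, 30.0, 40.0, 50.0]
y_pred = [12.0, 18.0, 32.0, 38.0, 40.0]

def mean_squared_error_manual(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    sq_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sum(sq_errors) / len(sq_errors)

# Squared errors are 4, 4, 4, 4 and 100, so MSE = 116 / 5
mse = mean_squared_error_manual(y_true, y_pred)
print(mse)  # 23.2

# Share of the total error that comes from the single largest miss
abs_errors = [abs(a - p) for a, p in zip(y_true, y_pred)]
sq_errors = [e ** 2 for e in abs_errors]
mae_outlier_share = max(abs_errors) / sum(abs_errors)  # 10 / 18
mse_outlier_share = max(sq_errors) / sum(sq_errors)    # 100 / 116
print(round(mae_outlier_share, 2), round(mse_outlier_share, 2))
```

In this toy data the largest miss makes up a little over half of the absolute error but most of the squared error, which is the behavior the bullets describe.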

MAE vs. MSE

  • Accuracy metrics are always application specific
  • MAE and MSE error terms are in different units and should not be compared
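
One way to see the difference in units is to rescale the target. The sketch below uses made-up prices in dollars and converts them to cents. The helper names `mae` and `mse` are defined here for illustration only.

```python
# Hypothetical prices in dollars (illustrative only)
actual_dollars = [3.0, 5.0, 8.0]
predicted_dollars = [2.0, 7.0, 8.5]

def mae(actual, predicted):
    """Mean absolute error: same units as the target."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mse(actual, predicted):
    """Mean squared error: squared units of the target."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

# The same data expressed in cents
actual_cents = [100 * v for v in actual_dollars]
predicted_cents = [100 * v for v in predicted_dollars]

# MAE grows by the scale factor, MSE by the scale factor squared
mae_ratio = mae(actual_cents, predicted_cents) / mae(actual_dollars, predicted_dollars)
mse_ratio = mse(actual_cents, predicted_cents) / mse(actual_dollars, predicted_dollars)
print(mae_ratio, mse_ratio)
```

MAE scales by about 100 and MSE by about 10,000, so an MAE value and an MSE value for the same model are not on a common scale.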

Mean absolute error

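from sklearn.ensemble import RandomForestRegressor

# Assumes X_train, y_train, X_test, and y_test already exist
rfr = RandomForestRegressor(n_estimators=500, random_state=1111)
rfr.fit(X_train, y_train)
test_predictions = rfr.predict(X_test)

# Manual MAE calculation
sum(abs(y_test - test_predictions))/len(test_predictions)
# Output: 9.99

# Same calculation with scikit-learn
from sklearn.metrics import mean_absolute_error
mean_absolute_error(y_test, test_predictions)
# Output: 9.99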

Mean squared error

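# Manual MSE calculation (abs() is not needed because the errors are squared)
sum((y_test - test_predictions)**2)/len(test_predictions)
# Output: 141.4

# Same calculation with scikit-learn
from sklearn.metrics import mean_squared_error
mean_squared_error(y_test, test_predictions)
# Output: 141.4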

Accuracy for a subset of data

# Column 1 of X_test is a 0/1 chocolate indicator (X_test is a NumPy array)
chocolate_preds = rfr.predict(X_test[X_test[:, 1] == 1])
mean_absolute_error(y_test[X_test[:, 1] == 1], chocolate_preds)
# Output: 8.79

nonchocolate_preds = rfr.predict(X_test[X_test[:, 1] == 0])
mean_absolute_error(y_test[X_test[:, 1] == 0], nonchocolate_preds)
# Output: 10.99
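
The subset comparison can be tried without a fitted model. The sketch below is a minimal, self-contained version using NumPy boolean masks. The arrays are invented for illustration, and the predictions are hard-coded rather than produced by `rfr`.

```python
import numpy as np

# Hypothetical test set: column 1 is a 0/1 indicator, as in the slide's X_test
X_test = np.array([[0.5, 1], [0.2, 0], [0.9, 1], [0.4, 0]])
y_test = np.array([60.0, 40.0, 70.0, 30.0])
# Stand-in predictions (assumed values, not output from a real model)
test_predictions = np.array([55.0, 50.0, 72.0, 24.0])

def mae(actual, predicted):
    """Mean absolute error for two equally sized arrays."""
    return float(np.mean(np.abs(actual - predicted)))

# Boolean mask that selects rows where the indicator equals 1
mask = X_test[:, 1] == 1

mae_flagged = mae(y_test[mask], test_predictions[mask])        # (5 + 2) / 2
mae_unflagged = mae(y_test[~mask], test_predictions[~mask])    # (10 + 6) / 2
mae_overall = mae(y_test, test_predictions)                    # 23 / 4
print(mae_flagged, mae_unflagged, mae_overall)
```

This sketch slices one set of predictions with the mask instead of calling `predict` again on each subset. For a model that scores each row independently, the two approaches should give the same subset errors. The overall MAE is the size-weighted average of the subset MAEs.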

Let's practice
