Accuracy metrics: regression models

Model Validation in Python

Kasey Jones

Data Scientist

Regression models

Regression models predict continuous variables, such as number of points, number of gallons, or number of puppies!

Mean absolute error (MAE)

$$ MAE = \frac{\sum_{i=1}^{n} |y_i - \hat{y}_i|}{n} $$

Simplest and most intuitive metric
Treats all points equally
Less sensitive to outliers than MSE
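
As a minimal sketch of the formula above, MAE can be computed by hand in plain Python. The helper name `mae` and the sample values are made up for illustration and are not taken from the course data.

```python
def mae(y_true, y_pred):
    """Mean absolute error: the average of |y_i - y_hat_i|."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical actual and predicted values (illustrative only)
y_true = [10, 20, 30, 40]
y_pred = [12, 18, 33, 37]

print(mae(y_true, y_pred))  # absolute errors 2, 2, 3, 3 -> 2.5
```

Every point contributes its absolute error with the same weight, which is why the result is easy to read in the units of the target.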

Mean squared error (MSE)

$$ MSE = \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i) ^2}{n} $$

Most widely used regression metric
Allows outlier errors to contribute more to the overall error
Random family road trips could lead to large errors in predictions
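
To illustrate how squaring lets a single outlier dominate the error, the sketch below compares MAE and MSE on two sets of made-up predictions that differ in only one point. The helpers `mae` and `mse` and all values are hypothetical.

```python
def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical values (illustrative only)
y_true = [10, 20, 30, 40]
typical = [12, 18, 32, 38]       # every prediction is off by 2
with_outlier = [12, 18, 32, 60]  # last prediction is off by 20

print(mae(y_true, typical), mse(y_true, typical))            # 2.0 4.0
print(mae(y_true, with_outlier), mse(y_true, with_outlier))  # 6.5 103.0
```

In this example the one large miss raises MAE from 2.0 to 6.5, while MSE jumps from 4.0 to 103.0.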

MAE vs. MSE

Accuracy metrics are always application-specific
MAE is in the units of the target and MSE is in squared units, so the two values should not be compared with each other

Mean absolute error

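from sklearn.ensemble import RandomForestRegressor

# Fit a random forest on the training data, then predict on the test set
rfr = RandomForestRegressor(n_estimators=500, random_state=1111)
rfr.fit(X_train, y_train)
test_predictions = rfr.predict(X_test)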

sum(abs(y_test - test_predictions))/len(test_predictions)

9.99

from sklearn.metrics import mean_absolute_error
mean_absolute_error(y_test, test_predictions)

9.99

Mean squared error

sum((y_test - test_predictions)**2)/len(test_predictions)

141.4

from sklearn.metrics import mean_squared_error
mean_squared_error(y_test, test_predictions)

141.4

Accuracy for a subset of data

# Boolean mask: rows of X_test where column 1 equals 1
chocolate_mask = X_test[:, 1] == 1
chocolate_preds = rfr.predict(X_test[chocolate_mask])
mean_absolute_error(y_test[chocolate_mask], chocolate_preds)

8.79

# Boolean mask: rows of X_test where column 1 equals 0
nonchocolate_mask = X_test[:, 1] == 0
nonchocolate_preds = rfr.predict(X_test[nonchocolate_mask])
mean_absolute_error(y_test[nonchocolate_mask], nonchocolate_preds)

10.99
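
The same subsetting idea can be sketched without a fitted model: filter the rows by a flag, then score each group separately. The function `subset_mae`, the `is_chocolate` flags, and the numbers below are hypothetical and only illustrate the pattern.

```python
def subset_mae(y_true, y_pred, flags, value):
    """MAE over only the rows whose flag equals `value`."""
    pairs = [(t, p) for t, p, f in zip(y_true, y_pred, flags) if f == value]
    if not pairs:
        raise ValueError("no rows match the requested flag value")
    return sum(abs(t - p) for t, p in pairs) / len(pairs)

# Hypothetical targets, predictions, and a 0/1 group flag (illustrative only)
y_true = [50, 60, 70, 80]
y_pred = [55, 58, 60, 84]
is_chocolate = [1, 0, 0, 1]

print(subset_mae(y_true, y_pred, is_chocolate, 1))  # errors 5, 4 -> 4.5
print(subset_mae(y_true, y_pred, is_chocolate, 0))  # errors 2, 10 -> 6.0
```

Scoring groups separately can reveal that a model performs noticeably better on one subset than another, which a single overall score would hide.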

Let's practice
