Accuracy metrics: regression models

Model Validation in Python

Kasey Jones

Data Scientist

Regression models

Regression models predict continuous variables, such as the number of points, the number of gallons, or the number of puppies!

Mean absolute error (MAE)

 

$$ MAE = \frac{\sum_{i=1}^{n} |y_i - \hat{y}_i|}{n} $$

  • Simplest and most intuitive metric
  • Treats all points equally
  • Less sensitive to outliers than MSE
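
To make the formula concrete, here is a minimal sketch of MAE in plain Python. The values in `y_true` and `y_pred` and the helper name `mean_abs_error` are hypothetical and chosen only for illustration; they are not from the course data.

```python
# Hypothetical actual and predicted values (illustration only)
y_true = [10.0, 12.0, 15.0, 20.0]
y_pred = [11.0, 10.0, 15.0, 24.0]

def mean_abs_error(y, y_hat):
    """Average of the absolute differences |y_i - y_hat_i|."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)

# Absolute errors are 1, 2, 0, and 4, so the mean is 7 / 4
mae = mean_abs_error(y_true, y_pred)
print(mae)  # 1.75
```

Each error contributes in proportion to its size, which is why MAE treats all points equally.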

Mean squared error (MSE)

 

$$ MSE = \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i) ^2}{n} $$

  • Most widely used regression metric
  • Allows outlier errors to contribute more to the overall error
  • Example: an occasional family road trip is an outlier that could lead to a large prediction error
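
A small sketch, using hypothetical values, of how squaring lets the largest error dominate. It compares the share of the total error that the single largest miss contributes under absolute errors and under squared errors.

```python
# Hypothetical actual and predicted values (illustration only)
y_true = [10.0, 12.0, 15.0, 20.0]
y_pred = [11.0, 10.0, 15.0, 24.0]

abs_errors = [abs(a - b) for a, b in zip(y_true, y_pred)]  # 1, 2, 0, 4
sq_errors = [e ** 2 for e in abs_errors]                   # 1, 4, 0, 16

mse = sum(sq_errors) / len(sq_errors)
print(mse)  # 5.25

# Share of the total error that comes from the largest single miss
share_in_mae = max(abs_errors) / sum(abs_errors)  # 4 / 7
share_in_mse = max(sq_errors) / sum(sq_errors)    # 16 / 21
print(share_in_mse > share_in_mae)  # True
```

In this toy case the largest miss makes up 4/7 of the absolute error but 16/21 of the squared error.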

MAE vs. MSE

  • Accuracy metrics are always application-specific
  • MAE is in the units of the target and MSE is in squared units, so the two values should not be compared directly
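
One way to see the difference in units is to rescale the target. In this sketch (hypothetical values and helper names), multiplying every value by 10 multiplies MAE by 10 but MSE by 100.

```python
def mae(y, y_hat):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)

def mse(y, y_hat):
    """Mean squared error."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)

# Hypothetical actual and predicted values (illustration only)
y_true = [10.0, 12.0, 15.0, 20.0]
y_pred = [11.0, 10.0, 15.0, 24.0]

# The same quantities expressed in a unit that is 10 times smaller
y_true_scaled = [10 * v for v in y_true]
y_pred_scaled = [10 * v for v in y_pred]

mae_ratio = mae(y_true_scaled, y_pred_scaled) / mae(y_true, y_pred)
mse_ratio = mse(y_true_scaled, y_pred_scaled) / mse(y_true, y_pred)
print(mae_ratio, mse_ratio)  # 10.0 100.0
```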

Mean absolute error

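from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

# Fit the model and predict on the held-out test set
rfr = RandomForestRegressor(n_estimators=500, random_state=1111)
rfr.fit(X_train, y_train)
test_predictions = rfr.predict(X_test)

# Manual calculation
sum(abs(y_test - test_predictions))/len(test_predictions)
9.99

# Using scikit-learn
mean_absolute_error(y_test, test_predictions)
9.99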

Mean squared error

from sklearn.metrics import mean_squared_error

# Manual calculation (squaring makes abs() unnecessary)
sum((y_test - test_predictions)**2)/len(test_predictions)
141.4

# Using scikit-learn
mean_squared_error(y_test, test_predictions)
141.4

Accuracy for a subset of data

# Column 1 of X_test is the chocolate indicator (1 = chocolate, 0 = not)
chocolate_preds = rfr.predict(X_test[X_test[:, 1] == 1])
mean_absolute_error(y_test[X_test[:, 1] == 1], chocolate_preds)
8.79

nonchocolate_preds = rfr.predict(X_test[X_test[:, 1] == 0])
mean_absolute_error(y_test[X_test[:, 1] == 0], nonchocolate_preds)
10.99
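
The same subset comparison can be sketched with a reusable boolean mask. This version assumes NumPy arrays and uses small hypothetical values; `test_predictions` here is a hand-written stand-in for `rfr.predict(X_test)`, not real model output.

```python
import numpy as np

# Hypothetical test set: column 1 is a 0/1 indicator (illustration only)
X_test = np.array([[0.5, 1], [0.3, 0], [0.9, 1], [0.1, 0]])
y_test = np.array([50.0, 40.0, 70.0, 30.0])
# Stand-in for rfr.predict(X_test)
test_predictions = np.array([55.0, 30.0, 67.0, 42.0])

# Build the mask once and reuse it for both the targets and the predictions
mask = X_test[:, 1] == 1

mae_flagged = np.mean(np.abs(y_test[mask] - test_predictions[mask]))
mae_other = np.mean(np.abs(y_test[~mask] - test_predictions[~mask]))
print(mae_flagged, mae_other)  # 4.0 11.0
```

Predicting once on the full test set and then slicing the predictions avoids calling `predict` separately for each subset.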

Let's practice
