The bias-variance tradeoff

Model Validation in Python

Kasey Jones

Data Scientist

Variance

  • Variance: following the training data too closely
    • Fails to generalize to the test data
    • Low training error but high testing error
    • Occurs when models are overfit and have high complexity
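
The gap between training and testing error described above can be reproduced with a toy model that simply memorizes its training data. This is a minimal sketch using only the standard library, not the course's code: the synthetic data, the 20% label noise, and the helper names make_point, predict_1nn, and accuracy are all assumptions made for illustration.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

def make_point():
    """One synthetic sample: a feature x in [0, 1) and a noisy binary label."""
    x = random.random()
    label = int(x > 0.5)
    if random.random() < 0.2:  # flip roughly 20% of labels to simulate noise
        label = 1 - label
    return x, label

train = [make_point() for _ in range(200)]
test = [make_point() for _ in range(200)]

def predict_1nn(x):
    """Return the label of the single closest training point (pure memorization)."""
    nearest = min(train, key=lambda point: abs(point[0] - x))
    return nearest[1]

def accuracy(data):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(predict_1nn(x) == y for x, y in data) / len(data)

train_accuracy = accuracy(train)
test_accuracy = accuracy(test)
print("Training: {0:.2f}".format(train_accuracy))
print("Testing: {0:.2f}".format(test_accuracy))
```

Because every training point is its own nearest neighbor, the model reproduces the noisy training labels perfectly, yet that memorized noise does not carry over to unseen points.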

Overfitting models (high variance)

Overfitting occurs when our predictions follow the training data too closely. If we drew a scatter plot of the training predictions against the real values and every prediction fell exactly in line with the real value, the model is probably overfit.


Bias

  • Bias: failing to find the relationship between the data and the response
    • High training/testing error
    • Occurs when models are underfit
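
High bias can be illustrated with a model that ignores its input entirely. The sketch below is an assumption-laden toy, not the course's code: the synthetic data, the majority-label "model", and the helper names make_point and accuracy are hypothetical. The label depends completely on x, so a relationship exists, but the underfit model never looks for it.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

def make_point():
    """One synthetic sample whose label depends entirely on the feature x."""
    x = random.random()
    return x, int(x > 0.5)

train = [make_point() for _ in range(200)]
test = [make_point() for _ in range(200)]

# Underfit "model": ignore x and always predict the most common training label.
ones = sum(y for _, y in train)
majority_label = int(ones * 2 >= len(train))

def accuracy(data, predict):
    """Fraction of samples for which predict(x) equals the true label."""
    return sum(predict(x) == y for x, y in data) / len(data)

underfit_train = accuracy(train, lambda x: majority_label)
underfit_test = accuracy(test, lambda x: majority_label)
# For comparison: a rule that actually uses the relationship in the data.
threshold_test = accuracy(test, lambda x: int(x > 0.5))

print("Underfit training: {0:.2f}".format(underfit_train))
print("Underfit testing: {0:.2f}".format(underfit_test))
print("Threshold rule testing: {0:.2f}".format(threshold_test))
```

Both the training and testing scores of the constant model stay low, which is the signature of underfitting: the error is high everywhere, not only on unseen data.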

Underfitting models (high bias)

Underfitting occurs when there is a relationship between the variable we are predicting and the predictor variables in the model, but the model fails to find it.


Optimal performance


Parameters causing over/under fitting

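from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# X_train, X_test, y_train, y_test are assumed to come from an earlier train/test split.
# max_depth=4: shallow trees, lower accuracy on both sets (underfit)
rfc = RandomForestClassifier(n_estimators=100, max_depth=4)
rfc.fit(X_train, y_train)
train_predictions = rfc.predict(X_train)
test_predictions = rfc.predict(X_test)

print("Training: {0:.2f}".format(accuracy_score(y_train, train_predictions)))
Training: 0.84
print("Testing: {0:.2f}".format(accuracy_score(y_test, test_predictions)))
Testing: 0.77

# max_depth=14: deep trees, perfect training accuracy but a large gap to testing (overfit)
rfc = RandomForestClassifier(n_estimators=100, max_depth=14)
rfc.fit(X_train, y_train)
train_predictions = rfc.predict(X_train)
test_predictions = rfc.predict(X_test)

print("Training: {0:.2f}".format(accuracy_score(y_train, train_predictions)))
Training: 1.00
print("Testing: {0:.2f}".format(accuracy_score(y_test, test_predictions)))
Testing: 0.83

# max_depth=10: best testing accuracy of the three, with a small train/test gap
rfc = RandomForestClassifier(n_estimators=100, max_depth=10)
rfc.fit(X_train, y_train)
train_predictions = rfc.predict(X_train)
test_predictions = rfc.predict(X_test)

print("Training: {0:.2f}".format(accuracy_score(y_train, train_predictions)))
Training: 0.89
print("Testing: {0:.2f}".format(accuracy_score(y_test, test_predictions)))
Testing: 0.86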
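
The three runs above differ only in max_depth, so the same comparison can be written as a loop. This is a rough sketch, not the course's code: the data comes from scikit-learn's make_classification as a synthetic stand-in (the slides' own X and y are not shown), the depth values and random_state settings are arbitrary choices, and the scores it prints will not match the numbers on the slides.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption; the original dataset is not shown).
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

results = {}  # max_depth -> (training accuracy, testing accuracy)
for depth in [2, 4, 10, 14]:
    rfc = RandomForestClassifier(n_estimators=100, max_depth=depth,
                                 random_state=0)
    rfc.fit(X_train, y_train)
    train_acc = accuracy_score(y_train, rfc.predict(X_train))
    test_acc = accuracy_score(y_test, rfc.predict(X_test))
    results[depth] = (train_acc, test_acc)
    # A large gap hints at overfitting; low scores on both hint at underfitting.
    print("max_depth={0}: training {1:.2f}, testing {2:.2f}, gap {3:.2f}".format(
        depth, train_acc, test_acc, train_acc - test_acc))
```

Reading the training score, the testing score, and the gap side by side for each depth makes it easier to pick the setting where testing accuracy peaks before the gap starts to widen.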

Remember, only you can prevent overfitting!
