Generalization Error

Machine Learning with Tree-Based Models in Python

Elie Kawerk

Data Scientist

Supervised Learning - Under the Hood

  • Supervised Learning: $y = f(x)$, where $f$ is unknown.

[Figure: noisy dataset]

Goals of Supervised Learning

  • Find a model $\hat{f}$ that best approximates $f$: $\hat{f} \approx f$

  • $\hat{f}$ can be a logistic regression, a decision tree, a neural network, ...

  • Discard noise as much as possible.

  • End goal: $\hat{f}$ should achieve a low predictive error on unseen datasets.
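
One way to check the end goal in code is to hold out part of the data as a stand-in for unseen data. The sketch below is illustrative only. It rests on four assumptions chosen for the simulation, none of them from the course material:
- the "true" function is $f(x) = \sin(\pi x)$;
- the noise is Gaussian with standard deviation 0.2;
- 30% of the data is held out;
- a degree-3 polynomial plays the role of $\hat{f}$.

```python
import numpy as np

# Assumed "true" function f: unknown in practice, chosen here only to simulate data.
def f(x):
    return np.sin(np.pi * x)

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(-1.0, 1.0, size=n)
y = f(x) + rng.normal(0.0, 0.2, size=n)  # observed targets = f(x) + noise

# Hold out 60 of the 200 points (30%) as "unseen" data.
idx = rng.permutation(n)
test_idx, train_idx = idx[:60], idx[60:]

# f_hat: a degree-3 least-squares polynomial, a stand-in for any model.
f_hat = np.poly1d(np.polyfit(x[train_idx], y[train_idx], deg=3))

train_mse = np.mean((y[train_idx] - f_hat(x[train_idx])) ** 2)
test_mse = np.mean((y[test_idx] - f_hat(x[test_idx])) ** 2)
print(f"train MSE: {train_mse:.3f}, test MSE: {test_mse:.3f}")
```

The test MSE, computed on points the model never saw, is the estimate of predictive error on unseen data.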

Difficulties in Approximating $f$

  • Overfitting:

    $\hat{f}(x)$ fits the noise in the training set.

  • Underfitting:

    $\hat{f}$ is not flexible enough to approximate $f$.
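
Both failure modes can be reproduced by comparing training and test error as flexibility grows. This sketch makes the following assumptions:
- $f(x) = \sin(\pi x)$;
- the noise has standard deviation 0.2;
- there are 15 training points;
- polynomial degree is the flexibility knob.

Under these assumptions, degree 1 is expected to underfit and degree 10 to overfit.

```python
import numpy as np

def f(x):
    # Assumed true function (unknown in practice).
    return np.sin(np.pi * x)

rng = np.random.default_rng(1)
x_train = rng.uniform(-1.0, 1.0, size=15)
y_train = f(x_train) + rng.normal(0.0, 0.2, size=15)
x_test = rng.uniform(-1.0, 1.0, size=1000)
y_test = f(x_test) + rng.normal(0.0, 0.2, size=1000)

results = {}
for deg in (1, 3, 10):  # too rigid, about right, too flexible
    f_hat = np.poly1d(np.polyfit(x_train, y_train, deg))
    train_mse = np.mean((y_train - f_hat(x_train)) ** 2)
    test_mse = np.mean((y_test - f_hat(x_test)) ** 2)
    results[deg] = (train_mse, test_mse)
    print(f"degree {deg:2d}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
```

Underfitting typically shows up as high error on both sets. Overfitting shows up as a training error much lower than the test error, because the flexible model has partly memorized the training noise.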

Overfitting

[Figure: a model that overfits]

Underfitting

[Figure: a model that underfits]

Generalization Error

  • Generalization Error of $\hat{f}$: Does $\hat{f}$ generalize well to unseen data?

  • It can be decomposed as follows:

    $\text{Generalization error of } \hat{f} = \text{bias}^2 + \text{variance} + \text{irreducible error}$
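
The decomposition can be derived in two steps at a fixed input $x$, with expectations taken over training sets and noise. The derivation relies on assumptions added here; they are not stated on the slide:
- the data follow $y = f(x) + \varepsilon$;
- $\mathbb{E}[\varepsilon] = 0$ and $\operatorname{Var}(\varepsilon) = \sigma^2$;
- $\varepsilon$ is independent of the training set used to build $\hat{f}$.

$$
\begin{aligned}
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
&= \mathbb{E}\big[(f(x) - \hat{f}(x))^2\big] + 2\,\mathbb{E}\big[\varepsilon\,(f(x) - \hat{f}(x))\big] + \mathbb{E}[\varepsilon^2] \\
&= \mathbb{E}\big[(f(x) - \hat{f}(x))^2\big] + \sigma^2, \\
\mathbb{E}\big[(f(x) - \hat{f}(x))^2\big]
&= \underbrace{\big(f(x) - \mathbb{E}[\hat{f}(x)]\big)^2}_{\text{bias}^2} + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}.
\end{aligned}
$$

Two cross terms vanish along the way:
- The first vanishes because $\varepsilon$ has zero mean and is independent of $\hat{f}$.
- The second vanishes because $\mathbb{E}\big[\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\big] = 0$.

The remaining $\sigma^2$ is the irreducible error: no choice of $\hat{f}$ can remove it.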

Bias

  • Bias: an error term that tells you how much $\hat{f}$ differs from $f$ on average.

[Figure: a high-bias model]

Variance

  • Variance: tells you how much $\hat{f}$ varies across different training sets.

[Figure: a high-variance model]

Model Complexity

  • Model Complexity: sets the flexibility of $\hat{f}$.

  • Examples: maximum tree depth, minimum samples per leaf, ...
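
To make these two complexity settings concrete, here is a minimal from-scratch regression tree on a single feature. It is a hypothetical sketch, not scikit-learn's implementation; scikit-learn's `DecisionTreeRegressor` exposes the same ideas through its `max_depth` and `min_samples_leaf` parameters. The helper names `build_tree`, `predict` and `count_leaves`, and the simulated data, are assumptions for illustration.

```python
import numpy as np

def build_tree(x, y, depth, max_depth, min_samples_leaf):
    """Greedily grow a 1-D regression tree (hypothetical helper, not scikit-learn code)."""
    node = {"value": float(np.mean(y))}  # a leaf predicts the mean target of its samples
    # Stop at the depth limit, or when no split could leave min_samples_leaf on each side.
    if depth >= max_depth or len(y) < 2 * min_samples_leaf:
        return node
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    best_sse, best_i = np.inf, None
    # Split position i puts xs[:i] left and xs[i:] right; both keep >= min_samples_leaf.
    for i in range(min_samples_leaf, len(ys) - min_samples_leaf + 1):
        if xs[i - 1] == xs[i]:
            continue  # no threshold separates identical x values
        left, right = ys[:i], ys[i:]
        sse = np.sum((left - left.mean()) ** 2) + np.sum((right - right.mean()) ** 2)
        if sse < best_sse:
            best_sse, best_i = sse, i
    if best_i is None:
        return node
    node["threshold"] = (xs[best_i - 1] + xs[best_i]) / 2
    left_idx, right_idx = order[:best_i], order[best_i:]
    node["left"] = build_tree(x[left_idx], y[left_idx], depth + 1, max_depth, min_samples_leaf)
    node["right"] = build_tree(x[right_idx], y[right_idx], depth + 1, max_depth, min_samples_leaf)
    return node

def predict(node, x0):
    # Walk down the tree until reaching a leaf.
    while "threshold" in node:
        node = node["left"] if x0 <= node["threshold"] else node["right"]
    return node["value"]

def count_leaves(node):
    if "threshold" not in node:
        return 1
    return count_leaves(node["left"]) + count_leaves(node["right"])

rng = np.random.default_rng(2)
x = rng.uniform(-1.0, 1.0, size=100)
y = np.sin(np.pi * x) + rng.normal(0.0, 0.2, size=100)  # assumed noisy data

results = {}
for max_depth, min_samples_leaf in [(1, 1), (3, 1), (8, 1), (8, 20)]:
    tree = build_tree(x, y, 0, max_depth, min_samples_leaf)
    train_mse = np.mean([(yi - predict(tree, xi)) ** 2 for xi, yi in zip(x, y)])
    results[(max_depth, min_samples_leaf)] = (count_leaves(tree), train_mse)
    print(f"max_depth={max_depth}, min_samples_leaf={min_samples_leaf}: "
          f"{count_leaves(tree)} leaves, train MSE {train_mse:.3f}")
```

The two settings push complexity in opposite directions:
- Raising `max_depth` lets the tree carve the training set into more, smaller regions, making $\hat{f}$ more flexible.
- Raising `min_samples_leaf` forbids tiny leaves, making it less flexible.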

Bias-Variance Tradeoff

[Figure: decomposition of the generalization error]
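
The tradeoff can be estimated by simulation:
1. Draw many training sets.
2. Refit $\hat{f}$ on each.
3. Measure how far the average prediction is from $f$ ($\text{bias}^2$) and how much predictions spread across training sets (variance).

This Monte Carlo sketch rests on these assumptions:
- $f(x) = \sin(\pi x)$;
- Gaussian noise with $\sigma = 0.2$;
- training inputs are held fixed, so only the noise changes between training sets;
- polynomial degree stands in for model complexity.

```python
import numpy as np

def f(x):
    return np.sin(np.pi * x)  # assumed "true" function, unknown in practice

SIGMA = 0.2                             # assumed noise standard deviation
N_REPEATS = 500                         # number of simulated training sets
x_train = np.linspace(-1.0, 1.0, 20)    # training inputs held fixed across repeats
x_test = np.linspace(-1.0, 1.0, 50)     # points where the error is evaluated
rng = np.random.default_rng(3)

def decompose(deg):
    """Monte Carlo estimate of bias^2, variance and expected test MSE for one degree."""
    preds = np.empty((N_REPEATS, x_test.size))
    for r in range(N_REPEATS):
        y_train = f(x_train) + rng.normal(0.0, SIGMA, x_train.size)  # a new noisy training set
        preds[r] = np.poly1d(np.polyfit(x_train, y_train, deg))(x_test)
    mean_pred = preds.mean(axis=0)                    # E[f_hat(x)] over training sets
    bias2 = np.mean((f(x_test) - mean_pred) ** 2)     # squared bias, averaged over x_test
    variance = np.mean(preds.var(axis=0))             # spread of f_hat(x) across training sets
    y_new = f(x_test) + rng.normal(0.0, SIGMA, preds.shape)  # fresh noisy targets
    mse = np.mean((y_new - preds) ** 2)               # estimated generalization error
    return bias2, variance, mse

results = {deg: decompose(deg) for deg in (1, 3, 5, 9)}
for deg, (bias2, variance, mse) in results.items():
    print(f"degree {deg}: bias^2 = {bias2:.4f}, variance = {variance:.4f}, "
          f"bias^2 + variance + sigma^2 = {bias2 + variance + SIGMA**2:.4f}, "
          f"test MSE = {mse:.4f}")
```

As the degree grows, $\text{bias}^2$ is expected to shrink while the variance grows. The printed test MSE should match $\text{bias}^2 + \text{variance} + \sigma^2$ up to simulation noise, in line with the decomposition of the generalization error stated earlier.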

Bias-Variance Tradeoff: A Visual Explanation

[Figure: visual explanation of the bias-variance tradeoff]

Let's practice!
