Machine Learning with Tree-Based Models in Python
Elie Kawerk
Data Scientist
Find a model $\hat{f}$ that best approximates $f$: $\hat{f} \approx f$
$\hat{f}$ can be Logistic Regression, Decision Tree, Neural Network ...
Discard noise as much as possible.
End goal: $\hat{f}$ should achieve a low predictive error on unseen datasets.
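For illustration, a minimal sketch of fitting one candidate $\hat{f}$ with scikit-learn (the dataset, split, and parameter values below are illustrative choices, not prescribed by the course):

```python
# Sketch: fit a candidate model f_hat and check its error on unseen data.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Illustrative dataset; any labeled (X, y) would do.
X, y = load_breast_cancer(return_X_y=True)

# Hold out unseen data to evaluate predictive error later.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1)

# f_hat could be logistic regression, a neural network, ...; here a decision tree.
f_hat = DecisionTreeClassifier(max_depth=4, random_state=1)
f_hat.fit(X_train, y_train)

# End goal: low predictive error (high accuracy) on the unseen test set.
print(f_hat.score(X_test, y_test))
```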
Overfitting:
$\hat{f}(x)$ fits the training set noise.
Underfitting:
$\hat{f}$ is not flexible enough to approximate $f$.
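A sketch of both symptoms, reusing the split from the previous example (the depth values are illustrative):

```python
from sklearn.tree import DecisionTreeClassifier

# Overfitting: an unconstrained tree fits the training set noise,
# so training accuracy is high while test accuracy lags behind.
deep_tree = DecisionTreeClassifier(max_depth=None, random_state=1)
deep_tree.fit(X_train, y_train)
print('deep  train/test:', deep_tree.score(X_train, y_train),
      deep_tree.score(X_test, y_test))

# Underfitting: a depth-1 stump is not flexible enough to approximate f,
# so both training and test accuracy stay low.
stump = DecisionTreeClassifier(max_depth=1, random_state=1)
stump.fit(X_train, y_train)
print('stump train/test:', stump.score(X_train, y_train),
      stump.score(X_test, y_test))
```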
Generalization Error of $\hat{f}$: Does $\hat{f}$ generalize well on unseen data?
It can be decomposed as follows:
$\text{Generalization Error of } \hat{f} = \text{bias}^2 + \text{variance} + \text{irreducible error}$
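The bias and variance terms cannot be observed directly; one common way to estimate the generalization error itself is cross-validation. A sketch reusing `f_hat` and the training split from above:

```python
# Sketch: estimate generalization error with 10-fold cross-validation.
from sklearn.model_selection import cross_val_score

# Mean CV accuracy approximates how well f_hat generalizes to unseen data.
cv_scores = cross_val_score(f_hat, X_train, y_train, cv=10)
print('CV estimate of generalization accuracy:', cv_scores.mean())
```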
Model Complexity: sets the flexibility of $\hat{f}$.
Examples: maximum tree depth, minimum samples per leaf, ...
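One way to choose these complexity settings is a small grid search; in this sketch the grid values are placeholders, and the training split from the first sketch is reused:

```python
# Sketch: search over complexity settings that control the flexibility of f_hat.
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

param_grid = {'max_depth': [2, 4, 8, None],
              'min_samples_leaf': [1, 5, 20]}

grid = GridSearchCV(DecisionTreeClassifier(random_state=1),
                    param_grid, cv=5)
grid.fit(X_train, y_train)

# The selected settings balance underfitting against overfitting.
print(grid.best_params_)
```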