Model generalization: bootstrapping and cross-validation

Practicing Machine Learning Interview Questions in Python

Lisa Stuart

Data Scientist

Chapter 4 overview

  • Bootstrapping/cross-validation --> model generalization
  • Imbalanced classes
  • Correlated features
  • Ensemble model selection

Model generalization

  • An ML model's ability to perform well on unseen data
    • test dataset
    • future data
  • Train metrics $\approx$ test metrics
  • Overfit models do not generalize
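A minimal sketch of the train/test comparison above, assuming a synthetic dataset from `make_classification` with deliberate label noise (the dataset and all parameter values are illustrative and not from the course). An unconstrained decision tree memorizes the training set, so the gap between train and test accuracy indicates how poorly it generalizes.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumed toy data: flip_y randomly reassigns some labels, which punishes memorization
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=5, flip_y=0.2, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# No depth limit, so the tree keeps splitting until every leaf is pure
tree = DecisionTreeClassifier(random_state=0)
tree.fit(X_train, y_train)

train_acc = accuracy_score(y_train, tree.predict(X_train))
test_acc = accuracy_score(y_test, tree.predict(X_test))

print(f"train accuracy: {train_acc:.3f}")
print(f"test accuracy:  {test_acc:.3f}")
print(f"gap:            {train_acc - test_acc:.3f}")
```

A large gap suggests overfitting; limiting `max_depth` or raising `min_samples_leaf` would be one way to narrow it.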

Decision tree

Decision tree plot

1 https://medium.com/@rnbrown/creating-and-visualizing-decision-trees-with-python-f8e8fa394176

Decision tree nodes

Decision tree root node


Advantages vs disadvantages

Decision tree plot

  • Advantages:
    • Easy to understand
    • Easy to visualize
  • Disadvantages:
    • Overfit easily
    • Greedy: each split is chosen locally, with no guarantee of a globally optimal tree
    • Biased toward the majority class when classes are imbalanced

Random Forest

Random Forest

1 https://www.researchgate.net/figure/Random-Forest-visualization_fig11_326560291

K-fold cross-validation

K-fold cross-validation

1 https://scikit-learn.org/stable/modules/cross_validation.html
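K-fold cross-validation can be sketched as follows, assuming a synthetic dataset and 5 folds (both are illustrative choices, not values from the course). Each fold serves once as the validation set while the model trains on the remaining folds, so 5 folds yield 5 scores.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Assumed toy data, for illustration only
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Shuffle once, then split into 5 folds of equal size
kfold = KFold(n_splits=5, shuffle=True, random_state=0)

# One accuracy score per fold
scores = cross_val_score(
    DecisionTreeClassifier(max_depth=3, random_state=0),
    X,
    y,
    cv=kfold,
    scoring="accuracy",
)

print("per-fold accuracy:", scores.round(3))
print("mean accuracy:    ", scores.mean().round(3))
```

The mean of the per-fold scores is a steadier estimate of performance on unseen data than a single train/test split.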

Functions

# decision tree
`sklearn.tree.DecisionTreeClassifier` 

# random forest 
`sklearn.ensemble.RandomForestClassifier`

# cross-validated grid search
`sklearn.model_selection.GridSearchCV` 

# model accuracy
`sklearn.metrics.accuracy_score` 

# train/test split function
`sklearn.model_selection.train_test_split`

# parameters that gave the best results
# (cross_val_model is a fitted GridSearchCV object)
`cross_val_model.best_params_`

# mean cross-validated score of the
# estimator with the best parameters
`cross_val_model.best_score_`
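The functions listed above fit together roughly as follows. This is a sketch under assumptions: the synthetic dataset, the parameter grid, and `cv=3` are illustrative, and `cross_val_model` is simply the name given here to the fitted `GridSearchCV` object.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Assumed toy data, for illustration only
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Hypothetical grid: 2 x 2 = 4 parameter combinations
param_grid = {"n_estimators": [10, 50], "max_depth": [2, 4]}

# Each combination is scored with 3-fold cross-validation on the training set
cross_val_model = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=3,
    scoring="accuracy",
)
cross_val_model.fit(X_train, y_train)

print("best params:", cross_val_model.best_params_)
print("best CV score:", round(cross_val_model.best_score_, 3))

# By default the best estimator is refit on the full training set,
# so the search object can predict on the held-out test set directly
test_acc = accuracy_score(y_test, cross_val_model.predict(X_test))
print("test accuracy:", round(test_acc, 3))
```

Comparing `best_score_` with the held-out test accuracy is a quick check on whether the tuned model generalizes.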

GridSearchCV vs RandomizedSearchCV

Grid search
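A small sketch of the difference between the two search strategies, assuming a hypothetical 4 x 3 parameter grid and `n_iter=5` (both illustrative). `GridSearchCV` evaluates every combination, while `RandomizedSearchCV` evaluates only a random sample of them.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.tree import DecisionTreeClassifier

# Assumed toy data, for illustration only
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Hypothetical search space: 4 x 3 = 12 combinations
params = {"max_depth": [2, 4, 6, 8], "min_samples_leaf": [1, 5, 10]}

# Exhaustive search: tries all 12 combinations
grid = GridSearchCV(DecisionTreeClassifier(random_state=0), params, cv=3)
grid.fit(X, y)

# Randomized search: tries only n_iter sampled combinations
rand = RandomizedSearchCV(
    DecisionTreeClassifier(random_state=0),
    params,
    n_iter=5,
    cv=3,
    random_state=0,
)
rand.fit(X, y)

n_grid = len(grid.cv_results_["params"])
n_rand = len(rand.cv_results_["params"])
print("grid search candidates:      ", n_grid)
print("randomized search candidates:", n_rand)
```

Randomized search trades completeness for speed, which tends to matter more as the number of hyperparameters grows.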


Let's practice!

