Ensembling it all together

Ensemble Methods in Python

Román de las Heras

Data Scientist, Appodeal

Chapter 1: Voting and Averaging

Voting

  • Combination: mode (majority)
  • Classification
  • Heterogeneous ensemble method

Averaging

  • Combination: mean (average)
  • Classification and Regression
  • Heterogeneous ensemble method

Good choices when you:

  • Have built multiple different models
  • Are not sure which is the best
  • Want to improve the overall performance
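
A minimal sketch of both combinations, assuming scikit-learn's `VotingClassifier` (`voting="hard"` takes the mode of the predicted classes, `voting="soft"` averages the predicted probabilities). The synthetic dataset, the three base models, and all variable names are illustrative choices, not taken from the slides.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic data, used only for illustration (assumption)
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Heterogeneous ensemble: three different model families
estimators = [
    ("lr", LogisticRegression(max_iter=1000)),
    ("knn", KNeighborsClassifier(n_neighbors=5)),
    ("dt", DecisionTreeClassifier(max_depth=4, random_state=42)),
]

# Voting: the predicted class is the mode of the individual predictions
clf_vote = VotingClassifier(estimators=estimators, voting="hard")
clf_vote.fit(X_train, y_train)
vote_acc = clf_vote.score(X_test, y_test)

# Averaging: the predicted class comes from the mean predicted probabilities
clf_avg = VotingClassifier(estimators=estimators, voting="soft")
clf_avg.fit(X_train, y_train)
avg_acc = clf_avg.score(X_test, y_test)

print(f"Voting accuracy:    {vote_acc:.3f}")
print(f"Averaging accuracy: {avg_acc:.3f}")
```

For regression, `VotingRegressor` averages the individual predictions in the same way.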

Chapter 2: Bagging

Weak estimator

  • Performs only slightly better than random guessing
  • Lightweight and fast model
  • Base for homogeneous ensemble methods

Bagging (Bootstrap Aggregating)

  • Random subsamples with replacement
  • Large number of "weak" estimators
  • Aggregated by Voting or Averaging
  • Homogeneous ensemble method

Good choice when you:

  • Want to reduce variance
  • Need to avoid overfitting
  • Need more stability and robustness

* Observation:

  • Bagging is computationally expensive
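
A minimal sketch of Bagging, assuming scikit-learn's `BaggingClassifier` with a shallow decision tree as the weak estimator. The dataset, the tree depth, and the number of estimators are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data, used only for illustration (assumption)
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Weak estimator: a shallow tree that is light and fast to train
weak_tree = DecisionTreeClassifier(max_depth=2, random_state=42)

# Homogeneous ensemble: 100 copies of the weak estimator, each fitted on a
# random subsample drawn with replacement (bootstrap=True is the default)
bag = BaggingClassifier(
    weak_tree,
    n_estimators=100,
    oob_score=True,  # score each sample with estimators that did not see it
    random_state=42,
)
bag.fit(X_train, y_train)

bag_acc = bag.score(X_test, y_test)
print(f"Out-of-bag accuracy: {bag.oob_score_:.3f}")
print(f"Test accuracy:       {bag_acc:.3f}")
```

The out-of-bag score gives a validation estimate without a separate hold-out set, which partly offsets the cost of training many estimators.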

Chapter 3: Boosting

Gradual learning

  • Homogeneous ensemble method
  • Based on iterative learning
  • Sequential model building

Boosting algorithms

  • AdaBoost
  • Gradient Boosting:
    • XGBoost
    • LightGBM
    • CatBoost

Good choice when you:

  • Have complex problems
  • Need to apply parallel processing or distributed computing (supported by XGBoost and LightGBM)
  • Have big datasets or high-dimensional categorical features
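
A minimal sketch of gradual learning, assuming scikit-learn's `AdaBoostClassifier`, whose default base estimator is a depth-1 decision tree. The dataset and hyperparameters are illustrative assumptions; `staged_score` reports the accuracy after each sequential boosting round.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

# Synthetic data, used only for illustration (assumption)
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Sequential model building: each new estimator focuses on the samples
# that the previous ones misclassified
ada = AdaBoostClassifier(n_estimators=50, learning_rate=1.0, random_state=42)
ada.fit(X_train, y_train)

# Test accuracy of the ensemble after each boosting iteration
staged_acc = list(ada.staged_score(X_test, y_test))
print(f"After 1 estimator:   {staged_acc[0]:.3f}")
print(f"After {len(staged_acc)} estimators: {staged_acc[-1]:.3f}")
```

XGBoost, LightGBM, and CatBoost expose scikit-learn-style estimators as well, so a similar fit-and-score pattern should carry over to them.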

Chapter 4: Stacking

Stacking

  • Combination: meta-estimator (model)
  • Classification and Regression
  • Heterogeneous ensemble method

Implementation

  • From scratch using pandas and sklearn
  • Using the existing MLxtend library

Good choice when you:

  • Have tried Voting / Averaging but results are not as expected
  • Have built models which perform well in different cases
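
A minimal sketch of Stacking. Note that it uses scikit-learn's `StackingClassifier` rather than the MLxtend library or a from-scratch implementation named above; that substitution, the dataset, and the chosen models are assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic data, used only for illustration (assumption)
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# First layer: heterogeneous base estimators
base_estimators = [
    ("knn", KNeighborsClassifier(n_neighbors=5)),
    ("dt", DecisionTreeClassifier(max_depth=4, random_state=42)),
]

# Second layer: the meta-estimator learns how to combine the base
# predictions, which are produced out-of-fold via 5-fold cross-validation
stack = StackingClassifier(
    estimators=base_estimators,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_train, y_train)

stack_acc = stack.score(X_test, y_test)
print(f"Stacking accuracy: {stack_acc:.3f}")
```

Unlike Voting or Averaging, the combination rule here is itself a trained model, so it can weight each base estimator according to where it performs well.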

Thank you and well ensembled!
