Bootstrap aggregating

Ensemble Methods in Python

Román de las Heras

Data Scientist, Appodeal

Heterogeneous vs Homogeneous Ensembles

Heterogeneous:

  • Different algorithms (fine-tuned)
  • Small amount of estimators
  • Voting, Averaging, and Stacking

Homogeneous:

  • The same algorithm ("weak" model)
  • Large amount of estimators
  • Bagging and Boosting
Ensemble Methods in Python

Condorcet's Jury Theorem

Requirements:

  • Models are independent
  • Each model performs better than random guessing
  • All individual models have similar performance

Conclusion: Adding more models improves the performance of the ensemble (Voting or Averaging), and this approaches 1 (100%)

Condorcet Marquis de Condorcet, French philosopher and mathematician

Ensemble Methods in Python

Bootstrapping

Bootstrapping requires:

  • Random subsamples
  • Using replacement

Bootstrapping guarantees:

  • Diverse crowd: different datasets
  • Independent: separately sampled

Bagging.png

Ensemble Methods in Python

Pros and cons of bagging

Pros

  • Bagging usually reduces variance
  • Overfitting can be avoided by the ensemble itself
  • More stability and robustness

Cons

  • It is computationally expensive
Ensemble Methods in Python

It's time to practice!

Ensemble Methods in Python

Preparing Video For Download...