Model ensembling

Winning a Kaggle Competition in Python

Yauhen Babakhin

Kaggle Grandmaster

Model ensembling

[Figure: ensemble design of a winning solution in a Kaggle competition]

Model blending

  • Regression problem
  • Train two different models: A and B
  • Make predictions on the test data:

| Test ID | Model A prediction | Model B prediction |
|---------|--------------------|--------------------|
| 1       | 1.2                | 1.5                |
| 2       | 0.1                | 0.4                |
| 3       | 5.4                | 7.2                |

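| Test ID | Model A prediction | Model B prediction | Arithmetic mean |
|---------|--------------------|--------------------|-----------------|
| 1       | 1.2                | 1.5                | 1.35            |
| 2       | 0.1                | 0.4                | 0.25            |
| 3       | 5.4                | 7.2                | 6.30            |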

Arithmetic mean

$$arithmetic = \frac{1}{n}\sum_{i=1}^{n}{x_i}$$

Geometric mean

$$geometric = \Bigg({\prod_{i=1}^{n}{x_i}}\Bigg)^{\frac{1}{n}}$$
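
As a minimal sketch of the two blending formulas, the snippet below averages the Model A and Model B predictions from the blending table using Python's standard `statistics` module. The `predictions` and `blended` names are illustrative, and the geometric mean assumes all predictions are strictly positive.

```python
from statistics import fmean, geometric_mean

# Model A and Model B predictions for test IDs 1-3 (values from the blending table).
predictions = {
    1: [1.2, 1.5],
    2: [0.1, 0.4],
    3: [5.4, 7.2],
}

# Blend each row two ways: arithmetic mean and geometric mean.
# Note: geometric_mean assumes strictly positive inputs.
blended = {
    test_id: (fmean(preds), geometric_mean(preds))
    for test_id, preds in predictions.items()
}

for test_id, (arithmetic, geometric) in blended.items():
    print(f"Test ID {test_id}: arithmetic={arithmetic:.2f}, geometric={geometric:.2f}")
```

For positive predictions the geometric mean is never larger than the arithmetic mean, so it pulls the blend slightly toward the lower of the two models.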

Model stacking

 

  1. Split train data into two parts
  2. Train multiple models on Part 1
  3. Make predictions on Part 2
  4. Make predictions on the test data
  5. Train a new model on Part 2 using predictions as features
  6. Make predictions on the test data using the 2nd level model
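
The six steps above can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the author's implementation: the synthetic dataset, the choice of three base classifiers, the logistic-regression 2nd level model, and every variable name are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for competition data (assumption: binary target).
X, y = make_classification(n_samples=600, n_features=10, random_state=0)
# The test labels are discarded, since they are hidden in a competition.
X_train, X_test, y_train, _ = train_test_split(X, y, test_size=0.25, random_state=0)

# 1. Split train data into two parts.
X_part1, X_part2, y_part1, y_part2 = train_test_split(
    X_train, y_train, test_size=0.5, random_state=0
)

# 2. Train multiple models on Part 1 (model choices are illustrative).
base_models = {
    "A": LogisticRegression(max_iter=1000),
    "B": RandomForestClassifier(n_estimators=50, random_state=0),
    "C": GradientBoostingClassifier(random_state=0),
}
for model in base_models.values():
    model.fit(X_part1, y_part1)


def base_predictions(features):
    """Return one column of class-1 probabilities per base model."""
    return np.column_stack(
        [model.predict_proba(features)[:, 1] for model in base_models.values()]
    )


# 3. Make predictions on Part 2.
part2_preds = base_predictions(X_part2)
# 4. Make predictions on the test data.
test_preds = base_predictions(X_test)

# 5. Train a new model on Part 2 using the predictions as features.
second_level = LogisticRegression()
second_level.fit(part2_preds, y_part2)

# 6. Make predictions on the test data using the 2nd level model.
stacking_prediction = second_level.predict_proba(test_preds)[:, 1]
print(stacking_prediction[:3])
```

The base models never see Part 2 during training, so their Part 2 predictions are out-of-sample features for the 2nd level model.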

Stacking example

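| Train ID | feature_1 | ... | feature_N | Target |
|----------|-----------|-----|-----------|--------|
| 1        | 0.55      | ... | 1.37      | 1      |
| 2        | 0.12      | ... | -2.50     | 0      |
| 3        | 0.65      | ... | 3.14      | 0      |
| 4        | 0.10      | ... | 2.87      | 1      |
| 5        | 0.54      | ... | -0.10     | 0      |

| Test ID | feature_1 | ... | feature_N | Target |
|---------|-----------|-----|-----------|--------|
| 11      | 0.49      | ... | -2.32     | ?      |
| 12      | 0.32      | ... | 1.15      | ?      |
| 13      | 0.91      | ... | 0.81      | ?      |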

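Part 1

| Train ID | feature_1 | ... | feature_N | Target |
|----------|-----------|-----|-----------|--------|
| 1        | 0.55      | ... | 1.37      | 1      |
| 2        | 0.12      | ... | -2.50     | 0      |
| 3        | 0.65      | ... | 3.14      | 0      |

Part 2

| Train ID | feature_1 | ... | feature_N | Target |
|----------|-----------|-----|-----------|--------|
| 4        | 0.10      | ... | 2.87      | 1      |
| 5        | 0.54      | ... | -0.10     | 0      |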

Train models A, B, C on Part 1

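| Train ID | feature_1 | ... | feature_N | Target | A_pred | B_pred | C_pred |
|----------|-----------|-----|-----------|--------|--------|--------|--------|
| 4        | 0.10      | ... | 2.87      | 1      | 0.71   | 0.52   | 0.98   |
| 5        | 0.54      | ... | -0.10     | 0      | 0.45   | 0.32   | 0.24   |

| Test ID | feature_1 | ... | feature_N | Target | A_pred | B_pred | C_pred |
|---------|-----------|-----|-----------|--------|--------|--------|--------|
| 11      | 0.49      | ... | -2.32     | ?      | 0.62   | 0.45   | 0.81   |
| 12      | 0.32      | ... | 1.15      | ?      | 0.31   | 0.52   | 0.41   |
| 13      | 0.91      | ... | 0.81      | ?      | 0.74   | 0.55   | 0.92   |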

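| Train ID | Target | A_pred | B_pred | C_pred |
|----------|--------|--------|--------|--------|
| 4        | 1      | 0.71   | 0.52   | 0.98   |
| 5        | 0      | 0.45   | 0.32   | 0.24   |

| Test ID | Target | A_pred | B_pred | C_pred |
|---------|--------|--------|--------|--------|
| 11      | ?      | 0.62   | 0.45   | 0.81   |
| 12      | ?      | 0.31   | 0.52   | 0.41   |
| 13      | ?      | 0.74   | 0.55   | 0.92   |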

Train 2nd level model on Part 2

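| Train ID | Target | A_pred | B_pred | C_pred |
|----------|--------|--------|--------|--------|
| 4        | 1      | 0.71   | 0.52   | 0.98   |
| 5        | 0      | 0.45   | 0.32   | 0.24   |

| Test ID | Target | A_pred | B_pred | C_pred | Stacking prediction |
|---------|--------|--------|--------|--------|---------------------|
| 11      | ?      | 0.62   | 0.45   | 0.81   | 0.73                |
| 12      | ?      | 0.31   | 0.52   | 0.41   | 0.35                |
| 13      | ?      | 0.74   | 0.55   | 0.92   | 0.88                |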

Let's practice!
