Winning a Kaggle Competition in Python
Yauhen Babakhin
Kaggle Grandmaster

| Test ID | Model A prediction | Model B prediction |
|---|---|---|
| 1 | 1.2 | 1.5 |
| 2 | 0.1 | 0.4 |
| 3 | 5.4 | 7.2 |

| Test ID | Model A prediction | Model B prediction | Arithmetic mean |
|---|---|---|---|
| 1 | 1.2 | 1.5 | 1.35 |
| 2 | 0.1 | 0.4 | 0.25 |
| 3 | 5.4 | 7.2 | 6.30 |
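
The arithmetic-mean column in the table above can be reproduced in a few lines. This is a minimal sketch using NumPy; the variable names `model_a`, `model_b`, and `arithmetic_blend` are illustrative and do not come from the slides.

```python
import numpy as np

# Test-set predictions of the two models, copied from the table above.
model_a = np.array([1.2, 0.1, 5.4])
model_b = np.array([1.5, 0.4, 7.2])

# Blend by averaging the two predictions for each test ID.
arithmetic_blend = (model_a + model_b) / 2

print(np.round(arithmetic_blend, 2))
```

Because both models get equal weight, the blend always lies between the two individual predictions.
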

$$\text{arithmetic mean} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

$$\text{geometric mean} = \left(\prod_{i=1}^{n} x_i\right)^{\frac{1}{n}}$$
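
To compare the two formulas, the sketch below applies both to the Model A and Model B predictions from the earlier table. Here $x_i$ is the prediction of model $i$ and $n$ is the number of models; the array name `preds` is an assumption made for illustration.

```python
import numpy as np

# One row per test ID, one column per model (Model A, Model B).
preds = np.array([
    [1.2, 1.5],
    [0.1, 0.4],
    [5.4, 7.2],
])
n_models = preds.shape[1]

# Arithmetic mean: sum over models divided by n.
arithmetic = preds.sum(axis=1) / n_models

# Geometric mean: product over models raised to the power 1/n.
# This is only meaningful when all predictions are positive.
geometric = preds.prod(axis=1) ** (1 / n_models)

print(np.round(arithmetic, 2))
print(np.round(geometric, 2))
```

For positive inputs the geometric mean never exceeds the arithmetic mean, so it pulls the blend toward the smaller of the predictions.
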

| Train ID | feature_1 | ... | feature_N | Target |
|---|---|---|---|---|
| 1 | 0.55 | ... | 1.37 | 1 |
| 2 | 0.12 | ... | -2.50 | 0 |
| 3 | 0.65 | ... | 3.14 | 0 |
| 4 | 0.10 | ... | 2.87 | 1 |
| 5 | 0.54 | ... | -0.10 | 0 |

| Test ID | feature_1 | ... | feature_N | Target |
|---|---|---|---|---|
| 11 | 0.49 | ... | -2.32 | ? |
| 12 | 0.32 | ... | 1.15 | ? |
| 13 | 0.91 | ... | 0.81 | ? |

| Train ID | feature_1 | ... | feature_N | Target |
|---|---|---|---|---|
| 1 | 0.55 | ... | 1.37 | 1 |
| 2 | 0.12 | ... | -2.50 | 0 |
| 3 | 0.65 | ... | 3.14 | 0 |

| Train ID | feature_1 | ... | feature_N | Target |
|---|---|---|---|---|
| 4 | 0.10 | ... | 2.87 | 1 |
| 5 | 0.54 | ... | -0.10 | 0 |
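
The two tables above show the train data split into a first part (IDs 1 to 3) and a second part (IDs 4 to 5). A minimal pandas sketch of that split follows. Only the two visible feature columns are included, since the columns elided as `...` are unknown. The names `part_1`, `part_2`, and `n_part_1` are assumptions made for illustration.

```python
import pandas as pd

# The five training rows from the tables above (elided columns are left out).
train = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "feature_1": [0.55, 0.12, 0.65, 0.10, 0.54],
    "feature_N": [1.37, -2.50, 3.14, 2.87, -0.10],
    "target": [1, 0, 0, 1, 0],
})

# Positional split that mirrors the tables: the first three rows form
# Part 1 and the remaining two rows form Part 2.
n_part_1 = 3
part_1 = train.iloc[:n_part_1]
part_2 = train.iloc[n_part_1:]

print(part_1["id"].tolist(), part_2["id"].tolist())
```

On real data a random split, for example with scikit-learn's `train_test_split`, is a common alternative to splitting by position.
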

| Train ID | feature_1 | ... | feature_N | Target | A_pred | B_pred | C_pred |
|---|---|---|---|---|---|---|---|
| 4 | 0.10 | ... | 2.87 | 1 | 0.71 | 0.52 | 0.98 |
| 5 | 0.54 | ... | -0.10 | 0 | 0.45 | 0.32 | 0.24 |

| Test ID | feature_1 | ... | feature_N | Target | A_pred | B_pred | C_pred |
|---|---|---|---|---|---|---|---|
| 11 | 0.49 | ... | -2.32 | ? | 0.62 | 0.45 | 0.81 |
| 12 | 0.32 | ... | 1.15 | ? | 0.31 | 0.52 | 0.41 |
| 13 | 0.91 | ... | 0.81 | ? | 0.74 | 0.55 | 0.92 |
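
The `A_pred`, `B_pred`, and `C_pred` columns above can be produced by fitting each base model on the first part of the train data and predicting on both the second part and the test data. The sketch below assumes scikit-learn and uses synthetic data from `make_classification` as a stand-in for the competition files. The slides do not say which algorithms models A, B, and C are, so the three estimators here are arbitrary choices.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical data standing in for the competition train and test sets.
X, y = make_classification(n_samples=300, n_features=5, random_state=0)
X_train, X_test, y_train, _ = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Split the train data into Part 1 and Part 2.
X_part_1, X_part_2, y_part_1, y_part_2 = train_test_split(
    X_train, y_train, test_size=0.5, random_state=0
)

# Models A, B, and C: arbitrary estimators chosen for illustration.
base_models = {
    "A_pred": LogisticRegression(max_iter=1000),
    "B_pred": RandomForestClassifier(n_estimators=50, random_state=0),
    "C_pred": GradientBoostingClassifier(random_state=0),
}

part_2_preds = pd.DataFrame({"target": y_part_2})
test_preds = pd.DataFrame(index=range(len(X_test)))

for name, model in base_models.items():
    # Fit on Part 1 only, then predict class-1 probabilities
    # for Part 2 and for the test data.
    model.fit(X_part_1, y_part_1)
    part_2_preds[name] = model.predict_proba(X_part_2)[:, 1]
    test_preds[name] = model.predict_proba(X_test)[:, 1]

print(part_2_preds.head())
print(test_preds.head())
```

Because the base models never see Part 2 during fitting, their Part 2 predictions are out-of-sample, like their test predictions.
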

| Train ID | Target | A_pred | B_pred | C_pred |
|---|---|---|---|---|
| 4 | 1 | 0.71 | 0.52 | 0.98 |
| 5 | 0 | 0.45 | 0.32 | 0.24 |

| Test ID | Target | A_pred | B_pred | C_pred |
|---|---|---|---|---|
| 11 | ? | 0.62 | 0.45 | 0.81 |
| 12 | ? | 0.31 | 0.52 | 0.41 |
| 13 | ? | 0.74 | 0.55 | 0.92 |

| Test ID | Target | A_pred | B_pred | C_pred | Stacking prediction |
|---|---|---|---|---|---|
| 11 | ? | 0.62 | 0.45 | 0.81 | 0.73 |
| 12 | ? | 0.31 | 0.52 | 0.41 | 0.35 |
| 13 | ? | 0.74 | 0.55 | 0.92 | 0.88 |
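
The final stacking step can be sketched as a second-level model that takes the base-model predictions as its only features. It is fit on the second part of the train data and then applied to the test data. The example below uses the prediction values from the tables above and assumes scikit-learn's `LogisticRegression` as the second-level model; the slides do not name the model actually used. With only two training rows this is a toy illustration, so its output is not expected to match the stacking column shown in the table.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

pred_cols = ["A_pred", "B_pred", "C_pred"]

# Part 2 of the train data: base-model predictions plus the known target.
part_2 = pd.DataFrame({
    "id": [4, 5],
    "target": [1, 0],
    "A_pred": [0.71, 0.45],
    "B_pred": [0.52, 0.32],
    "C_pred": [0.98, 0.24],
})

# Test data: base-model predictions only, the target is unknown.
test = pd.DataFrame({
    "id": [11, 12, 13],
    "A_pred": [0.62, 0.31, 0.74],
    "B_pred": [0.45, 0.52, 0.55],
    "C_pred": [0.81, 0.41, 0.92],
})

# Second-level model (an assumed choice): learns how to combine A, B, and C.
meta_model = LogisticRegression()
meta_model.fit(part_2[pred_cols], part_2["target"])

# Stacking prediction: probability of class 1 for each test row.
test["stacking_pred"] = meta_model.predict_proba(test[pred_cols])[:, 1]

print(test[["id", "stacking_pred"]])
```

Unlike a plain average, the second-level model learns from the Part 2 targets how much weight each base model should get.
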