Winning a Kaggle Competition in Python
Yauhen Babakhin
Kaggle Grandmaster
Test ID | Model A prediction | Model B prediction |
---|---|---|
1 | 1.2 | 1.5 |
2 | 0.1 | 0.4 |
3 | 5.4 | 7.2 |
Test ID | Model A prediction | Model B prediction | Arithmetic mean |
---|---|---|---|
1 | 1.2 | 1.5 | 1.35 |
2 | 0.1 | 0.4 | 0.25 |
3 | 5.4 | 7.2 | 6.30 |
$$\text{arithmetic mean} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
$$\text{geometric mean} = \left(\prod_{i=1}^{n} x_i\right)^{\frac{1}{n}}$$
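As a minimal sketch of the two blending formulas, the snippet below averages the Model A and Model B predictions from the table above. The variable names are illustrative, not from the slides. The geometric mean is computed through logarithms, which assumes every prediction is strictly positive.

```python
import numpy as np

# Predictions for Test IDs 1, 2, 3, copied from the table above.
model_a = np.array([1.2, 0.1, 5.4])
model_b = np.array([1.5, 0.4, 7.2])

# One row per model, one column per test observation.
preds = np.vstack([model_a, model_b])

# Arithmetic mean: (1/n) * sum of the n model predictions.
arithmetic = preds.mean(axis=0)

# Geometric mean: n-th root of the product of the n predictions,
# computed as exp(mean(log(x))). Valid only for positive values.
geometric = np.exp(np.log(preds).mean(axis=0))

print(arithmetic.round(2))
print(geometric.round(4))
```

The arithmetic results match the "Arithmetic mean" column of the table. For positive inputs, the geometric mean is never larger than the arithmetic mean.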
Train ID | feature_1 | ... | feature_N | Target |
---|---|---|---|---|
1 | 0.55 | ... | 1.37 | 1 |
2 | 0.12 | ... | -2.50 | 0 |
3 | 0.65 | ... | 3.14 | 0 |
4 | 0.10 | ... | 2.87 | 1 |
5 | 0.54 | ... | -0.10 | 0 |
Test ID | feature_1 | ... | feature_N | Target |
---|---|---|---|---|
11 | 0.49 | ... | -2.32 | ? |
12 | 0.32 | ... | 1.15 | ? |
13 | 0.91 | ... | 0.81 | ? |
Train ID | feature_1 | ... | feature_N | Target |
---|---|---|---|---|
1 | 0.55 | ... | 1.37 | 1 |
2 | 0.12 | ... | -2.50 | 0 |
3 | 0.65 | ... | 3.14 | 0 |
Train ID | feature_1 | ... | feature_N | Target |
---|---|---|---|---|
4 | 0.10 | ... | 2.87 | 1 |
5 | 0.54 | ... | -0.10 | 0 |
Train ID | feature_1 | ... | feature_N | Target | A_pred | B_pred | C_pred |
---|---|---|---|---|---|---|---|
4 | 0.10 | ... | 2.87 | 1 | 0.71 | 0.52 | 0.98 |
5 | 0.54 | ... | -0.10 | 0 | 0.45 | 0.32 | 0.24 |
Test ID | feature_1 | ... | feature_N | Target | A_pred | B_pred | C_pred |
---|---|---|---|---|---|---|---|
11 | 0.49 | ... | -2.32 | ? | 0.62 | 0.45 | 0.81 |
12 | 0.32 | ... | 1.15 | ? | 0.31 | 0.52 | 0.41 |
13 | 0.91 | ... | 0.81 | ? | 0.74 | 0.55 | 0.92 |
Train ID | Target | A_pred | B_pred | C_pred |
---|---|---|---|---|
4 | 1 | 0.71 | 0.52 | 0.98 |
5 | 0 | 0.45 | 0.32 | 0.24 |
Test ID | Target | A_pred | B_pred | C_pred |
---|---|---|---|---|
11 | ? | 0.62 | 0.45 | 0.81 |
12 | ? | 0.31 | 0.52 | 0.41 |
13 | ? | 0.74 | 0.55 | 0.92 |
Train ID | Target | A_pred | B_pred | C_pred |
---|---|---|---|---|
4 | 1 | 0.71 | 0.52 | 0.98 |
5 | 0 | 0.45 | 0.32 | 0.24 |
Test ID | Target | A_pred | B_pred | C_pred | Stacking prediction |
---|---|---|---|---|---|
11 | ? | 0.62 | 0.45 | 0.81 | 0.73 |
12 | ? | 0.31 | 0.52 | 0.41 | 0.35 |
13 | ? | 0.74 | 0.55 | 0.92 | 0.88 |
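The sequence of tables above can be sketched in code as follows. This is one possible reading of the procedure, not the author's implementation. The train data is split into two parts, three base models are fit on the first part, their predictions on the second part and on the test data become the columns A_pred, B_pred and C_pred, and a second-level model fit on those columns produces the stacking prediction. The synthetic data, the choice of scikit-learn estimators, and all variable names are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the competition data (assumption).
X, y = make_classification(n_samples=600, n_features=10, random_state=0)
X_train, X_test, y_train, _ = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Split the train data into Part 1 and Part 2.
X_part1, X_part2, y_part1, y_part2 = train_test_split(
    X_train, y_train, test_size=0.5, random_state=0
)

# Hypothetical base models standing in for models A, B and C.
base_models = {
    "A_pred": LogisticRegression(max_iter=1000),
    "B_pred": RandomForestClassifier(n_estimators=50, random_state=0),
    "C_pred": KNeighborsClassifier(n_neighbors=5),
}

part2_columns, test_columns = [], []
for name, model in base_models.items():
    # Fit each base model on Part 1 only.
    model.fit(X_part1, y_part1)
    # Predict class-1 probabilities on Part 2 and on the test data.
    part2_columns.append(model.predict_proba(X_part2)[:, 1])
    test_columns.append(model.predict_proba(X_test)[:, 1])

# The base-model predictions are the only features of the second level.
part2_features = np.column_stack(part2_columns)
test_features = np.column_stack(test_columns)

# Fit the second-level model on Part 2, then predict on the test data.
stacker = LogisticRegression()
stacker.fit(part2_features, y_part2)
stacking_pred = stacker.predict_proba(test_features)[:, 1]

print(stacking_pred[:3].round(2))
```

The second-level model sees only rows from Part 2, which the base models did not train on, so it learns from predictions of the same kind it will receive for the test data.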