Winning a Kaggle Competition in Python
Yauhen Babakhin
Kaggle Grandmaster
Train ID | Categorical | Target |
---|---|---|
1 | A | 1 |
2 | B | 0 |
3 | B | 0 |
4 | A | 1 |
5 | B | 0 |
6 | A | 0 |
7 | B | 1 |
Test ID | Categorical | Target |
---|---|---|
10 | A | ? |
11 | A | ? |
12 | B | ? |
13 | A | ? |
Test ID | Categorical | Target | Mean encoded |
---|---|---|---|
10 | A | ? | 0.67 |
11 | A | ? | 0.67 |
12 | B | ? | 0.25 |
13 | A | ? | 0.67 |
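The test-set encoding above can be reproduced with a short pandas sketch. The column names (`id`, `categorical`, `target`, `mean_encoded`) are assumptions chosen for illustration; the data is copied from the train and test tables. Each category's mean target is computed on train only (2/3 for A, 1/4 for B) and then looked up for every test row.

```python
import pandas as pd

# Train and test data copied from the tables above (column names are illustrative)
train = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6, 7],
    "categorical": ["A", "B", "B", "A", "B", "A", "B"],
    "target": [1, 0, 0, 1, 0, 0, 1],
})
test = pd.DataFrame({
    "id": [10, 11, 12, 13],
    "categorical": ["A", "A", "B", "A"],
})

# Mean target per category, computed on the train data only
category_means = train.groupby("categorical")["target"].mean()

# Look up each test row's category in the train statistics
test["mean_encoded"] = test["categorical"].map(category_means)

print(test)
```

Because the statistics come from train alone, no target information from the test set is needed.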
Train ID | Categorical | Target | Fold |
---|---|---|---|
1 | A | 1 | 1 |
2 | B | 0 | 1 |
3 | B | 0 | 1 |
4 | A | 1 | 1 |
5 | B | 0 | 2 |
6 | A | 0 | 2 |
7 | B | 1 | 2 |
Train ID | Categorical | Target | Fold | Mean encoded |
---|---|---|---|---|
1 | A | 1 | 1 | |
2 | B | 0 | 1 | |
3 | B | 0 | 1 | |
4 | A | 1 | 1 | |
5 | B | 0 | 2 | |
6 | A | 0 | 2 | |
7 | B | 1 | 2 | |
Train ID | Categorical | Target | Fold | Mean encoded |
---|---|---|---|---|
1 | A | 1 | 1 | 0 |
2 | B | 0 | 1 | 0.5 |
3 | B | 0 | 1 | 0.5 |
4 | A | 1 | 1 | 0 |
5 | B | 0 | 2 | |
6 | A | 0 | 2 | |
7 | B | 1 | 2 | |
Train ID | Categorical | Target | Fold | Mean encoded |
---|---|---|---|---|
1 | A | 1 | 1 | 0 |
2 | B | 0 | 1 | 0.5 |
3 | B | 0 | 1 | 0.5 |
4 | A | 1 | 1 | 0 |
5 | B | 0 | 2 | 0 |
6 | A | 0 | 2 | 1 |
7 | B | 1 | 2 | 0 |
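The fold-by-fold tables above can be sketched as a loop: each train row is encoded with category means computed from the other fold, so a row's own target never contributes to its encoding. This is a minimal sketch that reuses the fixed `fold` column from the tables; the column names are assumptions, and in practice the folds would usually come from a splitter such as scikit-learn's `KFold`.

```python
import numpy as np
import pandas as pd

# Train data with the fold assignment shown in the tables above
train = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6, 7],
    "categorical": ["A", "B", "B", "A", "B", "A", "B"],
    "target": [1, 0, 0, 1, 0, 0, 1],
    "fold": [1, 1, 1, 1, 2, 2, 2],
})

train["mean_encoded"] = np.nan

for fold in sorted(train["fold"].unique()):
    in_fold = train["fold"] == fold
    # Category means from every row outside the current fold
    out_of_fold_means = (
        train.loc[~in_fold].groupby("categorical")["target"].mean()
    )
    # Encode only the rows inside the current fold
    train.loc[in_fold, "mean_encoded"] = (
        train.loc[in_fold, "categorical"].map(out_of_fold_means)
    )

print(train)
```

With this data, fold 1 rows receive the fold 2 means (A: 0, B: 0.5) and fold 2 rows receive the fold 1 means (A: 1, B: 0), matching the final table.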
$$mean\_enc_i = \frac{target\_sum_i}{n_i}$$
$$smoothed\_mean\_enc_i = \frac{target\_sum_i + \alpha*global\_mean}{n_i + \alpha}$$
$$\alpha \in [5; 10]$$
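The smoothing formula can be sketched as a small pandas helper. The function name `smoothed_mean_encoding`, its parameters, and the column names are hypothetical, chosen for illustration. Falling back to the global mean for categories that never occur in train is an assumption of this sketch; it agrees with what the formula gives when $n_i = 0$.

```python
import pandas as pd

def smoothed_mean_encoding(train, test, column, target, alpha=5):
    """Encode test[column] with smoothed target means learned from train."""
    global_mean = train[target].mean()
    # Per-category target sum and row count
    stats = train.groupby(column)[target].agg(["sum", "count"])
    # (target_sum_i + alpha * global_mean) / (n_i + alpha)
    smoothed = (stats["sum"] + alpha * global_mean) / (stats["count"] + alpha)
    # Categories absent from train have no statistics: use the global mean
    return test[column].map(smoothed).fillna(global_mean)

train = pd.DataFrame({
    "categorical": ["A", "B", "B", "A", "B", "A", "B"],
    "target": [1, 0, 0, 1, 0, 0, 1],
})
test = pd.DataFrame({"categorical": ["A", "A", "B", "A"]})

test["mean_encoded"] = smoothed_mean_encoding(
    train, test, "categorical", "target", alpha=5
)
print(test)
```

A larger `alpha` pulls every category closer to the global mean, which matters most for rare categories whose raw mean rests on only a few rows.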
Train ID | Categorical | Target |
---|---|---|
1 | A | 1 |
2 | B | 0 |
3 | B | 0 |
4 | A | 0 |
5 | B | 1 |
Test ID | Categorical | Target | Mean encoded |
---|---|---|---|
10 | A | ? | 0.43 |
11 | B | ? | 0.38 |
12 | C | ? | 0.40 |
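The encoded values in this test table are consistent with $\alpha = 5$; that value is inferred from the numbers rather than stated alongside the table. The train data has five rows, two of them with target 1, so $global\_mean = 2/5 = 0.4$. Category A has $target\_sum = 1$ and $n = 2$; category B has $target\_sum = 1$ and $n = 3$:

$$smoothed\_mean\_enc_A = \frac{1 + 5 \cdot 0.4}{2 + 5} = \frac{3}{7} \approx 0.43$$
$$smoothed\_mean\_enc_B = \frac{1 + 5 \cdot 0.4}{3 + 5} = \frac{3}{8} \approx 0.38$$
$$smoothed\_mean\_enc_C = \frac{0 + 5 \cdot 0.4}{0 + 5} = 0.40$$

Category C never appears in train, so its $target\_sum$ and $n$ are both zero and the smoothed value reduces to the global mean.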