Winning a Kaggle Competition in Python
Yauhen Babakhin
Kaggle Grandmaster

| Train ID | Categorical | Target |
|---|---|---|
| 1 | A | 1 |
| 2 | B | 0 |
| 3 | B | 0 |
| 4 | A | 1 |
| 5 | B | 0 |
| 6 | A | 0 |
| 7 | B | 1 |

| Test ID | Categorical | Target |
|---|---|---|
| 10 | A | ? |
| 11 | A | ? |
| 12 | B | ? |
| 13 | A | ? |

| Test ID | Categorical | Target | Mean encoded |
|---|---|---|---|
| 10 | A | ? | 0.66 |
| 11 | A | ? | 0.66 |
| 12 | B | ? | 0.25 |
| 13 | A | ? | 0.66 |
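
The test-set encoding above can be reproduced with a minimal pandas sketch. The column names (`categorical`, `target`, `mean_encoded`) are assumptions chosen for illustration, not names taken from the slides. The per-category target mean is computed on the train data only and then mapped onto the test rows.

```python
import pandas as pd

# Train and test data copied from the tables above.
train = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6, 7],
    "categorical": ["A", "B", "B", "A", "B", "A", "B"],
    "target": [1, 0, 0, 1, 0, 0, 1],
})
test = pd.DataFrame({
    "id": [10, 11, 12, 13],
    "categorical": ["A", "A", "B", "A"],
})

# Mean of the target per category, computed on the train data only.
category_means = train.groupby("categorical")["target"].mean()

# Replace each test category with the train mean of that category.
test["mean_encoded"] = test["categorical"].map(category_means)
print(test)
```

Category A has targets 1, 1, 0 in train, so its mean is 2/3 (shown truncated to two decimals in the table); category B has targets 0, 0, 0, 1, so its mean is 0.25.
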

| Train ID | Categorical | Target | Fold |
|---|---|---|---|
| 1 | A | 1 | 1 |
| 2 | B | 0 | 1 |
| 3 | B | 0 | 1 |
| 4 | A | 1 | 1 |
| 5 | B | 0 | 2 |
| 6 | A | 0 | 2 |
| 7 | B | 1 | 2 |

| Train ID | Categorical | Target | Fold | Mean encoded |
|---|---|---|---|---|
| 1 | A | 1 | 1 | |
| 2 | B | 0 | 1 | |
| 3 | B | 0 | 1 | |
| 4 | A | 1 | 1 | |
| 5 | B | 0 | 2 | |
| 6 | A | 0 | 2 | |
| 7 | B | 1 | 2 | |

| Train ID | Categorical | Target | Fold | Mean encoded |
|---|---|---|---|---|
| 1 | A | 1 | 1 | 0 |
| 2 | B | 0 | 1 | 0.5 |
| 3 | B | 0 | 1 | 0.5 |
| 4 | A | 1 | 1 | 0 |
| 5 | B | 0 | 2 | |
| 6 | A | 0 | 2 | |
| 7 | B | 1 | 2 | |

| Train ID | Categorical | Target | Fold | Mean encoded |
|---|---|---|---|---|
| 1 | A | 1 | 1 | 0 |
| 2 | B | 0 | 1 | 0.5 |
| 3 | B | 0 | 1 | 0.5 |
| 4 | A | 1 | 1 | 0 |
| 5 | B | 0 | 2 | 0 |
| 6 | A | 0 | 2 | 1 |
| 7 | B | 1 | 2 | 0 |
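
The fold-by-fold filling of the train table can be sketched as follows. This is an illustrative out-of-fold loop, assuming the same hypothetical column names as before and the fold assignment shown in the table: each fold's rows are encoded with category means computed from the other fold only, so a row never sees its own target.

```python
import pandas as pd

# Train data and fold assignment copied from the table above.
train = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6, 7],
    "categorical": ["A", "B", "B", "A", "B", "A", "B"],
    "target": [1, 0, 0, 1, 0, 0, 1],
    "fold": [1, 1, 1, 1, 2, 2, 2],
})

# Start with an empty (NaN) encoded column and fill it fold by fold.
train["mean_encoded"] = float("nan")

for fold in sorted(train["fold"].unique()):
    in_fold = train["fold"] == fold

    # Category means from all rows that are NOT in the current fold.
    out_of_fold_means = train[~in_fold].groupby("categorical")["target"].mean()

    # Encode the current fold's rows with those out-of-fold means.
    train.loc[in_fold, "mean_encoded"] = (
        train.loc[in_fold, "categorical"].map(out_of_fold_means)
    )

print(train)
```

With the two folds above, the encoded column comes out as 0, 0.5, 0.5, 0, 0, 1, 0, matching the completed table. A category that is missing from the other folds would stay NaN in this sketch and would need a fallback such as the global mean.
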

$$mean\_enc_i = \frac{target\_sum_i}{n_i}$$

$$smoothed\_mean\_enc_i = \frac{target\_sum_i + \alpha \cdot global\_mean}{n_i + \alpha}$$

$$\alpha \in [5, 10]$$

| Train ID | Categorical | Target |
|---|---|---|
| 1 | A | 1 |
| 2 | B | 0 |
| 3 | B | 0 |
| 4 | A | 0 |
| 5 | B | 1 |

| Test ID | Categorical | Target | Mean encoded |
|---|---|---|---|
| 10 | A | ? | 0.43 |
| 11 | B | ? | 0.38 |
| 12 | C | ? | 0.40 |
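
The smoothed values in the last table can be checked with a short sketch. It assumes `alpha = 5`, the lower end of the stated range, which is the value that reproduces the numbers shown. It also assumes that a category absent from train (here C) falls back to the global mean. Column names are again hypothetical.

```python
import pandas as pd

ALPHA = 5  # assumption: lower end of the [5, 10] range

# Train and test data copied from the tables above.
train = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "categorical": ["A", "B", "B", "A", "B"],
    "target": [1, 0, 0, 0, 1],
})
test = pd.DataFrame({
    "id": [10, 11, 12],
    "categorical": ["A", "B", "C"],
})

# Global target mean over the whole train set: 2 / 5 = 0.4.
global_mean = train["target"].mean()

# Per-category target sum and row count.
stats = train.groupby("categorical")["target"].agg(["sum", "count"])

# Smoothed mean: (target_sum + alpha * global_mean) / (n + alpha).
smoothed = (stats["sum"] + ALPHA * global_mean) / (stats["count"] + ALPHA)

# Unseen categories have no entry in `smoothed`; use the global mean for them.
test["mean_encoded"] = test["categorical"].map(smoothed).fillna(global_mean)
print(test.round(2))
```

Worked by hand: A gives (1 + 5 · 0.4) / (2 + 5) = 3/7 ≈ 0.43, B gives (1 + 5 · 0.4) / (3 + 5) = 3/8 ≈ 0.38, and the new category C receives the global mean 0.40.
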