Winning a Kaggle Competition in Python
Yauhen Babakhin
Kaggle Grandmaster
| Train ID | Categorical | Target |
|---|---|---|
| 1 | A | 1 |
| 2 | B | 0 |
| 3 | B | 0 |
| 4 | A | 1 |
| 5 | B | 0 |
| 6 | A | 0 |
| 7 | B | 1 |

| Test ID | Categorical | Target |
|---|---|---|
| 10 | A | ? |
| 11 | A | ? |
| 12 | B | ? |
| 13 | A | ? |

| Test ID | Categorical | Target | Mean encoded |
|---|---|---|---|
| 10 | A | ? | 0.66 |
| 11 | A | ? | 0.66 |
| 12 | B | ? | 0.25 |
| 13 | A | ? | 0.66 |
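
The test encoding above can be reproduced with a short pandas sketch. The column names (`categorical`, `target`, `mean_encoded`) are illustrative assumptions, not names from the slides. Each test row receives the mean target of its category, computed on the train data only: A is 2/3 (shown as 0.66 in the table) and B is 1/4.

```python
import pandas as pd

# Toy data copied from the train and test tables above.
# Column names are assumptions made for this sketch.
train = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6, 7],
    "categorical": ["A", "B", "B", "A", "B", "A", "B"],
    "target": [1, 0, 0, 1, 0, 0, 1],
})
test = pd.DataFrame({
    "id": [10, 11, 12, 13],
    "categorical": ["A", "A", "B", "A"],
})

# Mean of the target per category, using the train rows only.
category_means = train.groupby("categorical")["target"].mean()

# Give every test row the mean of its own category.
test["mean_encoded"] = test["categorical"].map(category_means)

print(test)
```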

| Train ID | Categorical | Target | Fold |
|---|---|---|---|
| 1 | A | 1 | 1 |
| 2 | B | 0 | 1 |
| 3 | B | 0 | 1 |
| 4 | A | 1 | 1 |
| 5 | B | 0 | 2 |
| 6 | A | 0 | 2 |
| 7 | B | 1 | 2 |

| Train ID | Categorical | Target | Fold | Mean encoded |
|---|---|---|---|---|
| 1 | A | 1 | 1 | 0 |
| 2 | B | 0 | 1 | 0.5 |
| 3 | B | 0 | 1 | 0.5 |
| 4 | A | 1 | 1 | 0 |
| 5 | B | 0 | 2 | |
| 6 | A | 0 | 2 | |
| 7 | B | 1 | 2 | |

| Train ID | Categorical | Target | Fold | Mean encoded |
|---|---|---|---|---|
| 1 | A | 1 | 1 | 0 |
| 2 | B | 0 | 1 | 0.5 |
| 3 | B | 0 | 1 | 0.5 |
| 4 | A | 1 | 1 | 0 |
| 5 | B | 0 | 2 | 0 |
| 6 | A | 0 | 2 | 1 |
| 7 | B | 1 | 2 | 0 |
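
The out-of-fold scheme in the table above can be sketched as follows. The fold assignment is hard-coded to match the table; in practice a splitter such as scikit-learn's `KFold` would typically supply it. Column names are again illustrative assumptions. Each row is encoded with category means taken from the other fold, so a row's own target never leaks into its feature.

```python
import numpy as np
import pandas as pd

# Toy data and fold assignment copied from the table above.
# Column names are assumptions made for this sketch.
train = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6, 7],
    "categorical": ["A", "B", "B", "A", "B", "A", "B"],
    "target": [1, 0, 0, 1, 0, 0, 1],
    "fold": [1, 1, 1, 1, 2, 2, 2],
})

train["mean_encoded"] = np.nan
for fold in sorted(train["fold"].unique()):
    in_fold = train["fold"] == fold
    # Category means computed only from rows outside the current fold.
    out_of_fold_means = (
        train.loc[~in_fold].groupby("categorical")["target"].mean()
    )
    # Encode the current fold with those means. A category that is
    # missing from the other folds would stay NaN here.
    train.loc[in_fold, "mean_encoded"] = (
        train.loc[in_fold, "categorical"].map(out_of_fold_means)
    )

print(train)
```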
$$mean\_enc_i = \frac{target\_sum_i}{n_i}$$
$$smoothed\_mean\_enc_i = \frac{target\_sum_i + \alpha*global\_mean}{n_i + \alpha}$$
$$\alpha \in [5, 10]$$
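
Two limiting cases, which follow directly from the smoothed formula, show what the smoothing parameter does. A category with no train rows falls back to the global mean, and a category with many rows stays close to its plain mean encoding.

$$n_i = 0: \quad smoothed\_mean\_enc_i = \frac{0 + \alpha*global\_mean}{0 + \alpha} = global\_mean$$
$$n_i \gg \alpha: \quad smoothed\_mean\_enc_i \approx \frac{target\_sum_i}{n_i} = mean\_enc_i$$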

| Train ID | Categorical | Target |
|---|---|---|
| 1 | A | 1 |
| 2 | B | 0 |
| 3 | B | 0 |
| 4 | A | 0 |
| 5 | B | 1 |

| Test ID | Categorical | Target | Mean encoded |
|---|---|---|---|
| 10 | A | ? | 0.43 |
| 11 | B | ? | 0.38 |
| 12 | C | ? | 0.40 |
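
The smoothed values in the table can be checked with the sketch below. It assumes alpha = 5, which the slides do not state for this example but which reproduces the numbers: A is 3/7 (about 0.43), B is 3/8 (about 0.38). Category C never appears in train, so it is filled with the global mean 0.40. Column names are illustrative assumptions.

```python
import pandas as pd

ALPHA = 5  # assumed value; it reproduces the table above

# Toy data copied from the tables above.
train = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "categorical": ["A", "B", "B", "A", "B"],
    "target": [1, 0, 0, 0, 1],
})
test = pd.DataFrame({"id": [10, 11, 12], "categorical": ["A", "B", "C"]})

# Global target mean over all train rows.
global_mean = train["target"].mean()

# Per-category target sum and row count.
stats = train.groupby("categorical")["target"].agg(["sum", "count"])

# Smoothed mean encoding: (target_sum + alpha * global_mean) / (n + alpha).
smoothed = (stats["sum"] + ALPHA * global_mean) / (stats["count"] + ALPHA)

# Categories unseen in train (here C) fall back to the global mean.
test["mean_encoded"] = test["categorical"].map(smoothed).fillna(global_mean)

print(test)
```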