Winning a Kaggle Competition in Python
Yauhen Babakhin
Kaggle Grandmaster
| ID | Categorical feature |
|---|---|
| 1 | A |
| 2 | B |
| 3 | C |
| 4 | A |
| 5 | D |
| 6 | A |
| ID | Label-encoded |
|---|---|
| 1 | 0 |
| 2 | 1 |
| 3 | 2 |
| 4 | 0 |
| 5 | 3 |
| 6 | 0 |
# Import LabelEncoder
from sklearn.preprocessing import LabelEncoder

# Create a LabelEncoder object
le = LabelEncoder()

# Encode a categorical feature
df['cat_encoded'] = le.fit_transform(df['cat'])
ID cat cat_encoded
0 1 A 0
1 2 B 1
2 3 C 2
3 4 A 0
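The snippet above assumes that `df` already exists. As a minimal, self-contained sketch, the six-row table from the slide can be rebuilt by hand (the column names `ID` and `cat` are taken from the printed output; the `mapping` dictionary is only an illustrative helper, not part of the original slides):

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy data matching the six-row example table
df = pd.DataFrame({'ID': [1, 2, 3, 4, 5, 6],
                   'cat': ['A', 'B', 'C', 'A', 'D', 'A']})

# Fit the encoder and replace each category with its integer code
le = LabelEncoder()
df['cat_encoded'] = le.fit_transform(df['cat'])

# classes_ holds the sorted unique categories; a category's
# position in that array is the integer code assigned to it
mapping = {label: code for code, label in enumerate(le.classes_)}
print(mapping)
print(df)
```

Because the codes follow the sorted order of the categories, the integers imply an ordering (A < B < C < D) that may not exist in the data, which is worth keeping in mind when choosing a model.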
| ID | Categorical feature |
|---|---|
| 1 | A |
| 2 | B |
| 3 | C |
| 4 | A |
| 5 | D |
| 6 | A |
| ID | Cat == A | Cat == B | Cat == C | Cat == D |
|---|---|---|---|---|
| 1 | 1 | 0 | 0 | 0 |
| 2 | 0 | 1 | 0 | 0 |
| 3 | 0 | 0 | 1 | 0 |
| 4 | 1 | 0 | 0 | 0 |
| 5 | 0 | 0 | 0 | 1 |
| 6 | 1 | 0 | 0 | 0 |
# Create One-Hot encoded features
ohe = pd.get_dummies(df['cat'], prefix='ohe_cat')

# Drop the initial feature
df.drop('cat', axis=1, inplace=True)

# Concatenate OHE features to the dataframe
df = pd.concat([df, ohe], axis=1)
ID ohe_cat_A ohe_cat_B ohe_cat_C ohe_cat_D
0 1 1 0 0 0
1 2 0 1 0 0
2 3 0 0 1 0
3 4 1 0 0 0
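A self-contained variant of the one-hot step might look like the sketch below. It rebuilds the toy table by hand and passes `dtype=int` to `pd.get_dummies`, on the assumption that 0/1 integers are wanted as in the printed output; recent pandas versions return boolean columns by default.

```python
import pandas as pd

# Toy data matching the six-row example table
df = pd.DataFrame({'ID': [1, 2, 3, 4, 5, 6],
                   'cat': ['A', 'B', 'C', 'A', 'D', 'A']})

# One indicator column per category; dtype=int yields 0/1 instead of booleans
ohe = pd.get_dummies(df['cat'], prefix='ohe_cat', dtype=int)

# Replace the original column with the indicator columns
# (drop returns a new frame here, so nothing is mutated in place)
df = pd.concat([df.drop(columns='cat'), ohe], axis=1)
print(df)
```

Each row has exactly one indicator set to 1, so the four new columns together carry the same information as the original `cat` column.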
# DataFrame with a binary feature
binary_feature
binary_feat
0 Yes
1 No
le = LabelEncoder()
binary_feature['binary_encoded'] = le.fit_transform(binary_feature['binary_feat'])
binary_feat binary_encoded
0 Yes 1
1 No 0
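The result above, where `Yes` becomes 1 and `No` becomes 0, follows from `LabelEncoder` assigning codes in sorted order of the class labels. A small sketch that rebuilds the two-row frame by hand and inspects the fitted encoder (the `decoded` variable is only an illustrative name):

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Two-row frame matching the binary example
binary_feature = pd.DataFrame({'binary_feat': ['Yes', 'No']})

le = LabelEncoder()
binary_feature['binary_encoded'] = le.fit_transform(binary_feature['binary_feat'])

# Sorted class labels: index 0 is 'No', index 1 is 'Yes'
print(le.classes_)

# inverse_transform maps the integer codes back to the original labels
decoded = le.inverse_transform(binary_feature['binary_encoded'])
print(decoded)
```

For a feature with only two values, a single 0/1 column already captures everything a pair of one-hot columns would, so label encoding is sufficient here.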