Winning a Kaggle Competition in Python
Yauhen Babakhin
Kaggle Grandmaster
ID | Categorical feature |
---|---|
1 | A |
2 | B |
3 | C |
4 | A |
5 | D |
6 | A |
ID | Label-encoded |
---|---|
1 | 0 |
2 | 1 |
3 | 2 |
4 | 0 |
5 | 3 |
6 | 0 |
# Import LabelEncoder
from sklearn.preprocessing import LabelEncoder
# Create a LabelEncoder object
le = LabelEncoder()
# Encode a categorical feature
df['cat_encoded'] = le.fit_transform(df['cat'])
ID cat cat_encoded
0 1 A 0
1 2 B 1
2 3 C 2
3 4 A 0
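The label-encoding step can be reproduced end to end with a small sketch. The six-row DataFrame below is a hypothetical stand-in that mirrors the table from the slide, and the `mapping` and `decoded` names are introduced here purely for illustration. It assumes scikit-learn and pandas are installed.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data mirroring the six-row table on the slide
df = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5, 6],
    'cat': ['A', 'B', 'C', 'A', 'D', 'A'],
})

# Fit the encoder and replace each category with an integer code
le = LabelEncoder()
df['cat_encoded'] = le.fit_transform(df['cat'])

# classes_ stores the sorted unique categories; the position of each
# category in that array is its integer code
mapping = {label: code for code, label in enumerate(le.classes_)}

# inverse_transform maps the integer codes back to the original labels
decoded = le.inverse_transform(df['cat_encoded'])

print(df)
print(mapping)
```

Note that the codes follow the sorted order of the categories, so the integers imply an ordering (A < B < C < D) that may not exist in the data. This tends to matter less for tree-based models than for linear ones.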
ID | Categorical feature |
---|---|
1 | A |
2 | B |
3 | C |
4 | A |
5 | D |
6 | A |
ID | Cat == A | Cat == B | Cat == C | Cat == D |
---|---|---|---|---|
1 | 1 | 0 | 0 | 0 |
2 | 0 | 1 | 0 | 0 |
3 | 0 | 0 | 1 | 0 |
4 | 1 | 0 | 0 | 0 |
5 | 0 | 0 | 0 | 1 |
6 | 1 | 0 | 0 | 0 |
# Create One-Hot encoded features
ohe = pd.get_dummies(df['cat'], prefix='ohe_cat')
# Drop the initial feature
df.drop('cat', axis=1, inplace=True)
# Concatenate OHE features to the dataframe
df = pd.concat([df, ohe], axis=1)
ID ohe_cat_A ohe_cat_B ohe_cat_C ohe_cat_D
0 1 1 0 0 0
1 2 0 1 0 0
2 3 0 0 1 0
3 4 1 0 0 0
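A self-contained sketch of the one-hot step follows, again on a hypothetical six-row DataFrame that mirrors the slide's table. One assumption worth flagging: recent pandas versions return boolean columns from `pd.get_dummies` by default, so `dtype=int` is passed here to get the 0/1 values shown in the output above.

```python
import pandas as pd

# Hypothetical toy data mirroring the six-row table on the slide
df = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5, 6],
    'cat': ['A', 'B', 'C', 'A', 'D', 'A'],
})

# One indicator column per category; dtype=int forces 0/1 instead of
# True/False, which newer pandas versions produce by default
ohe = pd.get_dummies(df['cat'], prefix='ohe_cat', dtype=int)

# Drop the original column and attach the indicator columns
df = pd.concat([df.drop(columns='cat'), ohe], axis=1)

print(df)
```

Each row has exactly one indicator set to 1, so the number of new columns grows with the number of distinct categories. For high-cardinality features this can make the feature matrix very wide.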
# DataFrame with a binary feature
binary_feature
binary_feat
0 Yes
1 No
le = LabelEncoder()
binary_feature['binary_encoded'] = le.fit_transform(binary_feature['binary_feat'])
binary_feat binary_encoded
0 Yes 1
1 No 0
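For a binary feature a single 0/1 column already carries all the information, so label encoding is sufficient and one-hot encoding would only add a redundant column. The sketch below rebuilds the two-row example; the `binary_mapped` column and the explicit dictionary are illustrative additions, shown as one way to control which value becomes 1 rather than relying on the encoder's sorted order.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical two-row DataFrame matching the slide's example
binary_feature = pd.DataFrame({'binary_feat': ['Yes', 'No']})

# LabelEncoder assigns codes in sorted order: 'No' -> 0, 'Yes' -> 1
le = LabelEncoder()
binary_feature['binary_encoded'] = le.fit_transform(binary_feature['binary_feat'])

# Alternative: an explicit mapping makes the choice of codes visible
# and does not depend on alphabetical ordering
binary_feature['binary_mapped'] = binary_feature['binary_feat'].map({'No': 0, 'Yes': 1})

print(binary_feature)
```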