Preprocessing for Machine Learning in Python
James Chapman
Curriculum Manager, DataCamp
   user subscribed fav_color
0     1          y      blue
1     2          n     green
2     3          n    orange
3     4          y     green
print(users["subscribed"])
0 y
1 n
2 n
3 y
Name: subscribed, dtype: object
users["sub_enc"] = users["subscribed"].apply(lambda val: 1 if val == "y" else 0)
print(users[["subscribed", "sub_enc"]])
  subscribed  sub_enc
0          y        1
1          n        0
2          n        0
3          y        1
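The binary encoding above can be reproduced end to end with the sketch below. It rebuilds the small `users` table from the values shown on the slide, applies the same `apply` call, and then shows one possible alternative using `Series.map`. The `sub_enc_map` column name is hypothetical and not part of the original slides.

```python
import pandas as pd

# Rebuild the example table from the values shown on the slide.
users = pd.DataFrame({
    "user": [1, 2, 3, 4],
    "subscribed": ["y", "n", "n", "y"],
    "fav_color": ["blue", "green", "orange", "green"],
})

# Same approach as the slide: 1 for "y", 0 for anything else.
users["sub_enc"] = users["subscribed"].apply(lambda val: 1 if val == "y" else 0)

# Alternative (an assumption, not from the slides): an explicit mapping.
# Values missing from the dict become NaN instead of being coded as 0.
users["sub_enc_map"] = users["subscribed"].map({"y": 1, "n": 0})

print(users[["subscribed", "sub_enc", "sub_enc_map"]])
```

The explicit mapping can be preferable when the column might contain unexpected values, since those show up as missing rather than being silently encoded as 0.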
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
users["sub_enc_le"] = le.fit_transform(users["subscribed"])
print(users[["subscribed", "sub_enc_le"]])
subscribed sub_enc_le
0 y 1
1 n 0
2 n 0
3 y 1
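To see why `LabelEncoder` assigns `n` to 0 and `y` to 1 here, it can help to inspect the fitted encoder. The sketch below is a minimal illustration on a standalone Series built from the slide's values; the variable names `subscribed`, `encoded`, and `decoded` are chosen for this example only.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# The "subscribed" column from the slide, as a standalone Series.
subscribed = pd.Series(["y", "n", "n", "y"], name="subscribed")

le = LabelEncoder()
encoded = le.fit_transform(subscribed)

# classes_ holds the sorted unique labels; each label's position is its code.
print(le.classes_)
print(encoded)

# inverse_transform maps the integer codes back to the original labels.
decoded = le.inverse_transform(encoded)
print(decoded)
```

Because the labels are sorted before codes are assigned, the result matches the manual `apply` encoding in this case, but that is a property of these particular labels rather than a general guarantee.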
fav_color |
---|
blue |
green |
orange |
green |
Values: [blue, green, orange]
fav_color_enc |
---|
[1, 0, 0] |
[0, 1, 0] |
[0, 0, 1] |
[0, 1, 0] |
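The mapping from `fav_color` to `fav_color_enc` shown above can be sketched by hand in plain Python. This is only an illustration of the idea, assuming the values are ordered alphabetically as in the slide; the names `colors`, `values`, and `one_hot` are hypothetical.

```python
# The fav_color column from the slide.
colors = ["blue", "green", "orange", "green"]

# Unique values in a fixed (alphabetical) order: one position per value.
values = sorted(set(colors))

# For each row, put a 1 in the position of its value and 0 elsewhere.
one_hot = [[1 if color == value else 0 for value in values] for color in colors]

for color, row in zip(colors, one_hot):
    print(color, row)
```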
print(users["fav_color"])
0 blue
1 green
2 orange
3 green
Name: fav_color, dtype: object
print(pd.get_dummies(users["fav_color"]))
blue green orange
0 1 0 0
1 0 1 0
2 0 0 1
3 0 1 0
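In practice the dummy columns usually need to be joined back onto the original table. The sketch below shows one way to do that, assuming the same `users` data as on the slides; the `prefix` value and the `users_enc` name are choices made for this example. Note that recent pandas versions return `True`/`False` from `get_dummies` by default, so `dtype=int` is passed here to get the 1/0 output shown above.

```python
import pandas as pd

# Rebuild the example table from the values shown on the slide.
users = pd.DataFrame({
    "user": [1, 2, 3, 4],
    "subscribed": ["y", "n", "n", "y"],
    "fav_color": ["blue", "green", "orange", "green"],
})

# One column per color; dtype=int gives 1/0 rather than True/False.
dummies = pd.get_dummies(users["fav_color"], prefix="fav_color", dtype=int)

# Replace the original text column with its one-hot encoded columns.
users_enc = pd.concat([users.drop(columns="fav_color"), dummies], axis=1)

print(users_enc)
```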