Feature Engineering for Machine Learning in Python
Robert O'Callaghan
Director of Data Science, Ordergroove
pd.get_dummies(df, columns=['Country'],
               prefix='C')
   C_France  C_India  C_UK  C_USA
0         0        1     0      0
1         0        0     0      1
2         0        0     1      0
3         0        0     1      0
4         1        0     0      0
pd.get_dummies(df, columns=['Country'],
               drop_first=True, prefix='C')
   C_India  C_UK  C_USA
0        1     0      0
1        0     0      1
2        0     1      0
3        0     1      0
4        0     0      0
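The two calls above can be compared side by side in a minimal sketch. The five-row `df` here is an assumption, reconstructed from the outputs shown (India, USA, UK, UK, France), and `dtype=int` is added so the columns print as 0/1 rather than booleans on recent pandas versions.

```python
import pandas as pd

# Assumed sample data, reconstructed from the outputs shown above
df = pd.DataFrame({'Country': ['India', 'USA', 'UK', 'UK', 'France']})

# One-hot encoding: one column per category
one_hot = pd.get_dummies(df, columns=['Country'], prefix='C', dtype=int)

# Dummy encoding: the first category (alphabetically, France) is dropped
# and is represented by a row of all zeros
dummy = pd.get_dummies(df, columns=['Country'], drop_first=True,
                       prefix='C', dtype=int)

print(one_hot)
print(dummy)
```

With dummy encoding, France is the base category: its information is carried by the absence of every other country, which avoids a redundant (perfectly collinear) column.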
Index | Sex |
---|---|
0 | Male |
1 | Female |
2 | Male |
Index | Male | Female |
---|---|---|
0 | 1 | 0 |
1 | 0 | 1 |
2 | 1 | 0 |
Index | Male |
---|---|
0 | 1 |
1 | 0 |
2 | 1 |
counts = df['Country'].value_counts()
print(counts)
'USA' 8
'UK' 6
'India' 2
'France' 1
Name: Country, dtype: int64
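# Flag rows whose country occurs fewer than 5 times
mask = df['Country'].isin(counts[counts < 5].index)
# Relabel those rows in place (.loc avoids chained assignment)
df.loc[mask, 'Country'] = 'Other'
print(df['Country'].value_counts())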
'USA' 8
'UK' 6
'Other' 3
Name: Country, dtype: int64
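The steps for grouping uncommon categories can be put together in a self-contained sketch. The `df` below is an assumption, built to match the counts shown (USA 8, UK 6, India 2, France 1), and the threshold of 5 is the one used on the slide.

```python
import pandas as pd

# Assumed sample data matching the counts shown above
countries = ['USA'] * 8 + ['UK'] * 6 + ['India'] * 2 + ['France']
df = pd.DataFrame({'Country': countries})

# Count how often each country occurs
counts = df['Country'].value_counts()

# Categories seen fewer than 5 times are treated as rare
rare = counts[counts < 5].index

# Relabel the rare categories as a single 'Other' group
df.loc[df['Country'].isin(rare), 'Country'] = 'Other'

result = df['Country'].value_counts()
print(result)
```

Collapsing rare values this way keeps a later `pd.get_dummies` call from creating many sparse columns that each describe only a handful of rows.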