Feature Engineering for Machine Learning in Python
Robert O'Callaghan
Director of Data Science, Ordergroove


pd.get_dummies(df, columns=['Country'], prefix='C')
   C_France  C_India  C_UK  C_USA
0         0        1     0      0
1         0        0     0      1
2         0        0     1      0
3         0        0     1      0
4         1        0     0      0
pd.get_dummies(df, columns=['Country'], drop_first=True, prefix='C')
   C_India  C_UK  C_USA
0        1     0      0
1        0     0      1
2        0     1      0
3        0     1      0
4        0     0      0
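A self-contained sketch of the two calls above. The five-row `df` is reconstructed from the output shown (India, USA, UK, UK, France), and `dtype=int` is an added assumption so that recent pandas versions, which default to boolean dummy columns, print 0/1 as on the slide.

```python
import pandas as pd

# Assumed sample data, reconstructed from the encoded output on the slide.
df = pd.DataFrame({'Country': ['India', 'USA', 'UK', 'UK', 'France']})

# One-hot encoding: one column per category.
one_hot = pd.get_dummies(df, columns=['Country'], prefix='C', dtype=int)

# Dummy encoding: the first category (alphabetically, France) is dropped
# and is represented implicitly by a row of all zeros.
dummy = pd.get_dummies(df, columns=['Country'], prefix='C',
                       drop_first=True, dtype=int)

print(one_hot)
print(dummy)
```

Row 4 (France) is all zeros in the dummy-encoded frame, so no information is lost by dropping `C_France`.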
| Index | Sex |
|---|---|
| 0 | Male |
| 1 | Female |
| 2 | Male |

| Index | Male | Female |
|---|---|---|
| 0 | 1 | 0 |
| 1 | 0 | 1 |
| 2 | 1 | 0 |

| Index | Male |
|---|---|
| 0 | 1 |
| 1 | 0 |
| 2 | 1 |
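The tables above can be reproduced with a short sketch that also checks why the single `Male` column is enough: the dropped `Female` column is just `1 - Male`. The three-row frame mirrors the first table; `dtype=int` is an added assumption to get 0/1 rather than booleans.

```python
import pandas as pd

# Sample data copied from the first table.
df = pd.DataFrame({'Sex': ['Male', 'Female', 'Male']})

# One-hot: both columns (pandas orders them alphabetically: Female, Male).
one_hot = pd.get_dummies(df['Sex'], dtype=int)

# Dummy: drop_first removes the first category (Female), leaving Male.
dummy = pd.get_dummies(df['Sex'], drop_first=True, dtype=int)

# The dropped column can be recovered from the one that remains.
recovered_female = 1 - dummy['Male']

print(one_hot)
print(dummy)
print(recovered_female.tolist())
```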
counts = df['Country'].value_counts()
print(counts)
USA       8
UK        6
India     2
France    1
Name: Country, dtype: int64
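# Flag rows whose country occurs fewer than 5 times
mask = df['Country'].isin(counts[counts < 5].index)
# Relabel those rows (.loc avoids chained assignment)
df.loc[mask, 'Country'] = 'Other'
print(df['Country'].value_counts())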
USA      8
UK       6
Other    3
Name: Country, dtype: int64
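The same idea can be wrapped in a reusable helper. This is a minimal sketch, not part of the original slides: `group_rare`, `min_count`, and `other_label` are hypothetical names, and the sample data is built to match the counts printed above (USA 8, UK 6, India 2, France 1).

```python
import pandas as pd

def group_rare(series, min_count=5, other_label='Other'):
    """Hypothetical helper: relabel categories seen fewer than min_count times."""
    counts = series.value_counts()
    rare = counts[counts < min_count].index
    # Keep values that are not rare; replace the rest with other_label.
    return series.where(~series.isin(rare), other_label)

# Assumed sample data matching the counts shown on the slide.
df = pd.DataFrame(
    {'Country': ['USA'] * 8 + ['UK'] * 6 + ['India'] * 2 + ['France']}
)

df['Country'] = group_rare(df['Country'])
grouped_counts = df['Country'].value_counts()
print(grouped_counts)
```

Returning a new Series instead of assigning in place keeps the original column available for comparison and avoids chained-assignment warnings.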