Feature Engineering for Machine Learning in Python
Robert O'Callaghan
Director of Data Science, Ordergroove
      SurveyDate  ConvertedSalary Hobby  ... \
0  2/28/18 20:20              NaN   Yes  ...
1  6/28/18 13:26          70841.0   Yes  ...
2    6/6/18 3:37              NaN    No  ...
3    5/9/18 1:06          21426.0   Yes  ...
4  4/12/18 22:41          41671.0   Yes  ...
# Drop all rows with at least one missing value
df.dropna(how='any')
# Drop rows with missing values in a specific column
df.dropna(subset=['VersionControl'])
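A minimal sketch of how the two dropna calls differ, using a hypothetical four-row frame. The ConvertedSalary values mirror the sample output above; the VersionControl values are invented for illustration. Note that dropna returns a new DataFrame and leaves df unchanged unless the result is assigned back.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: VersionControl values are made up for illustration
df = pd.DataFrame({
    "ConvertedSalary": [np.nan, 70841.0, np.nan, 21426.0],
    "VersionControl": ["Git", None, "Git", "Subversion"],
})

# how='any' removes a row if any column in it is missing
complete_rows = df.dropna(how="any")

# subset=... only looks at the listed columns when deciding what to drop
has_version_control = df.dropna(subset=["VersionControl"])

# The original frame keeps all of its rows
print(len(df), len(complete_rows), len(has_version_control))
```

Restricting the check with subset typically keeps far more rows than how='any', which matters when only one column feeds the model.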
# Replace missing values in a specific column
# with a given string
df['VersionControl'] = df['VersionControl'].fillna(
    value='None Given'
)
# Record where the values are not missing
df['SalaryGiven'] = df['ConvertedSalary'].notnull()
# Drop a specific column
df.drop(columns=['ConvertedSalary'])
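The last two steps can be combined: record whether a salary was given, then drop the original column. The sketch below assumes a small frame holding only the ConvertedSalary values from the sample output above; the result of drop is assigned back because drop returns a new DataFrame rather than modifying df.

```python
import numpy as np
import pandas as pd

# Salary values taken from the sample output; a one-column frame is assumed
df = pd.DataFrame(
    {"ConvertedSalary": [np.nan, 70841.0, np.nan, 21426.0, 41671.0]}
)

# Boolean indicator: True where a salary was reported
df["SalaryGiven"] = df["ConvertedSalary"].notnull()

# Drop the original column, keeping only the indicator
df = df.drop(columns=["ConvertedSalary"])

print(df)
```

Keeping the indicator preserves the information that a value was missing, which can itself be predictive, even after the raw column is gone.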