Feature Engineering for Machine Learning in Python
Robert O'Callaghan
Director of Data Science, Ordergroove
print(df['RawSalary'].dtype)
object
print(df['RawSalary'].head())
0 NaN
1 70,841.00
2 NaN
3 21,426.00
4 41,671.00
Name: RawSalary, dtype: object
df['RawSalary'] = df['RawSalary'].str.replace(',', '')
df['RawSalary'] = df['RawSalary'].astype('float')
coerced_vals = pd.to_numeric(df['RawSalary'],
                             errors='coerce')
print(df.loc[coerced_vals.isna(), 'RawSalary'].head())
0 NaN
2 NaN
4 $51408.00
Name: RawSalary, dtype: object
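The coercion check above can be tried end to end with a minimal sketch. The sample DataFrame below is hypothetical (it is not the course dataset), and the names no_commas, bad_mask and problem_rows are introduced only for illustration. Unlike the check above, it also excludes values that were already missing, so only entries that are present but unparseable are reported.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, assumed for illustration only
df = pd.DataFrame(
    {'RawSalary': [np.nan, '70,841.00', np.nan, '21,426.00', '$51408.00']}
)

# Strip thousands separators; regex=False treats ',' as a literal string
no_commas = df['RawSalary'].str.replace(',', '', regex=False)

# Values that cannot be parsed become NaN instead of raising an error
coerced_vals = pd.to_numeric(no_commas, errors='coerce')

# Keep only rows that had a value originally but failed to parse
bad_mask = coerced_vals.isna() & df['RawSalary'].notna()
problem_rows = df.loc[bad_mask, 'RawSalary']
print(problem_rows)
```

Filtering on notna() separates genuinely missing salaries from values with stray characters such as a currency symbol, which are the ones that need further cleaning.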
df['column_name'] = df['column_name'].method1()
df['column_name'] = df['column_name'].method2()
df['column_name'] = df['column_name'].method3()
Same as:
df['column_name'] = df['column_name']\
                    .method1().method2().method3()
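Applied to the salary column, the chaining pattern might look like the sketch below. The sample data is hypothetical, and wrapping the chain in parentheses is a stylistic alternative to the backslash continuation, not something the slides prescribe. Passing regex=False matters for '$', which would otherwise be read as a regular-expression anchor in pandas versions where regex matching is the default.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, assumed for illustration only
df = pd.DataFrame(
    {'RawSalary': [np.nan, '70,841.00', np.nan, '21,426.00', '$51408.00']}
)

# Chain the cleaning steps: drop commas, drop dollar signs, cast to float
df['RawSalary'] = (
    df['RawSalary']
    .str.replace(',', '', regex=False)
    .str.replace('$', '', regex=False)
    .astype('float')
)

print(df['RawSalary'].dtype)
print(df['RawSalary'].head())
```

Each method returns a new Series, so the chain produces the same result as three separate assignments while avoiding the intermediate reassignments.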