Dealing with other data issues

Feature Engineering for Machine Learning in Python

Robert O'Callaghan

Director of Data Science, Ordergroove

Problematic characters

print(df['RawSalary'].dtype)
object

Problematic characters

print(df['RawSalary'].head())
0          NaN
1    70,841.00
2          NaN
3    21,426.00
4    41,671.00
Name: RawSalary, dtype: object

Dealing with problematic characters

df['RawSalary'] = df['RawSalary'].str.replace(',', '')
df['RawSalary'] = df['RawSalary'].astype('float')
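The two steps above can be tried end to end on a small frame. This is a minimal sketch: the sample values are hypothetical and simply mirror the rows printed on the earlier slide, and `regex=False` is an added choice so the comma is treated as a literal string.

```python
import pandas as pd

# Hypothetical sample data mirroring the rows shown on the earlier slide
df = pd.DataFrame(
    {'RawSalary': [None, '70,841.00', None, '21,426.00', '41,671.00']}
)

# Remove the thousands separators; regex=False treats ',' literally
df['RawSalary'] = df['RawSalary'].str.replace(',', '', regex=False)

# With the commas gone, the strings can be cast to floats;
# missing entries stay missing (NaN)
df['RawSalary'] = df['RawSalary'].astype('float')

print(df['RawSalary'].dtype)
```

After the cast, numeric operations such as `df['RawSalary'].mean()` work on the column.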

Finding other stray characters

coerced_vals = pd.to_numeric(df['RawSalary'], 
                             errors='coerce')

Finding other stray characters

print(df.loc[coerced_vals.isna(), 'RawSalary'].head())
0           NaN
2           NaN
4     $51408.00
Name: RawSalary, dtype: object
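Values that were already missing also show up in the output above. One way to isolate only the rows that failed to parse is to also require that the raw value was not missing. This is a sketch under assumptions: the sample series and the names `raw`, `newly_invalid` and `cleaned` are hypothetical, not from the slides.

```python
import pandas as pd

# Hypothetical sample: commas already removed, one value carries a '$'
raw = pd.Series(
    [None, '70841.00', None, '21426.00', '$51408.00'], name='RawSalary'
)

# Unparseable strings become NaN instead of raising an error
coerced_vals = pd.to_numeric(raw, errors='coerce')

# Keep rows that are NaN after coercion but were not missing originally
newly_invalid = raw[coerced_vals.isna() & raw.notna()]
print(newly_invalid)

# Once the offending character is known, strip it and convert
cleaned = raw.str.replace('$', '', regex=False).astype('float')
```

Here `newly_invalid` holds only the entry with the dollar sign, which points directly at the character that still needs removing.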

Chaining methods

df['column_name'] = df['column_name'].method1()
df['column_name'] = df['column_name'].method2()
df['column_name'] = df['column_name'].method3()

Is the same as:

df['column_name'] = df['column_name']\
                     .method1().method2().method3()
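Applied to the salary column, the chained form might look like the sketch below. The sample values are hypothetical, and wrapping the chain in parentheses is a stylistic alternative to the backslash continuation shown above.

```python
import pandas as pd

# Hypothetical sample containing both kinds of problem characters
df = pd.DataFrame({'RawSalary': ['70,841.00', '$51,408.00', None]})

# Each method returns a new Series, so the calls can be chained
df['RawSalary'] = (
    df['RawSalary']
    .str.replace(',', '', regex=False)   # drop thousands separators
    .str.replace('$', '', regex=False)   # drop currency symbols
    .astype('float')                     # convert the cleaned strings
)

print(df['RawSalary'])
```

The result matches running the three assignments one after another; the chain just avoids repeating the column name on every line.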

Go ahead and fix bad characters!
