Dealing with other data issues

Feature Engineering for Machine Learning in Python

Robert O'Callaghan

Director of Data Science, Ordergroove

Bad characters

print(df['RawSalary'].dtype)
object

Bad characters

print(df['RawSalary'].head())
0          NaN
1    70,841.00
2          NaN
3    21,426.00
4    41,671.00
Name: RawSalary, dtype: object

Dealing with bad characters

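# Strip the thousands separator so the strings can be parsed as numbers
df['RawSalary'] = df['RawSalary'].str.replace(',', '')
# Convert the cleaned strings to floats
df['RawSalary'] = df['RawSalary'].astype('float')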
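The two steps above can be tried end to end on a small frame. This is a minimal sketch: the sample DataFrame is a hypothetical stand-in built from the values printed earlier, and `regex=False` is an added assumption so the comma is treated as a literal string.

```python
import numpy as np
import pandas as pd

# Hypothetical sample mirroring the values shown on the earlier slide
df = pd.DataFrame(
    {'RawSalary': [np.nan, '70,841.00', np.nan, '21,426.00', '41,671.00']}
)

# Remove the thousands separator (literal match), then cast to float.
# Missing values stay missing through both steps.
df['RawSalary'] = df['RawSalary'].str.replace(',', '', regex=False)
df['RawSalary'] = df['RawSalary'].astype('float')

print(df['RawSalary'].dtype)
```

Note that `astype('float')` raises a `ValueError` if any other non-numeric character is still present in the column.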

Finding other stray characters

coerced_vals = pd.to_numeric(df['RawSalary'], 
                             errors='coerce')

Finding other stray characters

print(df.loc[coerced_vals.isna(), 'RawSalary'].head())
0           NaN
2           NaN
4     $51408.00
Name: RawSalary, dtype: object
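The output above mixes values that were already missing with values that failed to parse. As a possible refinement, the sketch below keeps only the latter. The sample data and the names `stray_mask` and `stray_vals` are hypothetical and not part of the original slides.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: one value still carries a dollar sign
df = pd.DataFrame(
    {'RawSalary': [np.nan, '70841.00', np.nan, '21426.00', '$51408.00']}
)

# Values that cannot be parsed as numbers become NaN
coerced_vals = pd.to_numeric(df['RawSalary'], errors='coerce')

# Keep rows that are NaN after coercion but were not missing originally
stray_mask = coerced_vals.isna() & df['RawSalary'].notna()
stray_vals = df.loc[stray_mask, 'RawSalary']

print(stray_vals)
```

Inspecting `stray_vals` shows which characters still need to be removed before the column can be cast to a numeric type.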

Chaining methods

df['column_name'] = df['column_name'].method1()
df['column_name'] = df['column_name'].method2()
df['column_name'] = df['column_name'].method3()

Is the same as:

df['column_name'] = df['column_name']\
                     .method1().method2().method3()
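Applied to the salary column, the chained form might look like the sketch below. The sample data is hypothetical, and wrapping the chain in parentheses instead of using a trailing backslash is a stylistic choice made here, not something the slides prescribe.

```python
import numpy as np
import pandas as pd

# Hypothetical sample containing both kinds of stray characters
df = pd.DataFrame({'RawSalary': [np.nan, '70,841.00', '$51,408.00']})

# Each method returns a new Series, so the calls can be chained.
# Parentheses allow line breaks without trailing backslashes.
df['RawSalary'] = (
    df['RawSalary']
    .str.replace(',', '', regex=False)   # drop thousands separators
    .str.replace('$', '', regex=False)   # drop the currency symbol
    .astype('float')                     # convert to a numeric dtype
)

print(df['RawSalary'])
```

Passing `regex=False` matters for the `'$'` replacement, since `$` is otherwise interpreted as an end-of-string anchor in a regular expression.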

Let's practice dealing with bad characters!
