Removing outliers

Feature Engineering for Machine Learning in Python

Robert O'Callaghan

Director of Data Science, Ordergroove

What are outliers?

Figure: distribution plot

Quantile-based detection

Quantiles in Python

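# Value below which 95% of the column's observations fall
q_cutoff = df['col_name'].quantile(0.95)

# Boolean mask that keeps only rows strictly below the cutoff
mask = df['col_name'] < q_cutoff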

trimmed_df = df[mask]
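
A minimal, self-contained sketch of the same quantile trim on toy data. The DataFrame contents are hypothetical and chosen only for illustration; `col_name` is the placeholder column name used on the slide.

```python
import pandas as pd

# Hypothetical toy data: nine ordinary values and one extreme value (500)
df = pd.DataFrame({'col_name': [10, 12, 11, 13, 12, 11, 10, 14, 13, 500]})

# 95th percentile of the column (pandas interpolates linearly by default)
q_cutoff = df['col_name'].quantile(0.95)

# Keep only rows strictly below the cutoff
mask = df['col_name'] < q_cutoff
trimmed_df = df[mask]

print(len(df), len(trimmed_df))
```

Note that a quantile cutoff trims the top of the distribution whether or not those values are genuinely anomalous, so the threshold (0.95 here) is a judgment call rather than a fixed rule.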

Standard deviation-based detection

Standard deviation detection in Python

mean = df['col_name'].mean()
std = df['col_name'].std()

# Bounds at three standard deviations on either side of the mean
cut_off = std * 3
lower, upper = mean - cut_off, mean + cut_off
new_df = df[(df['col_name'] < upper) & (df['col_name'] > lower)]
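
A minimal, self-contained sketch of the three-standard-deviation filter on toy data. The data below is hypothetical; the sample is deliberately larger than a handful of rows, because on very small samples a single extreme value can inflate the standard deviation enough to stay inside the band.

```python
import pandas as pd

# Hypothetical toy data: twenty ordinary values and one extreme value (500)
df = pd.DataFrame({'col_name': [10, 11, 12, 13, 14] * 4 + [500]})

mean = df['col_name'].mean()
std = df['col_name'].std()  # sample standard deviation (ddof=1 by default)

# Bounds at three standard deviations on either side of the mean
cut_off = std * 3
lower, upper = mean - cut_off, mean + cut_off

# Keep only rows strictly inside the bounds
new_df = df[(df['col_name'] < upper) & (df['col_name'] > lower)]

print(len(df), len(new_df))
```

Unlike the quantile approach, this filter removes rows only when values actually fall outside the band, so it may remove nothing at all on well-behaved data.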

Let's practice!
