Removing outliers

Feature Engineering for Machine Learning in Python

Robert O'Callaghan

Director of Data Science, Ordergroove

What are outliers?

Distribution image

Quantile based detection

Quantiles in Python

q_cutoff = df['col_name'].quantile(0.95)

mask = df['col_name'] < q_cutoff

trimmed_df = df[mask]
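
A minimal, self-contained sketch of the same quantile trim on made-up data. The DataFrame contents are an assumption for illustration only, and 'col_name' is the placeholder column name used on the slide.

```python
import pandas as pd

# Hypothetical data: the integers 1..100 in a single numeric column.
df = pd.DataFrame({'col_name': list(range(1, 101))})

# Value below which 95% of the observations fall
# (pandas interpolates linearly between data points by default).
q_cutoff = df['col_name'].quantile(0.95)

# Keep only the rows strictly below the cutoff.
mask = df['col_name'] < q_cutoff
trimmed_df = df[mask]

print(q_cutoff, len(df), len(trimmed_df))
```

This approach always drops the top 5% of rows, whether or not those values are actually extreme.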

Standard deviation based detection

Standard deviation detection in Python

mean = df['col_name'].mean()
std = df['col_name'].std()

cut_off = std * 3
lower, upper = mean - cut_off, mean + cut_off
new_df = df[(df['col_name'] < upper) & (df['col_name'] > lower)]
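
A minimal, self-contained sketch of the three-standard-deviation filter on made-up data. The values are an assumption for illustration only: 40 ordinary observations plus a single extreme one.

```python
import pandas as pd

# Hypothetical data: 40 values near 10 and one extreme value.
df = pd.DataFrame({'col_name': [10] * 20 + [12] * 20 + [1000]})

mean = df['col_name'].mean()
std = df['col_name'].std()  # sample standard deviation (ddof=1)

# Bounds three standard deviations on either side of the mean.
cut_off = std * 3
lower, upper = mean - cut_off, mean + cut_off

# Keep only the rows strictly inside the bounds.
new_df = df[(df['col_name'] < upper) & (df['col_name'] > lower)]

print(lower, upper, len(df), len(new_df))
```

Unlike the quantile trim, this removes rows only when values fall outside the bounds, so a column with no extreme values is left unchanged.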

Let's practice!
