Handling outliers

Advanced Predictive Analytics in Python

Nele Verbiest

Senior Data Scientist @PythonPredictions

Influence of outliers on predictive models
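
As a rough illustration of this influence, the sketch below fits a least-squares line to hypothetical data twice: once on clean values and once after a single observation is replaced by an extreme value. The data and the choice of a straight-line fit (standing in for a predictive model) are assumptions made for this example, not part of the original slides.

```python
import numpy as np

# Hypothetical data: y is exactly 2*x + 1 for x = 0..9
x = np.arange(10, dtype=float)
y = 2.0 * x + 1.0

# Fit a straight line to the clean data (the slope is 2)
slope_clean, intercept_clean = np.polyfit(x, y, 1)

# Replace a single observation with an extreme value
y_outlier = y.copy()
y_outlier[-1] = 100.0

# Refit: one outlier is enough to pull the slope far away from 2
slope_outlier, intercept_outlier = np.polyfit(x, y_outlier, 1)

print(slope_clean, slope_outlier)
```

Only one of the ten points changed, yet the fitted slope changes substantially, which is why outliers are usually handled before a model is built.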

Causes of outliers

  • Human errors
  • Measurement errors
  • Truly extreme values
  • ...

Winsorization concept
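
The idea can be sketched as follows: pick a lower and an upper percentile, then replace every value below the lower one or above the upper one by that limit. The example values and the 10%/90% cut-offs are assumptions chosen for illustration. Note that this percentile-based clipping is only an approximation of the concept: scipy's winsorize replaces extreme values by the nearest retained observation, so its results can differ slightly.

```python
import numpy as np

# Hypothetical values with one very low and one very high observation
values = np.array([1, 20, 21, 22, 23, 24, 25, 26, 27, 300], dtype=float)

# Limits at the 10th and 90th percentiles (assumed cut-offs)
lower, upper = np.percentile(values, [10, 90])

# Values outside the limits are replaced by the limits themselves
winsorized = np.clip(values, lower, upper)

print(winsorized)
```

Values between the two limits are left unchanged; only the tails are pulled in.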

Winsorization in Python

from scipy.stats.mstats import winsorize

# Cap the lowest 5% and the highest 1% of the values
basetable["variable_winsorized"] = winsorize(
    basetable["variable"],
    limits=[0.05, 0.01],
)

Standard deviation method concept
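
A minimal sketch of the concept, assuming a limit of three standard deviations around the mean: values beyond the limits are treated as outliers and replaced by the nearest limit. The ages below are made-up illustration data. pandas computes the sample standard deviation (ddof=1) by default.

```python
import pandas as pd

# Hypothetical ages: 20 plausible values plus one obvious data-entry error
ages = pd.Series(list(range(20, 60, 2)) + [400], dtype=float)

mean_age = ages.mean()
sd_age = ages.std()  # sample standard deviation (ddof=1)

# Limits at three standard deviations from the mean
lower_limit = mean_age - 3 * sd_age
upper_limit = mean_age + 3 * sd_age

# Replace values outside the limits by the nearest limit
capped = ages.clip(lower=lower_limit, upper=upper_limit)

print(capped.max(), upper_limit)
```

With very few observations a single outlier inflates the standard deviation so much that it may stay inside the limits, so the method works best on reasonably large samples.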

Standard deviation method in Python

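# Limits at three standard deviations around the mean
mean_age = basetable["age"].mean()
sd_age = basetable["age"].std()
lower_limit = mean_age - 3 * sd_age
upper_limit = mean_age + 3 * sd_age

# Cap values outside the limits; clip keeps the original index,
# so the new column stays aligned with basetable
basetable["age_no_outliers"] = basetable["age"].clip(
    lower=lower_limit, upper=upper_limit
)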

Let's practice!
