Handling outliers

Intermediate Predictive Analytics in Python

Nele Verbiest

Senior Data Scientist @PythonPredictions

Influence of outliers on predictive models
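Outliers can pull a fitted model toward themselves. The following minimal sketch uses made-up data, assumes NumPy is available, and its numbers are illustrative only. It fits a least-squares line with `np.polyfit` twice: once on clean data and once after a single observation is replaced by an extreme value.

```python
import numpy as np

# Hypothetical toy data: y follows y = 2x + 1 exactly
x = np.arange(10, dtype=float)
y_clean = 2 * x + 1

# Same data, but the last observation (19) is replaced by an extreme value
y_outlier = y_clean.copy()
y_outlier[-1] = 100.0

# np.polyfit(x, y, 1) returns [slope, intercept] of the least-squares line
slope_clean, intercept_clean = np.polyfit(x, y_clean, 1)
slope_outlier, intercept_outlier = np.polyfit(x, y_outlier, 1)

print(f"slope without outlier: {slope_clean:.2f}")   # 2.00
print(f"slope with one outlier: {slope_outlier:.2f}")  # 6.42
```

A single point out of ten more than triples the slope. This is why extreme values are usually treated before a model is fitted.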

Causes of outliers

  • Human errors
  • Measurement errors
  • Truly extreme values
  • ...

Winsorization concept
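Winsorization caps extreme values instead of dropping them. The most extreme observations on each side are replaced by the nearest value that is kept. The following minimal rank-based sketch in NumPy assumes that `lower` and `upper` are the fractions of observations to replace on each side. The helper `winsorize_by_rank` is hypothetical and written by hand to show the mechanics. It is not scipy's implementation.

```python
import numpy as np

def winsorize_by_rank(values, lower, upper):
    """Hypothetical helper: replace the lowest int(lower * n) values by the
    smallest kept value and the highest int(upper * n) values by the largest
    kept value. Assumes lower + upper < 1."""
    a = np.asarray(values, dtype=float).copy()
    n = a.size
    order = np.argsort(a, kind="stable")  # indices from smallest to largest value
    n_low = int(lower * n)
    n_high = int(upper * n)
    if n_low > 0:
        # Lowest n_low values take the value ranked just above them
        a[order[:n_low]] = a[order[n_low]]
    if n_high > 0:
        # Highest n_high values take the value ranked just below them
        a[order[n - n_high:]] = a[order[n - n_high - 1]]
    return a

values = [3, 8, 1, 9, 4, 250, 6, 5, 7, 2]
result = winsorize_by_rank(values, lower=0.1, upper=0.1)
print(result.tolist())  # [3.0, 8.0, 2.0, 9.0, 4.0, 9.0, 6.0, 5.0, 7.0, 2.0]
```

With ten values and 10% on each side, only the minimum (1 becomes 2) and the maximum (250 becomes 9) change. The order of the observations is preserved.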

Winsorization in Python

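from scipy.stats.mstats import winsorize

# Cap the lowest 5% and the highest 1% of "variable"
# (limits are fractions of the observations on each side, not values)
basetable["variable_winsorized"] = winsorize(
    basetable["variable"],
    limits=[0.05, 0.01])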

Standard deviation method concept
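The capping rule can be written as a formula. This sketch assumes the common cutoff of three standard deviations. Here $\bar{x}$ is the mean of the variable and $s$ its sample standard deviation, which is what pandas `std()` computes by default (denominator $n-1$):

```latex
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2},
\qquad
\text{lower} = \bar{x} - 3s,
\qquad
\text{upper} = \bar{x} + 3s

x_i' = \min\bigl(\max(x_i,\ \text{lower}),\ \text{upper}\bigr)
```

Consider illustrative numbers only: a mean age of 40 and a standard deviation of 10. The limits are then $40 - 30 = 10$ and $40 + 30 = 70$. An age of 95 is capped at 70, while an age of 35 is left unchanged.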

Standard deviation method in Python

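# Cap "age" at three standard deviations below and above its mean
mean_age = basetable["age"].mean()
sd_age = basetable["age"].std()
lower_limit = mean_age - 3 * sd_age
upper_limit = mean_age + 3 * sd_age
# clip() keeps basetable's index, so each row receives its own capped value
basetable["age_no_outliers"] = basetable["age"].clip(lower=lower_limit, upper=upper_limit)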
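The same steps can be wrapped in a reusable function that works for any numeric column and any number of standard deviations `k`. This is a minimal sketch: the helper name `cap_outliers_sd` and the toy ages are hypothetical, and pandas is assumed.

```python
import pandas as pd

def cap_outliers_sd(series: pd.Series, k: float = 3.0) -> pd.Series:
    """Hypothetical helper: cap values outside mean +/- k standard deviations."""
    mean = series.mean()
    sd = series.std()  # sample standard deviation (ddof=1), the pandas default
    # clip() returns a Series with the same index as the input
    return series.clip(lower=mean - k * sd, upper=mean + k * sd)

# Hypothetical toy basetable with a non-default index:
# 19 ordinary ages and one implausible value
basetable = pd.DataFrame({"age": [30] * 19 + [200]}, index=range(100, 120))
basetable["age_no_outliers"] = cap_outliers_sd(basetable["age"])
print(basetable["age_no_outliers"].max())  # about 152.54
```

Because `clip` preserves the index, the new column lines up row by row even though the index does not start at 0. Only the value 200 is capped; the ages of 30 are unchanged.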

Let's practice!
