Anomaly Detection in Python
Bekhruz (Bex) Tuychiev
Kaggle Master, Data Science Content Creator
Example:
The Empirical Rule:
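The empirical rule (the 68-95-99.7 rule) says that for normally distributed data about 68%, 95% and 99.7% of values lie within one, two and three standard deviations of the mean. That is why an absolute z-score above 3 is a common outlier cutoff. The short sketch below checks those percentages against the standard normal CDF from SciPy. The `coverage` dictionary is only an illustrative name.

```python
from scipy.stats import norm

# Share of a standard normal distribution lying within k standard
# deviations of the mean: P(-k < Z < k) = cdf(k) - cdf(-k)
coverage = {k: norm.cdf(k) - norm.cdf(-k) for k in (1, 2, 3)}

for k, share in coverage.items():
    print(f"within {k} std: {share:.4f}")
# within 1 std: 0.6827
# within 2 std: 0.9545
# within 3 std: 0.9973
```

Only about 0.27% of normal data falls outside three standard deviations, so points beyond that range are rare enough to flag.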
Outliers:
from scipy.stats import zscore
scores = zscore(sales)
scores[:5]
0 0.910601
1 -1.018440
2 -0.049238
3 0.849103
4 -0.695373
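A z-score is the distance from the mean measured in standard deviations: z = (x - mean) / std. The sketch below uses a small made-up Series, `toy_sales`, as a stand-in for `sales`. It checks that computing the formula by hand with the population standard deviation (`ddof=0`, which is `scipy.stats.zscore`'s default) gives the same values as `zscore`.

```python
import numpy as np
import pandas as pd
from scipy.stats import zscore

# Hypothetical toy data standing in for `sales`
toy_sales = pd.Series([120.0, 95.0, 130.0, 110.0, 400.0])

# z = (x - mean) / std with the population std (ddof=0),
# matching scipy.stats.zscore's default ddof
manual = (toy_sales - toy_sales.mean()) / toy_sales.std(ddof=0)
library = zscore(toy_sales)

print(np.allclose(manual, library))  # True
```

Note that pandas' `Series.std` defaults to `ddof=1`, so `ddof=0` has to be passed explicitly for the two results to agree.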
import numpy as np

is_over_3 = np.abs(scores) > 3
is_over_3[:5]
0 False
1 False
2 False
3 False
4 False
outliers = sales[is_over_3]
print(len(outliers))
90
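The three steps above (score, threshold, filter) can be folded into a single helper. This is a minimal sketch. `zscore_outliers` is a hypothetical name, and the synthetic data (50 values around 100 plus one value of 1000) only illustrates the 3-standard-deviation cutoff.

```python
import numpy as np
import pandas as pd
from scipy.stats import zscore


def zscore_outliers(values: pd.Series, threshold: float = 3.0) -> pd.Series:
    """Return the elements of `values` whose absolute z-score exceeds `threshold`."""
    scores = zscore(values)
    return values[np.abs(scores) > threshold]


# Hypothetical data: 50 ordinary values around 100 plus one extreme value
rng = np.random.default_rng(0)
data = pd.Series(np.append(rng.normal(100, 5, size=50), 1000.0))

outliers = zscore_outliers(data)
print(outliers)  # only the last element (index 50, value 1000.0) is flagged
```

The mean and standard deviation are themselves pulled toward extreme values. As a result, the z-score cutoff works best when outliers are few relative to the sample size.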
from scipy.stats import median_abs_deviation

mad_score = median_abs_deviation(sales)
mad_score
1081.925
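The median absolute deviation is the median of the absolute distances from the median: MAD = median(|x - median(x)|). With its default `scale=1.0`, `scipy.stats.median_abs_deviation` returns exactly this quantity. The sketch below reproduces it by hand on a small hypothetical array that contains one extreme value.

```python
import numpy as np
from scipy.stats import median_abs_deviation

# Hypothetical toy data with one extreme value
x = np.array([1.0, 2.0, 3.0, 4.0, 100.0])

# MAD = median(|x - median(x)|)
median = np.median(x)
manual_mad = np.median(np.abs(x - median))

print(manual_mad)               # 1.0
print(median_abs_deviation(x))  # 1.0
print(np.std(x))                # much larger, because 100.0 inflates it
```

Because both the center and the spread are medians, the single value of 100 barely affects the MAD, while the standard deviation grows sharply.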
from pyod.models.mad import MAD

# threshold defaults to 3.5
mad = MAD(threshold=3.5)
# Reshape sales into a 2D array with a single column, the shape PyOD expects
sales_reshaped = sales.values.reshape(-1, 1)
labels = mad.fit_predict(sales_reshaped)
print(labels.sum())
83
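PyOD's MAD detector is documented as thresholding a modified z-score. The NumPy sketch below shows that score, assuming the standard Iglewicz and Hoaglin definition: 0.6745 times the distance from the median, divided by the MAD. `modified_zscore_outliers` is a hypothetical helper. It is not part of PyOD and not PyOD's source code, but it returns 0/1 labels in the same spirit as `fit_predict`.

```python
import numpy as np


def modified_zscore_outliers(x: np.ndarray, threshold: float = 3.5) -> np.ndarray:
    """Label values whose modified z-score exceeds `threshold` (1 = outlier)."""
    x = np.asarray(x, dtype=float).ravel()
    median = np.median(x)
    mad = np.median(np.abs(x - median))
    # For normal data, MAD is about 0.6745 * sigma, so multiplying by 0.6745
    # puts the score on roughly the same scale as an ordinary z-score.
    # Assumes mad > 0; if more than half the values are identical, MAD is 0.
    modified_z = 0.6745 * np.abs(x - median) / mad
    return (modified_z > threshold).astype(int)


# Same (n, 1) shape as sales_reshaped above, using hypothetical toy data
labels = modified_zscore_outliers(np.array([1.0, 2.0, 3.0, 4.0, 100.0]).reshape(-1, 1))
print(labels)        # [0 0 0 0 1]
print(labels.sum())  # 1
```

Summing the 0/1 labels counts the flagged points, which is what `labels.sum()` reports for the PyOD detector above.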