Anomaly Detection in Python
Bekhruz (Bex) Tuychiev
Kaggle Master, Data Science Content Creator
import pandas as pd
males = pd.read_csv("ansur_male.csv")
males.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4082 entries, 0 to 4081
Data columns (total 95 columns):
 #   Column                          Non-Null Count  Dtype
 0   abdominalextensiondepthsitting  4082 non-null   int64
 1   acromialheight                  4082 non-null   int64
 2   acromionradialelength           4082 non-null   int64
 3   anklecircumference              4082 non-null   int64
 4   axillaheight                    4082 non-null   int64
...
from pyod.models.knn import KNN
knn = KNN(contamination=0.01, n_jobs=-1)
knn.fit(males)
probs = knn.predict_proba(males)
# Use 55% threshold for filtering
is_outlier = probs[:, 1] > 0.55
# Isolate the outliers
outliers = males[is_outlier]
len(outliers)
13
# Rule of thumb: use k=20 when contamination is <= 10%
knn = KNN(n_neighbors=20, n_jobs=-1)
knn.fit(males)
probs = knn.predict_proba(males)
is_outlier = probs[:, 1] > .55
outliers = males[is_outlier]
len(outliers)
15
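To see roughly what the 55% threshold is acting on, here is a minimal NumPy-only sketch of the idea behind the probabilities used above. It assumes the outlier score is the distance to the k-th nearest neighbor and that scores are min-max scaled to [0, 1] before thresholding. The helper `knn_outlier_probs` and the `toy` array are hypothetical names made up for illustration; this is not pyod's implementation and it does not use the ANSUR data.

```python
import numpy as np

def knn_outlier_probs(X, n_neighbors=5):
    """Hypothetical helper: k-th neighbor distance, min-max scaled to [0, 1]."""
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances (O(n^2) memory, fine for small data only)
    diffs = X[:, None, :] - X[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # After sorting, column 0 is the zero self-distance,
    # so column k holds the distance to the k-th neighbor
    kth = np.sort(dists, axis=1)[:, n_neighbors]
    span = kth.max() - kth.min()
    if span == 0:
        return np.zeros_like(kth)
    return (kth - kth.min()) / span

# Synthetic data (an assumption for illustration): 200 inliers plus one planted outlier
rng = np.random.default_rng(0)
toy = np.vstack([rng.normal(0, 1, size=(200, 3)), [[15.0, 15.0, 15.0]]])

probs = knn_outlier_probs(toy, n_neighbors=5)
is_outlier = probs > 0.55  # same 55% threshold as in the slides
outliers = toy[is_outlier]
```

Because the scaling is relative to the largest score in the data, a single extreme point pushes every other probability down, which is one reason the number of flagged rows can change when `n_neighbors` changes.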