Anomaly Detection in Python
Bekhruz (Bex) Tuychiev
Kaggle Master, Data Science Content Creator
Multivariate anomalies:
iTrees:
Points are outliers:
import pandas as pd
airbnb_df = pd.read_csv("airbnb.csv")
airbnb_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 6 columns):
 #   Column                          Non-Null Count  Dtype
---  ------                          --------------  -----
 0   minimum_nights                  10000 non-null  int64
 1   number_of_reviews               10000 non-null  int64
 2   reviews_per_month               10000 non-null  float64
 3   calculated_host_listings_count  10000 non-null  int64
 4   availability_365                10000 non-null  int64
 5   price                           10000 non-null  int64
dtypes: float64(1), int64(5)
from pyod.models.iforest import IForest
iforest = IForest()
labels = iforest.fit_predict(airbnb_df)
print(labels)
array([0, 0, 0, ..., 1, 0, 0])
outliers = airbnb_df[labels == 1]
print(outliers.shape)
(1000, 6)
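The slides mention iTrees without showing how one isolates a point. The idea can be sketched in plain Python: split the data on a random feature at a random value, keep the side that contains the point of interest, and count the splits needed until the point is alone. This is only a simplified illustration, not PyOD's implementation. A real Isolation Forest builds full trees on subsamples and converts path lengths into a normalized score. The helper names (`path_length`, `average_path_length`) and the synthetic data are assumptions made for this example.

```python
import random

def path_length(point, data, rng, depth=0, max_depth=50):
    """Count the random splits needed to isolate `point` within `data`."""
    if len(data) <= 1 or depth >= max_depth:
        return depth
    # Pick a random feature and a random split value within its range.
    feature = rng.randrange(len(point))
    values = [row[feature] for row in data]
    lo, hi = min(values), max(values)
    if lo == hi:
        return depth  # this feature cannot separate the remaining rows
    split = rng.uniform(lo, hi)
    # Keep only the rows that fall on the same side of the split as `point`.
    if point[feature] < split:
        subset = [row for row in data if row[feature] < split]
    else:
        subset = [row for row in data if row[feature] >= split]
    return path_length(point, subset, rng, depth + 1, max_depth)

def average_path_length(point, data, n_trees=200, seed=0):
    """Average the isolation depth of `point` over many random trees."""
    rng = random.Random(seed)
    total = sum(path_length(point, data, rng) for _ in range(n_trees))
    return total / n_trees

# Synthetic data (hypothetical): a dense 2-D cluster plus one distant point.
data_rng = random.Random(42)
data = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(200)]
inlier = (0.0, 0.0)
outlier = (10.0, 10.0)
data += [inlier, outlier]

inlier_avg = average_path_length(inlier, data)
outlier_avg = average_path_length(outlier, data)
print(f"inlier: {inlier_avg:.1f} splits, outlier: {outlier_avg:.1f} splits")
```

The distant point should need far fewer splits on average than the point in the middle of the cluster. That short path is the signal Isolation Forest uses to flag outliers.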