Interpreting the output of IForest

Anomaly Detection in Python

Bekhruz (Bex) Tuychiev

Kaggle Master, Data Science Content Creator

An alternative

from pyod.models.iforest import IForest

iforest = IForest(contamination=0.2, max_features=0.5, random_state=1)

iforest = iforest.fit(airbnb_df)


labels = iforest.labels_
print(labels)

array([0, 0, 0, ..., 1, 0, 0])
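The share of 1s in `labels_` follows from `contamination`. The sketch below assumes PyOD derives the labels by cutting the training anomaly scores at the (1 - contamination) percentile. The scores are synthetic stand-ins, not real IForest output, and the names `scores` and `threshold` are illustrative.

```python
import numpy as np

# Hypothetical anomaly scores standing in for iforest.decision_scores_
rng = np.random.default_rng(1)
scores = rng.normal(size=10_000)

contamination = 0.2

# Cut-off at the (1 - contamination) percentile of the training scores
threshold = np.percentile(scores, 100 * (1 - contamination))

# Rows scoring above the cut-off are labelled 1 (outlier), the rest 0
labels = (scores > threshold).astype(int)

print(labels.mean())  # share of rows flagged as outliers
```

Under that assumption, roughly a fifth of the rows are flagged however anomalous they actually are, which is why the choice of `contamination` matters.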

Predictions on new data

import numpy as np

# One new row; it must have the same number of features as airbnb_df
new_data = np.array([[34, 40, 0.44, 3, 2, 90]])

iforest.predict(new_data)

array([0])

Probability scores

all_probs = iforest.predict_proba(airbnb_df)
print(all_probs)

array([[0.71401381, 0.28598619],
       [0.75553703, 0.24446297],
       [0.6844169 , 0.3155831 ],
       ...,
       ])

print(all_probs.shape)

(10000, 2)

Outlier probability scores

outliers = airbnb_df[iforest.labels_ == 1]
outlier_probs = iforest.predict_proba(outliers)

print(outlier_probs[:10])

array([[0.51999538, 0.48000462],
       [0.61789522, 0.38210478],
       [0.61802032, 0.38197968],
       [0.35184434, 0.64815566],
       [0.57533286, 0.42466714],
       [0.59038933, 0.40961067],
       [0.57677613, 0.42322387],
       [0.54158826, 0.45841174],
       [0.49118093, 0.50881907],
       [0.21387357, 0.78612643]])
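Several rows labelled as outliers have an outlier probability below 0.5. One likely reason, sketched here under the assumption that `predict_proba` uses its default linear conversion, is that the probability is only the anomaly score min-max scaled over the training range. The function `linear_outlier_proba` and the scores are hypothetical, not part of PyOD.

```python
import numpy as np

def linear_outlier_proba(train_scores, new_scores):
    """Min-max scale anomaly scores into [0, 1] using the training range."""
    train_scores = np.asarray(train_scores, dtype=float)
    new_scores = np.asarray(new_scores, dtype=float)
    lo, hi = train_scores.min(), train_scores.max()
    # Scale with the training min and max, then clip scores outside that range
    p_outlier = np.clip((new_scores - lo) / (hi - lo), 0.0, 1.0)
    # Column 0: inlier probability, column 1: outlier probability
    return np.column_stack([1.0 - p_outlier, p_outlier])

# Hypothetical anomaly scores standing in for iforest.decision_scores_
train = np.array([-0.2, -0.1, 0.0, 0.1, 0.3])
probs = linear_outlier_proba(train, train)
print(probs)
```

Read this way, the second column measures how far a row sits along the observed score range. It is not calibrated against the `contamination` cut-off, so a row can be labelled 1 with a probability well under 50%.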

Abandoning contamination

# Fit on the Airbnb data without setting contamination (PyOD then uses its default of 0.1)
iforest = IForest(max_features=0.5, random_state=1)
iforest.fit(airbnb_df)

# Calculate probabilities
probs = iforest.predict_proba(airbnb_df)

# Probabilities for the outlier class (second column)
outlier_probs = probs[:, 1]

# Keep only rows whose outlier probability is at least 65%
outliers = airbnb_df[outlier_probs >= 0.65]

print(len(outliers))
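The 65% cut-off is a judgment call, so it can help to see how the outlier count moves with the threshold. A minimal sketch, using synthetic probabilities in place of `probs[:, 1]`; the beta distribution and the three thresholds are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical outlier probabilities standing in for probs[:, 1]
outlier_probs = rng.beta(2, 5, size=10_000)

thresholds = (0.5, 0.65, 0.8)

# Count the rows at or above each candidate threshold
counts = {t: int((outlier_probs >= t).sum()) for t in thresholds}

for t, n in counts.items():
    print(f"threshold={t:.2f} -> {n} outliers")
```

Raising the threshold can only keep the count the same or shrink it, so the threshold sets how confident the model must be before a row is treated as an outlier.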

Let's practice!
