Local Outlier Factor

Anomaly Detection in Python

Bekhruz (Bex) Tuychiev

Data Science Writer

What is Local Outlier Factor (LOF)?

Features

  • Density-based algorithm
  • Proposed in 2000
  • Works well with moderately high-dimensional data
  • Can be faster than KNN and Isolation Forest
Anomaly Detection in Python

How does LOF work?

  • Points are classified using a score called local outlier factor (LOF)
  • LOF is based on the concept of local density
  • Locality is chosen by setting a value to n_neighbors parameter
  • Points with lower density are classified as outliers

LOF algorithm visualized

Anomaly Detection in Python

The importance of locality

  • The word local is key
  • LOF score is only compared to the score of nearby points
Anomaly Detection in Python

LOF visualized

LOF algorithm visualized

Anomaly Detection in Python

LOF visualized

LOF algorithm visualized

Anomaly Detection in Python

LOF visualized

LOF algorithm visualized

Anomaly Detection in Python

Transformed dataset

import pandas as pd

males_transformed = pd.read_csv("males_transformed.csv")

males_transformed.head()
   abdominalextensiondepthsitting  acromialheight  acromionradialelength  ...
0                        0.365531        0.427976               0.113152   
1                       -0.492137       -0.724973              -0.529301   
2                        0.857097       -0.146048               0.357496   
3                       -0.462610       -1.521525              -1.467860   
4                       -0.026349        2.151957               1.985876
Anomaly Detection in Python

LOF in action

from pyod.models.lof import LOF


# Fit lof = LOF(n_neighbors=20, metric="manhattan") lof.fit(males_transformed) print(lof.labels_)
array([0, 0, 0, ..., 0, 0, 1])
Anomaly Detection in Python

Filtering in LOF

# Isolate the outliers
probs = lof.predict_proba(males_transformed)

is_outlier = probs[:, 1] > 0.55
outliers = males_transformed[is_outlier]

len(outliers)
2
Anomaly Detection in Python

LOF details

  • n_neighbors is the most important
  • Rule of thumb to tune n_neigbors:
    • 20 neighbors for <10% contamination
  • No aggregation allowed:
    • method is always largest
Anomaly Detection in Python

LOF drawbacks

  • Hard to interpret
  • LOF score combines:
    • distance between points
    • reachability distance
    • many other components
  • No specific range of LOF score for outliers
Anomaly Detection in Python

Let's practice!

Anomaly Detection in Python

Preparing Video For Download...