Introduction to Anomaly Detection in R
Alastair Rushworth
Data Scientist
Class of a single column
class(sat$V1)
"numeric"
Class of all columns
sapply(X = sat, FUN = class)
      label          V1          V2          V3          V4          V5          V6    high_low
  "numeric"   "numeric"   "numeric"   "numeric"   "numeric"   "numeric"   "numeric" "character"
Encode categorical features as factor
sat$high_low <- as.factor(sat$high_low)
class(sat$high_low)
"factor"
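When a data frame has several character columns, converting them one at a time gets tedious. A minimal sketch of converting every character column in one pass, using a small made-up data frame (`toy` and its values are hypothetical, not the sat data):

```r
# Hypothetical toy data frame with one numeric and two character columns
toy <- data.frame(
  V1       = c(1.5, 2.0, 3.5),
  high_low = c("high", "low", "high"),
  region   = c("a", "b", "b"),
  stringsAsFactors = FALSE
)

# Replace each character column with a factor; leave other columns untouched
toy[] <- lapply(toy, function(col) {
  if (is.character(col)) as.factor(col) else col
})

# Check the resulting classes
sapply(X = toy, FUN = class)
```

Assigning into `toy[]` keeps the object a data frame with its original column names, rather than turning it into a plain list.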
Train isolation forest
library(isofor)
sat_for <- iForest(sat[, -1], nt = 100)
Gower distance measures distance between points with categorical & numeric features
library(cluster)
sat_dist <- daisy(sat[, -1], metric = "gower")
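To see what the Gower distance does with mixed features, it can help to compute it by hand for one pair of rows. The sketch below assumes the usual definition: each numeric feature contributes the absolute difference divided by that feature's range, each categorical feature contributes 0 if the values match and 1 otherwise, and the contributions are averaged. The data frame `toy` and the helper `gower_pair` are hypothetical names for illustration, not part of the cluster package.

```r
# Hypothetical toy data with two numeric features and one factor
toy <- data.frame(
  V1       = c(1, 3, 5),
  V2       = c(10, 20, 40),
  high_low = factor(c("high", "low", "high"))
)

# Gower distance between rows i and j of df (illustrative helper)
gower_pair <- function(df, i, j) {
  parts <- sapply(df, function(col) {
    if (is.numeric(col)) {
      # Numeric: absolute difference scaled by the feature's range
      abs(col[i] - col[j]) / diff(range(col))
    } else {
      # Categorical: 0 if the values match, 1 if they differ
      as.numeric(col[i] != col[j])
    }
  })
  # Average the per-feature contributions
  mean(parts)
}

gower_pair(toy, 1, 2)  # (2/4 + 10/30 + 1) / 3
gower_pair(toy, 1, 3)  # (4/4 + 30/30 + 0) / 3
```

Because every per-feature contribution lies between 0 and 1, the averaged distance does too, which is why the range of a Gower distance matrix never exceeds 1.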
Pass sat_dist to lof
library(dbscan)
sat_lof <- lof(sat_dist, k = 10)
sat_distmat <- as.matrix(sat_dist)
range(sat_distmat)
0.0000000 0.8680774