Working with categorical features

Introduction to Anomaly Detection in R

Alastair Rushworth

Data Scientist

Checking column classes

Class of a single column

class(sat$V1)
"numeric"

Class of all columns

sapply(X = sat, FUN = class)
      label          V1          V2          V3          V4          V5          V6    high_low
  "numeric"   "numeric"   "numeric"   "numeric"   "numeric"   "numeric"   "numeric" "character"
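
The class vector returned by sapply() can also be used to pick out the columns that still need encoding. A minimal sketch, assuming a small made-up data frame `toy` as a stand-in for `sat` (the values and the name `char_cols` are hypothetical):

```r
# Hypothetical toy data standing in for sat
toy <- data.frame(
  V1 = c(1.2, 3.4, 5.6),
  V2 = c(0.5, 0.1, 0.9),
  high_low = c("high", "low", "high"),
  stringsAsFactors = FALSE
)

# Class of every column, returned as a named character vector
col_classes <- sapply(X = toy, FUN = class)

# Names of the columns that are stored as character
char_cols <- names(col_classes)[col_classes == "character"]
char_cols
```

Any name that appears in `char_cols` is a candidate for conversion to a factor before modelling.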

Isolation forest

Encode categorical features as factor

sat$high_low <- as.factor(sat$high_low)

class(sat$high_low)
"factor"
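
When several columns are stored as character, the same conversion can be applied to all of them at once. A sketch on a made-up data frame `toy` (hypothetical, not the `sat` data), using only base R:

```r
# Hypothetical toy data with one character column
toy <- data.frame(
  V1 = c(1.2, 3.4, 5.6),
  high_low = c("high", "low", "high"),
  stringsAsFactors = FALSE
)

# Logical vector flagging the character columns
is_char <- sapply(toy, is.character)

# Convert only the flagged columns; numeric columns are left unchanged
toy[is_char] <- lapply(toy[is_char], as.factor)

class(toy$high_low)
levels(toy$high_low)
```

as.factor() sorts the levels alphabetically by default, so "high" comes before "low" here.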

Train isolation forest

library(isofor)  # assumption: iForest() is taken from the isofor package
sat_for <- iForest(sat[, -1], nt = 100)

LOF with factors

Gower distance measures the distance between points that have both categorical and numeric features

library(cluster)
sat_dist <- daisy(sat[, -1], metric = "gower")

Pass sat_dist to lof()

library(dbscan)  # assumption: lof() is taken from the dbscan package
sat_lof <- lof(sat_dist, k = 10)
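
To see what the Gower distance is doing, it can be reproduced by hand on a tiny example. The sketch below assumes a made-up data frame `toy` with one numeric and one factor column; the helper `gower_pair()` is hypothetical and written only for this two-column case. For the numeric column the contribution is the absolute difference divided by the column range, for the factor it is 0 when the levels match and 1 otherwise, and the two contributions are averaged.

```r
library(cluster)

# Hypothetical toy data: one numeric and one categorical feature
toy <- data.frame(
  x = c(1, 3, 6),
  f = factor(c("a", "b", "a"))
)

# Gower distance between rows i and j, written out for this toy case only
gower_pair <- function(i, j, data) {
  # Numeric feature: absolute difference scaled by the range of the column
  num_part <- abs(data$x[i] - data$x[j]) / diff(range(data$x))
  # Categorical feature: 0 if the levels match, 1 if they differ
  cat_part <- as.numeric(data$f[i] != data$f[j])
  # Average over the two features
  mean(c(num_part, cat_part))
}

# Pairs in the order a dist object stores them: (1,2), (1,3), (2,3)
# Hand calculation: (2/5 + 1)/2 = 0.7, (5/5 + 0)/2 = 0.5, (3/5 + 1)/2 = 0.8
by_hand <- c(gower_pair(1, 2, toy), gower_pair(1, 3, toy), gower_pair(2, 3, toy))

# Same distances from daisy()
from_daisy <- as.vector(daisy(toy, metric = "gower"))

all.equal(by_hand, from_daisy)
```

Because every feature contributes a value between 0 and 1, the averaged distance also lies between 0 and 1.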


Exploring Gower distance matrix

  • Convert object to matrix
sat_distmat <- as.matrix(sat_dist)

  • Find max and min interpoint distances
range(sat_distmat)
0.0000000 0.8680774
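
The minimum of 0 reported by range() comes from the diagonal of the matrix, where each point is compared with itself. A sketch of how to look past the diagonal and locate the most distant pair, assuming a made-up data frame `toy` (the names `toy_distmat`, `min_off_diag` and `far_pair` are hypothetical):

```r
library(cluster)

# Hypothetical toy data: one numeric and one categorical feature
toy <- data.frame(
  x = c(1, 3, 6),
  f = factor(c("a", "b", "a"))
)

# Gower distances as a full symmetric matrix
toy_distmat <- as.matrix(daisy(toy, metric = "gower"))

# The diagonal is all zeros, so the overall minimum is 0
range(toy_distmat)

# Smallest distance between two different points: use the upper triangle only
min_off_diag <- min(toy_distmat[upper.tri(toy_distmat)])
min_off_diag

# Row and column indices of the largest distance
# (the pair appears twice because the matrix is symmetric)
far_pair <- which(toy_distmat == max(toy_distmat), arr.ind = TRUE)
far_pair
```

On real data the off-diagonal minimum shows whether any two distinct observations are identical on every feature.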

Let's practice!
