Testing the extremes with Grubbs' test

Introduction to Anomaly Detection in R

Alastair Rushworth

Data Scientist

Visual assessment is not always reliable!

boxplot(temperature, ylab = "Celsius")

Boxplots for outlying points

Grubbs' test

  • Statistical test to decide if a point is outlying
  • Assumes the data are normally distributed
  • Requires checking the normality assumption first
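As a rough illustration of what the test measures, the one-sided Grubbs statistic for the highest value is the distance of the maximum from the sample mean, in units of the sample standard deviation. The sketch below uses only base R; the `temperature` values are made up for illustration and are not the course data.

```r
# Hypothetical temperatures in Celsius (made-up values, not the course data)
temperature <- c(18, 19, 20, 21, 20, 19, 22, 21, 20, 30)

# One-sided Grubbs statistic for the highest value:
# (maximum - mean) divided by the sample standard deviation
G <- (max(temperature) - mean(temperature)) / sd(temperature)
print(G)
```

A larger G means the most extreme point sits further from the bulk of the data.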

Checking normality with a histogram

hist(temperature, breaks = 6)

Histogram showing data distribution

Symmetrical & bell shaped?
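A minimal sketch of the histogram check, assuming simulated data in place of the course's `temperature` vector. It keeps the histogram object so the bin counts can be inspected as well as drawn.

```r
# Simulated, roughly normal temperatures (an assumption for illustration)
set.seed(1)
temperature <- rnorm(100, mean = 20, sd = 2)

# Build the histogram without drawing it, so the bins can be inspected
h <- hist(temperature, breaks = 6, plot = FALSE)
print(h$mids)    # bin midpoints
print(h$counts)  # observations per bin

# plot(h) would draw the same histogram
```

Note that `breaks = 6` is only a suggestion to `hist()`, so the number of bins actually drawn may differ. Counts that rise to a single central peak and fall away roughly evenly on both sides are consistent with a bell shape.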

Running Grubbs' test

Use the grubbs.test() function from the outliers package:

grubbs.test(temperature)
    Grubbs test for one outlier

data:  temperature
G = 3.07610, U = 0.41065, p-value = 0.001796
alternative hypothesis: highest value 30 is an outlier
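The printed result can also be stored and queried. The sketch below assumes the outliers package is installed and uses made-up `temperature` values, so its numbers will not match the output shown on the slide. The object returned by `grubbs.test()` is a standard `htest` list.

```r
# Hypothetical temperatures in Celsius (made-up values, not the course data)
temperature <- c(18, 19, 20, 21, 20, 19, 22, 21, 20, 30)

if (requireNamespace("outliers", quietly = TRUE)) {
  result <- outliers::grubbs.test(temperature)
  print(result$statistic)    # named vector holding G and U
  print(result$p.value)      # the p-value as a plain number
  print(result$alternative)  # which point is being tested
} else {
  message("Package 'outliers' is not installed; try install.packages(\"outliers\")")
}
```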

Interpreting the p-value

grubbs.test(temperature)
    Grubbs test for one outlier

data:  temperature
G = 3.07610, U = 0.41065, p-value = 0.001796
alternative hypothesis: highest value 30 is an outlier

  p-value

  • Near 0 - stronger evidence of an outlier
  • Near 1 - weaker evidence of an outlier
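To make the decision explicit, the p-value can be compared with a chosen significance level. The sketch below is a base R approximation under stated assumptions: made-up data, a 0.05 threshold chosen for illustration, and a common Bonferroni-style bound that converts G to a t statistic with n - 2 degrees of freedom. It is not guaranteed to reproduce the exact value `grubbs.test()` reports.

```r
# Hypothetical temperatures in Celsius (made-up values, not the course data)
temperature <- c(18, 19, 20, 21, 20, 19, 22, 21, 20, 30)
n <- length(temperature)

# One-sided Grubbs statistic for the highest value
G <- (max(temperature) - mean(temperature)) / sd(temperature)

# Convert G to a t statistic with n - 2 degrees of freedom
t_stat <- sqrt(n * (n - 2) * G^2 / ((n - 1)^2 - n * G^2))

# Approximate one-sided p-value: upper tail probability times n, capped at 1
p_value <- min(1, n * pt(t_stat, df = n - 2, lower.tail = FALSE))

alpha <- 0.05  # assumed significance level
if (p_value < alpha) {
  message("Strong evidence that the highest value is an outlier")
} else {
  message("Little evidence that the highest value is an outlier")
}
```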

Get the row index of an outlier

Location of the maximum

which.max(temperature)
5

Location of the minimum

which.min(temperature)
12
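Once the index is known, it can be used to look up the suspect value or to set it aside. This is a minimal sketch with made-up `temperature` values; whether to exclude a point is a judgment call that depends on the analysis.

```r
# Hypothetical temperatures in Celsius (made-up values, not the course data)
temperature <- c(18, 19, 20, 21, 20, 19, 22, 21, 20, 30)

# Position of the highest value
idx <- which.max(temperature)
print(idx)               # 10
print(temperature[idx])  # 30

# A copy of the data with that point excluded
cleaned <- temperature[-idx]
print(length(cleaned))   # 9
```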

Let's practice!
