Box plots and IQR

Anomaly Detection in Python

Bekhruz (Bex) Tuychiev

Kaggle Master, Data Science Content Creator

Boxplots recap

  • Boxplots:
    • Are enhanced visual versions of the 5-number summary
    • Show data location (center), spread and skewness
    • Indicate the presence of outliers

A sample boxplot with annotations.
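
As a small illustration of the 5-number summary that a boxplot draws, the sketch below computes it with Python's standard library. The helper name `five_number_summary` and the sample values are made up for this example; `method="inclusive"` is chosen because it interpolates linearly between data points, which should match the default behaviour of the pandas `quantile` method used later in these slides.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a sequence of numbers."""
    ordered = sorted(values)
    # n=4 splits the data into quartiles and returns the three cut points
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]

# Hypothetical sample with one unusually large value
sample = [2, 4, 4, 5, 6, 7, 8, 9, 30]
summary = five_number_summary(sample)
print(summary)
```

The box spans Q1 to Q3, the line inside it marks the median, and the minimum and maximum are where the whiskers would end if no points were treated as outliers.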

Boxplot components

A boxplot with its 25th, 75th percentiles and the box annotated with arrows and accompanying text.

Whiskers

Another boxplot with more components annotated: the median, a sample outlier, lower and upper outlier limits.

Interquartile range (IQR)

  • IQR: interquartile range, the spread of the middle 50% of the data
  • IQR = Q3 - Q1
  • Whisker lengths: the IQR multiplied by a factor

The same boxplot from the previous slide but smaller.

Calculating whisker lengths

  • Factor = 1.5
  • Lower limit: $Q1 - 1.5 \times IQR$
  • Upper limit: $Q3 + 1.5 \times IQR$

The same boxplot from the previous slide.
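
To make the arithmetic concrete, here is a minimal worked example of the two limit formulas with the factor 1.5. The data values are invented for illustration, and the quartiles come from the standard library's `statistics.quantiles` rather than from any particular plotting library.

```python
import statistics

# Made-up values; 30 sits far above the rest
data = [2, 4, 4, 5, 6, 7, 8, 9, 30]

# Quartiles with linear interpolation between data points
q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

factor = 1.5
lower_limit = q1 - factor * iqr
upper_limit = q3 + factor * iqr

# Anything outside [lower_limit, upper_limit] counts as an outlier
outliers = [x for x in data if x < lower_limit or x > upper_limit]

print(q1, q3, iqr)
print(lower_limit, upper_limit)
print(outliers)
```

Here Q1 = 4 and Q3 = 8, so IQR = 4, and the limits are 4 - 1.5 × 4 = -2 and 8 + 1.5 × 4 = 14. Only the value 30 falls outside them.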

Drawing boxplots

import matplotlib.pyplot as plt

# sales is assumed to be a pandas Series of product sales loaded earlier
plt.boxplot(sales)
plt.xlabel("Product sales")

A boxplot of product sales with many outliers forming a black line above the upper outlier limit.

Controlling whisker lengths

# whis=2.5 moves the outlier limits to 2.5 * IQR beyond Q1 and Q3 (default: 1.5)
plt.boxplot(sales, whis=2.5)
plt.xlabel("Product sales")

A boxplot of product sales with fewer outliers above the upper whisker.
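
The effect of a larger factor can also be checked numerically. The sketch below is an assumption-laden stand-in for the plot: `iqr_outliers` is a hypothetical helper, the values are invented, and the quartiles are computed with the standard library rather than read from matplotlib.

```python
import statistics

def iqr_outliers(values, factor=1.5):
    """Return the values lying outside Q1 - factor*IQR and Q3 + factor*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - factor * iqr
    upper = q3 + factor * iqr
    return [x for x in values if x < lower or x > upper]

# Invented data with one moderate and one extreme high value
data = [10, 12, 13, 14, 15, 16, 17, 18, 20, 28, 40]

default_outliers = iqr_outliers(data, factor=1.5)
wide_outliers = iqr_outliers(data, factor=2.5)

print(default_outliers)
print(wide_outliers)
```

With these numbers Q1 = 13.5, Q3 = 19 and IQR = 5.5. The upper limit is 27.25 at a factor of 1.5, which flags both 28 and 40, and 32.75 at a factor of 2.5, which flags only 40.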

IQR in code

# Calculate the percentiles
q1 = sales.quantile(0.25)
q3 = sales.quantile(0.75)


# Calculate IQR
IQR = q3 - q1

# Set multiplying factor
factor = 2.5

Finding outliers with IQR

# Calculate the limits
lower_limit = q1 - (IQR * factor)
upper_limit = q3 + (IQR * factor)

# Create masks
is_lower = sales < lower_limit
is_upper = sales > upper_limit

# Filter
outliers = sales[is_lower | is_upper]

# Print the number of outliers
print(len(outliers))
29

The flexibility of the method

  • Boxplots and IQR filtering are flexible
  • Changing the factor creates stricter or looser custom rules for marking outliers
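
One possible custom rule, shown purely as an illustration and not taken from the slides, is to use a different factor for each tail. The function name `asymmetric_outliers`, its parameters and the data are all hypothetical.

```python
import statistics

def asymmetric_outliers(values, lower_factor=1.5, upper_factor=3.0):
    """Flag values using separate IQR factors for the low and high tails."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - lower_factor * iqr  # stricter rule for small values
    upper = q3 + upper_factor * iqr  # more lenient rule for large values
    return [x for x in values if x < lower or x > upper]

# Invented data: one very low value, one moderate and one extreme high value
data = [1, 12, 13, 14, 15, 16, 17, 18, 20, 28, 40]
flagged = asymmetric_outliers(data)
print(flagged)
```

With Q1 = 13.5, Q3 = 19 and IQR = 5.5, the limits become 5.25 and 35.5, so 1 and 40 are flagged while 28 is kept. A rule like this might suit data where unusually high values are expected more often than unusually low ones.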

Let's practice!
