Boxplot alternatives

Visualization Best Practices in R

Nick Strayer

Instructor

Limitations of the boxplot with jitter

  • Jittering can only resolve so much overlap between points
  • Hard to get a sense of data density
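
To make these limitations concrete, here is a minimal sketch of a boxplot with jitter on a larger dataset. The data frame `sim` and its columns are simulated purely for illustration and are not part of the course data. With 1,000 points per group, even semi-transparent jittered points merge into a solid band, so the density within each group is hard to read.

```r
library(ggplot2)

set.seed(42)

# Hypothetical simulated data: 3 groups with 1000 observations each
sim <- data.frame(
  group = rep(c("a", "b", "c"), each = 1000),
  y     = c(rnorm(1000, mean = 0), rnorm(1000, mean = 1), rnorm(1000, mean = 2))
)

# Boxplot with jittered points on top; boxplot outliers are hidden so that
# they are not drawn twice (once by the boxplot, once by the jitter layer)
p <- ggplot(sim, aes(x = group, y = y)) +
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(width = 0.2, alpha = 0.3, color = "steelblue")

p
```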

What are some other options?

Beeswarm plots

Violin plots

Beeswarm plots

  • 'Smart' jittering
  • Individual points are clumped together as close to the axis as possible
  • Handily included as geom_beeswarm() in the ggbeeswarm package.
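library(ggplot2)     # provides ggplot() and aes()
library(ggbeeswarm)  # provides geom_beeswarm()
# `data` is assumed to be a data frame with columns `group` and `y`
ggplot(data, aes(x = group, y = y)) +
  geom_beeswarm(color = 'steelblue')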
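
The snippet above relies on a data frame called `data` that the slide does not define. As a self-contained sketch, the version below simulates a small stand-in data frame (`sim`, a hypothetical name with made-up values) so the plot can be run as is.

```r
library(ggplot2)
library(ggbeeswarm)

set.seed(42)

# Hypothetical simulated data: 3 groups with 50 observations each
sim <- data.frame(
  group = rep(c("a", "b", "c"), each = 50),
  y     = c(rnorm(50, mean = 0), rnorm(50, mean = 1), rnorm(50, mean = 2))
)

# Each observation is drawn as its own point; points are shifted sideways
# only as far as needed to avoid overlapping their neighbours
p <- ggplot(sim, aes(x = group, y = y)) +
  geom_beeswarm(color = "steelblue")

p
```

With only 50 points per group, every observation stays visible and the width of each swarm hints at where the data are concentrated.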

Beeswarm pros

  • Individual data points
  • Distributional shape

Beeswarm cons

  • Become hard to read with lots of data
  • Stacking order is arbitrary
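
One way to see that the stacking is arbitrary is to draw the same data with two different placement orders. The sketch below uses the `priority` and `cex` arguments of geom_beeswarm(); the data frame `sim` is simulated for illustration only, and the specific argument values are just one possible choice.

```r
library(ggplot2)
library(ggbeeswarm)

set.seed(42)

# Hypothetical simulated data: 2 groups with 100 observations each
sim <- data.frame(
  group = rep(c("a", "b"), each = 100),
  y     = rnorm(200)
)

# Points are placed in ascending order of y
p_ascending <- ggplot(sim, aes(x = group, y = y)) +
  geom_beeswarm(priority = "ascending", color = "steelblue")

# Same data, but points are placed in random order and spaced further
# apart (cex scales the spacing between points)
p_random <- ggplot(sim, aes(x = group, y = y)) +
  geom_beeswarm(priority = "random", cex = 1.5, color = "steelblue")

p_ascending
p_random
```

Both plots show identical y values, yet the swarms look different, which is why the sideways position of an individual point should not be interpreted.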

Violin plots

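  • A kernel density estimate (KDE), mirrored to be symmetric
  • Just replace geom_boxplot() with geom_violin().
library(ggplot2)
# `data` is assumed to be a data frame with columns `group` and `y`
ggplot(data, aes(x = group, y = y)) +
  geom_violin(fill = 'steelblue')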
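
To illustrate what "a mirrored KDE" means, the sketch below builds a violin shape by hand: it estimates the density with base R's density() and reflects it around a centre line. This is an illustration of the idea, not how geom_violin() is implemented; the vector `y`, the 0.4 width scaling, and the `violin` data frame are all assumptions made for the example.

```r
library(ggplot2)

set.seed(42)

# Hypothetical simulated sample
y <- rnorm(200)

# Kernel density estimate: d$x holds the grid of values, d$y the density
d <- density(y)

# Scale the density so the widest part of the violin has half-width 0.4
half_width <- d$y / max(d$y) * 0.4

# Trace the right-hand outline upward, then the mirrored left-hand
# outline downward, to form a closed symmetric polygon
violin <- data.frame(
  x = c(half_width, rev(-half_width)),
  y = c(d$x, rev(d$x))
)

p <- ggplot(violin, aes(x = x, y = y)) +
  geom_polygon(fill = "steelblue")

p
```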

Violin pros

  • Every data point is heard: each one contributes to the density estimate
  • Not every data point is seen, so good for lots of data.

Violin cons

  • Shape depends on the choice of kernel width (bandwidth)
  • Not every data point is seen
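
The effect of the kernel width can be sketched with the `adjust` argument of geom_violin(), which multiplies the default bandwidth. The bimodal data below are simulated for illustration, and the values 0.5 and 3 are arbitrary choices meant to exaggerate the difference.

```r
library(ggplot2)

set.seed(42)

# Hypothetical simulated data: a single group with two clear modes
sim <- data.frame(
  group = "a",
  y     = c(rnorm(100, mean = -2, sd = 0.5), rnorm(100, mean = 2, sd = 0.5))
)

# Narrow kernel (half the default bandwidth): both modes remain distinct
p_narrow <- ggplot(sim, aes(x = group, y = y)) +
  geom_violin(fill = "steelblue", adjust = 0.5)

# Wide kernel (three times the default bandwidth): the modes are smoothed
# toward each other and the gap between them is much less pronounced
p_wide <- ggplot(sim, aes(x = group, y = y)) +
  geom_violin(fill = "steelblue", adjust = 3)

p_narrow
p_wide
```

Because the individual points are not drawn, a reader cannot tell from the violin alone which of these two shapes is closer to the data.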

Let's try some more advanced comparisons!