Stats: sum and quantile

Intermediate Data Visualization with ggplot2

Rick Scavetta

Founder, Scavetta Academy

Recall from course 1

   Cause of Over-plotting             Solutions
1. Large datasets                     Alpha-blending, hollow circles, point size
2. Aligned values on a single axis    As above, plus change position
3. Low-precision data                 Position: jitter
4. Integer data                       Position: jitter

Plot counts to overcome over-plotting

   Cause of Over-plotting             Solutions                                     Here...
1. Large datasets                     Alpha-blending, hollow circles, point size
2. Aligned values on a single axis    As above, plus change position
3. Low-precision data                 Position: jitter                              geom_count()
4. Integer data                       Position: jitter                              geom_count()

Low precision (& integer) data

library(ggplot2)

# Base plot reused on the following slides
p <- ggplot(iris, aes(Sepal.Length, 
                      Sepal.Width))

p + geom_point()

Jittering may give the wrong impression

p + geom_jitter(alpha = 0.5,
                width = 0.1,
                height = 0.1)

geom_count()

p + 
  geom_count()
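
To see what geom_count() is plotting, it can help to look at the data the layer computes. The sketch below assumes ggplot2's layer_data() helper and the n column produced by the sum stat; the name counts is only illustrative.

```r
library(ggplot2)

p <- ggplot(iris, aes(Sepal.Length, Sepal.Width))

# Data computed for the layer: one row per unique (x, y) location,
# with n = number of observations at that location
counts <- layer_data(p + geom_count())

head(counts[, c("x", "y", "n")])

# Every observation is counted exactly once
sum(counts$n) == nrow(iris)
```

The point size in the plot is mapped from these counts, so overlapping observations show up as larger points rather than hiding one another.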

The geom/stat connection

geom_            stat_
geom_count()     stat_sum()

stat_sum()

p + 
  stat_sum()
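
One way to check that the two calls are interchangeable is to compare the data each layer computes. This is a minimal sketch, assuming layer_data() from ggplot2; from_geom and from_stat are illustrative names.

```r
library(ggplot2)

p <- ggplot(iris, aes(Sepal.Length, Sepal.Width))

# Both layers pair the point geom with the sum stat
from_geom <- layer_data(p + geom_count())
from_stat <- layer_data(p + stat_sum())

# Same locations and same counts either way
all.equal(from_geom[, c("x", "y", "n")],
          from_stat[, c("x", "y", "n")])
```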

Over-plotting can still be a problem!

ggplot(iris, aes(Sepal.Length,
                 Sepal.Width, 
                 color = Species)) + 
  geom_count(alpha = 0.4)

geom_quantile()

ggplot(iris, aes(Sepal.Length,
                 Sepal.Width, 
                 color = Species)) + 
  geom_count(alpha = 0.4)

Dealing with heteroscedasticity

library(AER)
data(Journals)

p <- ggplot(Journals, 
            aes(log(price/citations), 
                log(subs))) +
  geom_point(alpha = 0.5) +
  labs(...)

p

Using geom_quantile()

p +
  geom_quantile(quantiles = 
                c(0.05, 0.50, 0.95))
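
The lines drawn by geom_quantile() come from quantile regression. As a rough sketch of the same fit done by hand, assuming the quantreg package is installed and that the layer uses its default settings (method "rq", formula y ~ x), one could write the following; the name fit is illustrative.

```r
library(AER)       # provides the Journals data
library(quantreg)  # quantile regression; assumed to be installed

data(Journals)

# Fit the 5th, 50th and 95th percentile lines directly
fit <- rq(log(subs) ~ log(price / citations),
          tau = c(0.05, 0.50, 0.95),
          data = Journals)

# One column of (intercept, slope) per quantile
coef(fit)
```

Comparing the slopes across the three quantiles is one way to see how the spread of the response changes along the x-axis.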

The geom/stat connection

geom_              stat_
geom_count()       stat_sum()
geom_quantile()    stat_quantile()
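
As with the count layers, the quantile pairing in the table above can be checked by comparing what each layer computes. This is a sketch only, assuming layer_data() from ggplot2 and an installed quantreg package; qs, from_geom and from_stat are illustrative names.

```r
library(ggplot2)
library(AER)  # provides the Journals data
data(Journals)

p <- ggplot(Journals, aes(log(price / citations), log(subs))) +
  geom_point(alpha = 0.5)

qs <- c(0.05, 0.50, 0.95)

# Layer 2 holds the fitted quantile lines in both plots
from_geom <- layer_data(p + geom_quantile(quantiles = qs), i = 2)
from_stat <- layer_data(p + stat_quantile(quantiles = qs), i = 2)

# Same fitted lines either way
all.equal(from_geom[, c("x", "y", "quantile")],
          from_stat[, c("x", "y", "quantile")])
```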

Ready for exercises!
