Distributions: part one

Visualization Best Practices in R

Nick Strayer

Instructor

What is distribution data?

  • Multiple 'observations'
  • Usually a sample of some population

Why distributions are important

  • Data collection or cleaning errors can become apparent
  • Could indicate the need to control for a variable in a model
  • Showing the whole distribution stays true to the data
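
As a minimal sketch of the first point, the example below builds a small made-up data set in which a few records carry a placeholder value of 999. The data frame `speeds`, its values, and the 999 code are all hypothetical and are not taken from the course data. The mean is shifted by the bad records but gives no hint why, while a histogram would show them as an isolated bar far from the rest.

```r
library(ggplot2)

set.seed(1)

# Hypothetical data: 200 plausible speeds plus 5 records coded as 999
speeds <- data.frame(
  speed = c(round(runif(200, min = 20, max = 90)), rep(999, 5))
)

# A single summary number is pulled upward but does not reveal the cause
mean(speeds$speed)

# The histogram places the 999 records in a bar far to the right of the rest
p_errors <- ggplot(speeds, aes(x = speed)) +
  geom_histogram(binwidth = 10)
```

Printing `p_errors` draws the plot.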

Standard plots

Histogram

  • Good for one distribution at a time
  • This chapter

Boxplot

  • For comparing multiple distributions
  • Next chapter

Maryland speeding data

md_speeding

Making a histogram in ggplot2

  • geom_histogram()
  • Automatically bins data for you
  • Just supply x aesthetic
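library(dplyr)    # filter() and the %>% pipe
library(ggplot2)

# Histogram of speed for blue vehicles only
md_speeding %>%
  filter(vehicle_color == 'BLUE') %>%
  ggplot(aes(x = speed)) +
  geom_histogram()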
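
To illustrate what "automatically bins data" means, here is a rough sketch on simulated data. The data frame `fake_speeding` is a hypothetical stand-in for `md_speeding`, and the bin width of 2 is an arbitrary choice for illustration. With no arguments, `geom_histogram()` falls back to 30 bins and says so in a message when the plot is drawn; passing `binwidth` sets the bin size explicitly.

```r
library(ggplot2)

set.seed(42)

# Hypothetical stand-in for md_speeding: 500 simulated speed values
fake_speeding <- data.frame(speed = round(rnorm(500, mean = 60, sd = 8)))

# Default binning: ggplot2 chooses the bins (bins = 30 unless told otherwise)
p_default <- ggplot(fake_speeding, aes(x = speed)) +
  geom_histogram()

# Explicit binning: every bar covers a range of 2 units of speed
p_binwidth <- ggplot(fake_speeding, aes(x = speed)) +
  geom_histogram(binwidth = 2)

# layer_data() returns the bins ggplot2 computed, one row per bar
bins_binwidth <- layer_data(p_binwidth)
head(bins_binwidth[, c("xmin", "xmax", "count")])
```

Whatever the bin settings, every observation lands in exactly one bar, so the counts always add up to the number of rows.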

Let's make some histograms!
