Conclusion

Exploratory Data Analysis in R

Andrew Bray

Assistant Professor, Reed College

Pie chart vs. bar chart

ch4_4.002.png

Faceting vs. stacking

ch4_4.003.png

Histogram

ggplot(data, aes(x = var1)) +
    geom_histogram()

ch4_4.004.png

Density plot

cars %>%
  filter(eng_size < 2.0) %>%
  ggplot(aes(x = hwy_mpg)) +
  geom_density()

ch4_4.005.png

Side-by-side box plots

ggplot(common_cyl, aes(x = as.factor(ncyl), y = city_mpg)) +
  geom_boxplot()
Warning message:
Removed 11 rows containing non-finite values (stat_boxplot).

ch4_4.006.png
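
The warning above reports that rows with non-finite `city_mpg` values were dropped before the box plots were drawn. A minimal sketch of making that step explicit, assuming a small made-up data frame (`toy_cars` and its values are hypothetical, not the course's `common_cyl` data):

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for common_cyl; the values are invented for illustration.
toy_cars <- data.frame(
  ncyl     = c(4, 4, 4, 6, 6, 6, 8, 8),
  city_mpg = c(28, 31, NA, 20, 19, NA, 15, 16)
)

# Count the rows that the box plot would silently drop.
n_dropped <- sum(!is.finite(toy_cars$city_mpg))
n_dropped

# Remove them up front so the plot is built from complete rows only.
toy_complete <- toy_cars %>%
  filter(is.finite(city_mpg))

p <- ggplot(toy_complete, aes(x = as.factor(ncyl), y = city_mpg)) +
  geom_boxplot()
```

Counting the dropped rows first makes it easier to judge whether the missing values are few enough to ignore.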

Center: mean, median, mode

x
76 78 75 74 76 72 74 73 73 75 74
table(x)
x
72 73 74 75 76 78 
 1  2  3  2  2  1

ch4_4.007.png
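
The three measures of center can be computed for the vector shown above. This is a sketch in base R; the helper name `x_mode` is an assumption introduced here, since base R has no built-in function for the statistical mode:

```r
x <- c(76, 78, 75, 74, 76, 72, 74, 73, 73, 75, 74)

mean(x)     # arithmetic average
median(x)   # middle value of the sorted data

# mode() in base R returns the storage type, not the most frequent value,
# so read the mode off the frequency table instead.
counts <- table(x)

# which.max() returns the first maximum, so ties would need extra handling.
x_mode <- as.numeric(names(counts)[which.max(counts)])
x_mode
```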

Shape of income

ggplot(life, aes(x = income, fill = west_coast)) +
  geom_density(alpha = .3)
ggplot(life, aes(x = log(income), fill = west_coast)) +
  geom_density(alpha = .3)

ch4_4.008.png
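
The second plot uses `log(income)`, a common choice when a variable is strongly right-skewed. A rough sketch of the effect on the measures of center, assuming simulated data (`income_sim` is hypothetical and not the `life` dataset):

```r
set.seed(1)

# Simulated right-skewed values; a hypothetical stand-in for an income variable.
income_sim <- rlnorm(1000, meanlog = 10, sdlog = 1)

# In a right-skewed distribution the long tail pulls the mean above the median.
mean(income_sim) > median(income_sim)

# After the log transform the two measures of center sit close together.
log_income <- log(income_sim)
mean(log_income) - median(log_income)
```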

With group_by()

life %>%
  slice(240:247) %>%
  group_by(west_coast) %>%
  summarize(mean(expectancy))
# A tibble: 2 x 2
  west_coast mean(expectancy)
       <lgl>            <dbl>
1      FALSE         79.26125
2       TRUE         79.29375

ch4_4.009.png
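
The same pattern can be tried on a small example. This sketch assumes a made-up data frame (`toy_life` and its values are hypothetical) and names the summary column, which gives a cleaner header than `mean(expectancy)`:

```r
library(dplyr)

# Hypothetical data; the values are invented for illustration.
toy_life <- data.frame(
  west_coast = c(TRUE, TRUE, FALSE, FALSE, FALSE),
  expectancy = c(80, 82, 78, 79, 77)
)

toy_summary <- toy_life %>%
  group_by(west_coast) %>%
  summarize(
    mean_expectancy = mean(expectancy),  # named summary column
    n = n()                              # group size, context for each mean
  )

toy_summary
```

Reporting the group size alongside the mean helps show how many rows each summary rests on.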

Spam and exclamation points

email %>%
  mutate(zero = exclaim_mess == 0) %>%
  ggplot(aes(x = zero, fill = spam)) +
  geom_bar()

ch4_4.010.png

Spam and images

email %>%
  mutate(has_image = image > 0) %>%
  ggplot(aes(x = as.factor(has_image), fill = spam)) +
  geom_bar(position = "fill")

ch4_4.011.png
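
With `position = "fill"`, each bar shows proportions within a level of the x variable rather than counts. A sketch of computing those proportions directly, assuming a made-up data frame (`toy_email` and its values are hypothetical, not the course's `email` data):

```r
library(dplyr)

# Hypothetical data; the values are invented for illustration.
toy_email <- data.frame(
  image = c(0, 0, 0, 0, 1, 2),
  spam  = c("not-spam", "not-spam", "not-spam", "spam", "not-spam", "not-spam")
)

toy_props <- toy_email %>%
  mutate(has_image = image > 0) %>%
  count(has_image, spam) %>%        # one row per observed combination
  group_by(has_image) %>%
  mutate(prop = n / sum(n)) %>%     # share of each spam level within has_image
  ungroup()

toy_props
```

The `n` column is worth keeping next to `prop`, because a filled bar hides how few observations a group may contain.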

Let's practice!
