Exploratory Data Analysis in R
Andrew Bray
Assistant Professor, Reed College
Center
Variability
Shape
Outliers
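library(dplyr)
# Flag counties whose income exceeds 75000
life <- life %>%
  mutate(is_outlier = income > 75000)
# List the flagged counties, highest income first
life %>%
  filter(is_outlier) %>%
  arrange(desc(income))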
# A tibble: 45 x 6
   state         county             expectancy income west_coast is_outlier
   <chr>         <chr>                   <dbl>  <int> <lgl>      <lgl>
 1 Wyoming       Teton County           82.110 194861 FALSE      TRUE
 2 New York      New York County        81.675 156708 FALSE      TRUE
 3 Texas         Shackelford County     75.400 132989 FALSE      TRUE
 4 Colorado      Pitkin County          82.990 126137 FALSE      TRUE
 5 Nebraska      Wheeler County         79.180 125171 FALSE      TRUE
 6 California    Marin County           83.230 109076 TRUE       TRUE
 7 Nebraska      Kearney County         79.630 108975 FALSE      TRUE
 8 Texas         McMullen County        77.320 107627 FALSE      TRUE
 9 Massachusetts Nantucket County       80.325 107341 FALSE      TRUE
10 Texas         Midland County         77.830 106588 FALSE      TRUE
# ... with 35 more rows
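The flag-then-filter pattern above can be tried without the course data. The following is a minimal sketch: the `toy` data frame and its values are made up for illustration and are not part of the `life` dataset. Only the 75000 cutoff is taken from the slide.

```r
library(dplyr)

# Hypothetical toy data; county names and incomes are invented for illustration
toy <- tibble(
  county = c("A", "B", "C", "D", "E"),
  income = c(42000, 51000, 88000, 39000, 120000)
)

# Add a logical column marking rows above the cutoff used on the slide
toy <- toy %>%
  mutate(is_outlier = income > 75000)

# Keep only the flagged rows and sort them from highest to lowest income
outliers <- toy %>%
  filter(is_outlier) %>%
  arrange(desc(income))

outliers
```

Storing the flag as a column, rather than filtering directly on `income > 75000`, lets the same definition be reused later, for example to drop the outliers before plotting.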
life %>%
  filter(!is_outlier) %>%
  ggplot(aes(x = income, fill = west_coast)) +
  geom_density(alpha = .3)
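As a rough illustration of plotting with the outliers removed, the sketch below negates the flag with `!` and draws overlaid densities for the remaining rows. The `toy` data frame is hypothetical (values invented here), and the plot is stored in `p` rather than printed.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data; values are invented for illustration
toy <- tibble(
  income     = c(42000, 51000, 88000, 39000, 120000, 47000, 56000, 61000),
  west_coast = c(TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE)
) %>%
  mutate(is_outlier = income > 75000)

# Drop the flagged rows; !is_outlier keeps rows where the flag is FALSE
trimmed <- toy %>%
  filter(!is_outlier)

# Overlaid, semi-transparent density curves, one per west_coast group
p <- ggplot(trimmed, aes(x = income, fill = west_coast)) +
  geom_density(alpha = 0.3)
```

Printing `p` renders the plot. Because the extreme values are filtered out first, the x-axis is no longer stretched by them and the bulk of the distribution is easier to compare between groups.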