Case Study: Exploratory Data Analysis in R
Dave Robinson
Chief Data Scientist, DataCamp
votes_processed
# A tibble: 353,547 × 6
    rcid session  vote ccode  year country
   <dbl>   <dbl> <dbl> <int> <dbl> <chr>
 1    46       2     1     2  1947 United States
 2    46       2     1    20  1947 Canada
 3    46       2     1    40  1947 Cuba
 4    46       2     1    41  1947 Haiti
 5    46       2     1    42  1947 Dominican Republic
 6    46       2     1    70  1947 Mexico
 7    46       2     1    90  1947 Guatemala
 8    46       2     1    91  1947 Honduras
 9    46       2     1    92  1947 El Salvador
10    46       2     1    93  1947 Nicaragua
# ... with 353,537 more rows
descriptions
# A tibble: 2,589 × 10
    rcid session date       unres      me    nu    di    hr    co    ec
   <dbl>   <dbl> <dttm>     <chr>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1    46       2 1947-09-04 R/2/299     0     0     0     0     0     0
 2    47       2 1947-10-05 R/2/355     0     0     0     1     0     0
 3    48       2 1947-10-06 R/2/461     0     0     0     0     0     0
 4    49       2 1947-10-06 R/2/463     0     0     0     0     0     0
 5    50       2 1947-10-06 R/2/465     0     0     0     0     0     0
 6    51       2 1947-10-02 R/2/561     0     0     0     0     1     0
 7    52       2 1947-11-06 R/2/650     0     0     0     0     1     0
 8    53       2 1947-11-06 R/2/651     0     0     0     0     1     0
 9    54       2 1947-11-06 R/2/651     0     0     0     0     1     0
10    55       2 1947-11-06 R/2/667     0     0     0     0     1     0
# ... with 2,579 more rows
votes_processed %>%
  inner_join(descriptions, by = c("rcid", "session"))
# A tibble: 353,547 × 14
    rcid session  vote ccode  year country            date       unres      me
   <dbl>   <dbl> <dbl> <int> <dbl> <chr>              <dttm>     <chr>   <dbl>
 1    46       2     1     2  1947 United States      1947-09-04 R/2/299     0
 2    46       2     1    20  1947 Canada             1947-09-04 R/2/299     0
 3    46       2     1    40  1947 Cuba               1947-09-04 R/2/299     0
 4    46       2     1    41  1947 Haiti              1947-09-04 R/2/299     0
 5    46       2     1    42  1947 Dominican Republic 1947-09-04 R/2/299     0
 6    46       2     1    70  1947 Mexico             1947-09-04 R/2/299     0
 7    46       2     1    90  1947 Guatemala          1947-09-04 R/2/299     0
 8    46       2     1    91  1947 Honduras           1947-09-04 R/2/299     0
 9    46       2     1    92  1947 El Salvador        1947-09-04 R/2/299     0
10    46       2     1    93  1947 Nicaragua          1947-09-04 R/2/299     0
# ... with 353,537 more rows, and 5 more variables: nu <dbl>, di <dbl>,
#   hr <dbl>, co <dbl>, ec <dbl>
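The join above keeps all 353,547 vote rows and produces 14 columns: 6 from votes_processed plus 10 from descriptions, minus the 2 shared key columns. A minimal sketch of the same pattern on toy data may make that arithmetic easier to see. The tables votes_toy and descriptions_toy and their values are hypothetical, made up for illustration, and are not part of the course data.

```r
library(dplyr)

# Hypothetical toy tables (illustration only, not the real UN voting data)
votes_toy <- tibble(
  rcid    = c(46, 46, 47),
  session = c(2, 2, 2),
  vote    = c(1, 1, 3),
  country = c("United States", "Canada", "Cuba")
)

descriptions_toy <- tibble(
  rcid    = c(46, 47, 48),
  session = c(2, 2, 2),
  unres   = c("R/2/299", "R/2/355", "R/2/461"),
  hr      = c(0, 1, 0)
)

# Match rows on both key columns; each vote picks up the columns
# describing its roll call
joined_toy <- votes_toy %>%
  inner_join(descriptions_toy, by = c("rcid", "session"))

nrow(joined_toy)  # one row per vote that has a matching description
ncol(joined_toy)  # 4 + 4 - 2 shared keys
```

In this sketch, roll call 48 has a description but no votes, so inner_join() drops it. Every vote row survives because each (rcid, session) pair appears exactly once in descriptions_toy.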