Joining datasets

Case Study: Exploratory Data Analysis in R

Dave Robinson

Chief Data Scientist, DataCamp

Processed votes

votes_processed
# A tibble: 353,547 × 6
    rcid session  vote ccode  year            country
   <dbl>   <dbl> <dbl> <int> <dbl>              <chr>
1     46       2     1     2  1947      United States
2     46       2     1    20  1947             Canada
3     46       2     1    40  1947               Cuba
4     46       2     1    41  1947              Haiti
5     46       2     1    42  1947 Dominican Republic
6     46       2     1    70  1947             Mexico
7     46       2     1    90  1947          Guatemala
8     46       2     1    91  1947           Honduras
9     46       2     1    92  1947        El Salvador
10    46       2     1    93  1947          Nicaragua
# ... with 353,537 more rows
  • Each row is one roll call/country pair
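
A quick way to check the "one row per roll call/country pair" claim is to count rows per pair and look for any count above one. This is a minimal sketch on a hypothetical toy table (votes_toy and its values are made up for illustration, not taken from votes_processed):

```r
library(dplyr)

# Hypothetical toy data standing in for votes_processed
votes_toy <- tibble(
  rcid  = c(46, 46, 47),
  ccode = c(2, 20, 2),
  vote  = c(1, 1, 3)
)

# Count rows per roll call/country pair and keep pairs that appear more than once
duplicate_pairs <- votes_toy %>%
  count(rcid, ccode) %>%
  filter(n > 1)

# Zero rows here means every rcid/ccode pair is unique
nrow(duplicate_pairs)
```

The same pipeline could be pointed at the real table by swapping votes_toy for votes_processed.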

Descriptions dataset

descriptions
# A tibble: 2,589 × 10
    rcid session       date   unres    me    nu    di    hr    co    ec
   <dbl>   <dbl>     <dttm>   <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1     46       2 1947-09-04 R/2/299     0     0     0     0     0     0
2     47       2 1947-10-05 R/2/355     0     0     0     1     0     0
3     48       2 1947-10-06 R/2/461     0     0     0     0     0     0
4     49       2 1947-10-06 R/2/463     0     0     0     0     0     0
5     50       2 1947-10-06 R/2/465     0     0     0     0     0     0
6     51       2 1947-10-02 R/2/561     0     0     0     0     1     0
7     52       2 1947-11-06 R/2/650     0     0     0     0     1     0
8     53       2 1947-11-06 R/2/651     0     0     0     0     1     0
9     54       2 1947-11-06 R/2/651     0     0     0     0     1     0
10    55       2 1947-11-06 R/2/667     0     0     0     0     1     0
# ... with 2,579 more rows

inner_join()

votes_processed %>%
  inner_join(descriptions, by = c("rcid", "session"))
# A tibble: 353,547 × 14
    rcid session  vote ccode  year            country       date   unres    me
   <dbl>   <dbl> <dbl> <int> <dbl>              <chr>     <dttm>   <chr> <dbl>
1     46       2     1     2  1947      United States 1947-09-04 R/2/299     0
2     46       2     1    20  1947             Canada 1947-09-04 R/2/299     0
3     46       2     1    40  1947               Cuba 1947-09-04 R/2/299     0
4     46       2     1    41  1947              Haiti 1947-09-04 R/2/299     0
5     46       2     1    42  1947 Dominican Republic 1947-09-04 R/2/299     0
6     46       2     1    70  1947             Mexico 1947-09-04 R/2/299     0
7     46       2     1    90  1947          Guatemala 1947-09-04 R/2/299     0
8     46       2     1    91  1947           Honduras 1947-09-04 R/2/299     0
9     46       2     1    92  1947        El Salvador 1947-09-04 R/2/299     0
10    46       2     1    93  1947          Nicaragua 1947-09-04 R/2/299     0
# ... with 353,537 more rows, and 5 more variables: nu <dbl>, di <dbl>,
#   hr <dbl>, co <dbl>, ec <dbl>
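
To see what inner_join() does with the key columns, it can help to try it on two small tables first. The sketch below uses hypothetical toy tibbles (votes_toy and descriptions_toy, with made-up values) rather than the real data. It illustrates two things: key columns appear only once in the result, and rows with no match in the other table are dropped.

```r
library(dplyr)

# Hypothetical toy data, not the real UN voting tables
votes_toy <- tibble(
  rcid    = c(46, 46, 47, 99),
  session = c(2, 2, 2, 2),
  country = c("Canada", "Cuba", "Canada", "Haiti")
)

descriptions_toy <- tibble(
  rcid    = c(46, 47, 48),
  session = c(2, 2, 2),
  hr      = c(0, 1, 0)
)

# Keep only rows whose rcid/session pair appears in both tables
joined_toy <- votes_toy %>%
  inner_join(descriptions_toy, by = c("rcid", "session"))

joined_toy
```

In the toy result, the vote with rcid 99 has no description and the description with rcid 48 has no votes, so neither appears. In the slide above, the joined table has the same 353,547 rows as votes_processed, which suggests every vote found a matching description, and 14 columns: 6 + 10, minus the 2 shared key columns.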

Let's practice!
