Joining questions and answers

Joining Data with dplyr

Chris Cardillo

Data Scientist

The answers table

answers
# A tibble: 380,643 x 4
         id creation_date question_id score
      <int> <date>              <int> <int>
 1 39143713 2016-08-25       39143518     3
 2 39143869 2016-08-25       39143518     1
 3 39143935 2016-08-25       39142481     0
 4 39144014 2016-08-25       39024390     0
 5 39144252 2016-08-25       39096741     6
 6 39144375 2016-08-25       39143885     5
 7 39144430 2016-08-25       39144077     0
 8 39144625 2016-08-25       39142728     1
 9 39144794 2016-08-25       39043648     0
10 39145033 2016-08-25       39133170     1
# … with 380,633 more rows

The question ID

questions %>%
  inner_join(answers, by = c("id" = "question_id"))
# A tibble: 380,643 x 6
         id creation_date.x score.x     id.y creation_date.y score.y
      <int> <date>            <int>    <int> <date>            <int>
 1 22557677 2014-03-21            1 22560670 2014-03-21            2
 2 22557707 2014-03-21            2 22558516 2014-03-21            1
 3 22557707 2014-03-21            2 22558726 2014-03-21            4
 4 22558084 2014-03-21            2 22558085 2014-03-21            0
 5 22558084 2014-03-21            2 22606545 2014-03-24            1
 6 22558084 2014-03-21            2 22610396 2014-03-24            5
 7 22558084 2014-03-21            2 34374729 2015-12-19            0
 8 22558395 2014-03-21            2 22559327 2014-03-21            1
 9 22558395 2014-03-21            2 22560102 2014-03-21            2
10 22558395 2014-03-21            2 22560288 2014-03-21            2
# … with 380,633 more rows
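
A minimal sketch of the same join on toy data, assuming made-up stand-ins for `questions` and `answers` (the values below are illustrative, not taken from the real tables). It shows how `by = c("id" = "question_id")` matches the question's `id` to the answer's `question_id`, and why the overlapping columns pick up `.x` and `.y` suffixes.

```r
library(dplyr)

# Hypothetical toy tables; the real ones are far larger.
questions <- tibble(
  id    = c(1L, 2L, 3L),
  score = c(5L, 0L, 2L)
)
answers <- tibble(
  id          = c(10L, 11L, 12L),
  question_id = c(1L, 1L, 4L),
  score       = c(3L, 1L, 7L)
)

# Match questions$id against answers$question_id.
# Question 1 has two answers, so it appears twice in the result.
joined <- questions %>%
  inner_join(answers, by = c("id" = "question_id"))

names(joined)
# "id" "score.x" "id.y" "score.y"

# The suffix argument replaces the default .x / .y with more readable labels.
joined_named <- questions %>%
  inner_join(answers, by = c("id" = "question_id"),
             suffix = c("_question", "_answer"))

names(joined_named)
# "id" "score_question" "id_answer" "score_answer"
```

Here `id` is the question ID, while `id.y` (or `id_answer`) is the ID of the answer itself.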

The joining verbs

[Figure: join-review]
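
As a rough review of the joining verbs, the sketch below runs each dplyr join on the same pair of toy tables and compares row counts. The tables are hypothetical stand-ins with made-up values; only the dplyr functions themselves are real.

```r
library(dplyr)

# Hypothetical toy tables: question 1 has two answers, questions 2 and 3
# have none, and answer 12 points at a question that is not in the table.
questions <- tibble(
  id    = c(1L, 2L, 3L),
  score = c(5L, 0L, 2L)
)
answers <- tibble(
  id          = c(10L, 11L, 12L),
  question_id = c(1L, 1L, 4L),
  score       = c(3L, 1L, 7L)
)

key <- c("id" = "question_id")

row_counts <- c(
  inner = nrow(inner_join(questions, answers, by = key)),  # matches only
  left  = nrow(left_join(questions, answers, by = key)),   # keep all questions
  right = nrow(right_join(questions, answers, by = key)),  # keep all answers
  full  = nrow(full_join(questions, answers, by = key)),   # keep everything
  semi  = nrow(semi_join(questions, answers, by = key)),   # questions with an answer
  anti  = nrow(anti_join(questions, answers, by = key))    # questions without one
)

row_counts
# inner 2, left 4, right 3, full 5, semi 1, anti 2
```

The mutating joins (`inner_join`, `left_join`, `right_join`, `full_join`) add columns from the second table, while the filtering joins (`semi_join`, `anti_join`) only keep or drop rows of the first.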

Let's practice!
