Case Study: Exploratory Data Analysis in R
Dave Robinson
Chief Data Scientist, DataCamp
library(dplyr)
# Keep the identifying columns plus the six topic indicator columns (me through ec)
votes_joined %>%
  select(rcid, session, vote, country, me:ec)
# A tibble: 353,547 × 10
    rcid session  vote country               me    nu    di    hr    co    ec
   <dbl>   <dbl> <dbl> <chr>              <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1    46       2     1 United States          0     0     0     0     0     0
 2    46       2     1 Canada                 0     0     0     0     0     0
 3    46       2     1 Cuba                   0     0     0     0     0     0
 4    46       2     1 Haiti                  0     0     0     0     0     0
 5    46       2     1 Dominican Republic     0     0     0     0     0     0
 6    46       2     1 Mexico                 0     0     0     0     0     0
 7    46       2     1 Guatemala              0     0     0     0     0     0
 8    46       2     1 Honduras               0     0     0     0     0     0
 9    46       2     1 El Salvador            0     0     0     0     0     0
10    46       2     1 Nicaragua              0     0     0     0     0     0
# ... with 353,537 more rows
library(tidyr)
# Stack the six topic columns into topic/has_topic pairs: one row per country, vote, and topic
votes_joined %>%
  gather(topic, has_topic, me:ec)
# A tibble: 2,121,282 × 10
    rcid session  vote ccode  year country            date       unres    topic has_topic
   <dbl>   <dbl> <dbl> <int> <dbl> <chr>              <dttm>     <chr>    <chr>     <dbl>
 1    46       2     1     2  1947 United States      1947-09-04 R/2/299  me            0
 2    46       2     1    20  1947 Canada             1947-09-04 R/2/299  me            0
 3    46       2     1    40  1947 Cuba               1947-09-04 R/2/299  me            0
 4    46       2     1    41  1947 Haiti              1947-09-04 R/2/299  me            0
 5    46       2     1    42  1947 Dominican Republic 1947-09-04 R/2/299  me            0
 6    46       2     1    70  1947 Mexico             1947-09-04 R/2/299  me            0
 7    46       2     1    90  1947 Guatemala          1947-09-04 R/2/299  me            0
 8    46       2     1    91  1947 Honduras           1947-09-04 R/2/299  me            0
 9    46       2     1    92  1947 El Salvador        1947-09-04 R/2/299  me            0
10    46       2     1    93  1947 Nicaragua          1947-09-04 R/2/299  me            0
# ... with 2,121,272 more rows
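To make the reshaping step easier to follow, here is a minimal sketch on a made-up tibble. The name toy_votes and its values are hypothetical and are not part of votes_joined. It also shows pivot_longer(), the newer tidyr function (tidyr 1.0.0 and later) that covers the same job as gather().

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: two roll calls, two countries, two topic indicator columns
toy_votes <- tibble(
  rcid    = c(1, 1, 2, 2),
  country = c("Canada", "Cuba", "Canada", "Cuba"),
  vote    = c(1, 3, 1, 2),
  me      = c(1, 1, 0, 0),
  nu      = c(0, 0, 1, 1)
)

# gather() stacks the me and nu columns into a topic (key) column
# and a has_topic (value) column; the other columns are repeated
toy_gathered <- toy_votes %>%
  gather(topic, has_topic, me:nu)

# pivot_longer() produces the same set of rows, possibly in a different order
toy_longer <- toy_votes %>%
  pivot_longer(me:nu, names_to = "topic", values_to = "has_topic")

toy_gathered
```

Each original row appears once per gathered column, so 4 rows and 2 topic columns give 8 rows. The same logic explains the jump from 353,547 to 2,121,282 rows above: six topic columns, six times as many rows.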
library(tidyr)
# The value column must be named has_topic so that filter() can refer to it
votes_joined %>%
  gather(topic, has_topic, me:ec) %>%
  filter(has_topic == 1)
# A tibble: 350,032 × 10
    rcid session  vote ccode  year country            date       unres    topic has_topic
   <dbl>   <dbl> <dbl> <int> <dbl> <chr>              <dttm>     <chr>    <chr>     <dbl>
 1    77       2     1     2  1947 United States      1947-11-06 R/2/1424 me            1
 2    77       2     1    20  1947 Canada             1947-11-06 R/2/1424 me            1
 3    77       2     3    40  1947 Cuba               1947-11-06 R/2/1424 me            1
 4    77       2     1    41  1947 Haiti              1947-11-06 R/2/1424 me            1
 5    77       2     1    42  1947 Dominican Republic 1947-11-06 R/2/1424 me            1
 6    77       2     2    70  1947 Mexico             1947-11-06 R/2/1424 me            1
 7    77       2     1    90  1947 Guatemala          1947-11-06 R/2/1424 me            1
 8    77       2     2    91  1947 Honduras           1947-11-06 R/2/1424 me            1
 9    77       2     2    92  1947 El Salvador        1947-11-06 R/2/1424 me            1
10    77       2     1    93  1947 Nicaragua          1947-11-06 R/2/1424 me            1
# ... with 350,022 more rows
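The filtered result is not simply a subset of the original rows. A rough sketch on hypothetical data (toy_votes below is invented for illustration) shows why: a vote tagged with several topics appears once per topic, and a vote with no topic disappears.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: one row per roll call, three topic flags
toy_votes <- tibble(
  rcid = c(1, 2, 3),
  vote = c(1, 2, 1),
  me   = c(1, 0, 0),  # rcid 1 is tagged me and hr
  hr   = c(1, 0, 1),  # rcid 3 is tagged hr only
  ec   = c(0, 0, 0)   # rcid 2 has no topic at all
)

# Same two steps as on the slide: gather the flags, keep only the 1s
toy_tidied <- toy_votes %>%
  gather(topic, has_topic, me:ec) %>%
  filter(has_topic == 1)

# Count how many rows each roll call contributes after tidying
toy_counts <- toy_tidied %>%
  count(rcid)

toy_counts
```

Here rcid 1 contributes two rows, rcid 3 contributes one, and rcid 2 is dropped, so row counts before and after tidying should not be expected to match.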