The United Nations Voting Dataset

Case Study: Exploratory Data Analysis in R

Dave Robinson

Chief Data Scientist, DataCamp

UN Voting Dataset

rcid  session  vote  ccode
46    2        1     2
46    2        1     20
46    2        9     31
46    2        1     40
46    2        1     41
46    2        1     42
46    2        9     51
46    2        9     52
46    2        9     53

Each row is one country-vote pair
rcid = Roll call ID
session = Session year
vote = Vote code
ccode = Country code

Source: Erik Voeten, "Data and Analyses of Voting in the UN General Assembly"

Votes in dplyr

# Load dplyr package
library(dplyr)
votes
# A tibble: 508,929 × 4
    rcid session  vote ccode
   <dbl>   <dbl> <dbl> <int>
1     46       2     1     2
2     46       2     1    20
3     46       2     9    31
4     46       2     1    40
5     46       2     1    41
6     46       2     1    42
7     46       2     9    51
8     46       2     9    52
9     46       2     9    53
10    46       2     9    54
# ... with 508,919 more rows

The header row (rcid, session, vote, ccode) gives the variable names


The pipe operator
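
A minimal sketch of what the pipe does, assuming a small hand-built stand-in for the votes table (the name votes_sample is hypothetical and not part of the course data): x %>% f(y) gives the same result as f(x, y).

```r
library(dplyr)

# Hypothetical three-row stand-in for the full votes table
votes_sample <- tibble(
  rcid    = c(46, 46, 46),
  session = c(2, 2, 2),
  vote    = c(1, 1, 9),
  ccode   = c(2L, 20L, 31L)
)

# The pipe passes its left-hand side as the first argument
# of the function call on its right-hand side
piped  <- votes_sample %>% head(2)
nested <- head(votes_sample, 2)

# Both forms should produce the same two-row table
identical(piped, nested)
```

The piped form reads left to right, which becomes more useful once several steps are applied in sequence.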

dplyr verbs

Original data

votes
# A tibble: 508,929 × 4
    rcid session  vote ccode
   <dbl>   <dbl> <dbl> <int>
1     46       2     1     2
2     46       2     1    20
3     46       2     9    31
4     46       2     1    40
5     46       2     1    41
6     46       2     1    42
7     46       2     9    51
8     46       2     9    52
9     46       2     9    53
10    46       2     9    54
# ... with 508,919 more rows

Vote codes:
1 = Yes
2 = Abstain
3 = No
8 = Not present
9 = Not a member

dplyr verbs: filter

filter keeps observations based on a condition

votes %>%
  filter(vote <= 3)
# A tibble: 353,547 × 4
    rcid session  vote ccode
   <dbl>   <dbl> <dbl> <int>
1     46       2     1     2
2     46       2     1    20
3     46       2     1    40
4     46       2     1    41
5     46       2     1    42
6     46       2     1    70
7     46       2     1    90
8     46       2     1    91
9     46       2     1    92
10    46       2     1    93
# ... with 353,537 more rows
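
To try the same filter without the full dataset, here is a small sketch that assumes a four-row sample copied by hand from the rows shown above (votes_sample and votes_cast are hypothetical names). Since codes 1 to 3 are yes, abstain, and no, the condition vote <= 3 drops the "not present" (8) and "not a member" (9) rows.

```r
library(dplyr)

# Hypothetical four-row sample copied from the first rows of votes
votes_sample <- tibble(
  rcid    = c(46, 46, 46, 46),
  session = c(2, 2, 2, 2),
  vote    = c(1, 1, 9, 1),
  ccode   = c(2L, 20L, 31L, 40L)
)

# Keep only rows where a vote was actually cast (codes 1, 2, 3)
votes_cast <- votes_sample %>%
  filter(vote <= 3)

votes_cast
```

The row for ccode 31 has vote code 9, so it is the one removed here.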

dplyr verbs: mutate

mutate adds an additional variable

votes %>%
  mutate(year = session + 1945)
# A tibble: 508,929 × 5
    rcid session  vote ccode  year
   <dbl>   <dbl> <dbl> <int> <dbl>
1     46       2     1     2  1947
2     46       2     1    20  1947
3     46       2     9    31  1947
4     46       2     1    40  1947
5     46       2     1    41  1947
6     46       2     1    42  1947
7     46       2     9    51  1947
8     46       2     9    52  1947
9     46       2     9    53  1947
10    46       2     9    54  1947
# ... with 508,919 more rows
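
The same calculation can be checked on a toy table. This sketch assumes two made-up rows (votes_sample and votes_with_year are hypothetical names, and the session 3 row is invented for illustration) and shows that mutate returns a new table rather than changing the original.

```r
library(dplyr)

# Hypothetical rows: one from session 2, one invented for session 3
votes_sample <- tibble(
  rcid    = c(46, 47),
  session = c(2, 3),
  vote    = c(1, 1),
  ccode   = c(2L, 2L)
)

# Add a year column computed from the session number
votes_with_year <- votes_sample %>%
  mutate(year = session + 1945)

votes_with_year$year

# The original table has no year column unless the result is assigned back
names(votes_sample)
```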

Chaining operations in data cleaning
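
A sketch of chaining the two verbs from the previous slides into one cleaning step, assuming a hand-built sample (votes_sample and votes_processed are hypothetical names): the output of filter is piped directly into mutate.

```r
library(dplyr)

# Hypothetical four-row sample copied from the first rows of votes
votes_sample <- tibble(
  rcid    = c(46, 46, 46, 46),
  session = c(2, 2, 2, 2),
  vote    = c(1, 1, 9, 1),
  ccode   = c(2L, 20L, 31L, 40L)
)

# Step 1: keep only yes/abstain/no votes
# Step 2: add the calendar year for each session
votes_processed <- votes_sample %>%
  filter(vote <= 3) %>%
  mutate(year = session + 1945)

votes_processed
```

Each verb takes a table and returns a table, which is what lets the steps be chained without intermediate variables.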

Let's practice!
