Case Study: Exploratory Data Analysis in R
Dave Robinson
Chief Data Scientist, DataCamp
Each row is one country's vote on one roll call (a country-vote pair):

rcid | session | vote | ccode
---|---|---|---
46 | 2 | 1 | 2
46 | 2 | 1 | 20
46 | 2 | 9 | 31
46 | 2 | 1 | 40
46 | 2 | 1 | 41
46 | 2 | 1 | 42
46 | 2 | 9 | 51
46 | 2 | 9 | 52
46 | 2 | 9 | 53

rcid = Roll call ID
session = Session number (year = session + 1945, so session 2 is 1947)
vote = Vote code
ccode = Country code
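One way to check the "one country-vote pair per row" structure is to count each (rcid, ccode) combination and look for any that appear more than once. The sketch below assumes a small hypothetical data frame, `votes_sample`, built from the first rows above; on the full data the same pipeline would be run on `votes`.

```r
library(dplyr)

# Hypothetical toy sample with the same four columns as `votes`
# (values copied from the first rows shown above)
votes_sample <- data.frame(
  rcid    = c(46, 46, 46, 46, 46),
  session = c(2, 2, 2, 2, 2),
  vote    = c(1, 1, 9, 1, 1),
  ccode   = c(2L, 20L, 31L, 40L, 41L)
)

# If each row is one country's vote on one roll call, every
# (rcid, ccode) combination should appear exactly once
duplicates <- votes_sample %>%
  count(rcid, ccode) %>%
  filter(n > 1)

nrow(duplicates)
# returns 0
```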
```r
# Load dplyr package
library(dplyr)
votes
```

```
# A tibble: 508,929 × 4
    rcid session  vote ccode
   <dbl>   <dbl> <dbl> <int>
 1    46       2     1     2
 2    46       2     1    20
 3    46       2     9    31
 4    46       2     1    40
 5    46       2     1    41
 6    46       2     1    42
 7    46       2     9    51
 8    46       2     9    52
 9    46       2     9    53
10    46       2     9    54
# ... with 508,919 more rows
```
The vote column uses these codes:
1 = Yes
2 = Abstain
3 = No
8 = Not present
9 = Not a member
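A small named lookup vector is one way to turn these numeric codes into readable labels. This is a base R sketch; `vote_labels`, `codes`, and `vote_text` are hypothetical names, not part of the course data.

```r
# Lookup table for the vote codes listed above
# (names are the codes as character strings)
vote_labels <- c(
  "1" = "Yes",
  "2" = "Abstain",
  "3" = "No",
  "8" = "Not present",
  "9" = "Not a member"
)

# Hypothetical vote codes, e.g. a few values of votes$vote
codes <- c(1, 1, 9, 2, 3)

# Index the lookup by name; unname() drops the names for a plain vector
vote_text <- unname(vote_labels[as.character(codes)])
vote_text
# returns c("Yes", "Yes", "Not a member", "Abstain", "No")
```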
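filter() keeps only the observations (rows) that satisfy a condition. Here, vote <= 3 keeps Yes, Abstain, and No votes and drops the "Not present" and "Not a member" codes.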
```r
votes %>%
  filter(vote <= 3)
```

```
# A tibble: 353,547 × 4
    rcid session  vote ccode
   <dbl>   <dbl> <dbl> <int>
 1    46       2     1     2
 2    46       2     1    20
 3    46       2     1    40
 4    46       2     1    41
 5    46       2     1    42
 6    46       2     1    70
 7    46       2     1    90
 8    46       2     1    91
 9    46       2     1    92
10    46       2     1    93
# ... with 353,537 more rows
```
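To see exactly which codes `filter(vote <= 3)` keeps, here is a sketch on a hypothetical five-row data frame with one row per vote code. Writing the condition as `vote %in% c(1, 2, 3)` gives the same result and may read more explicitly.

```r
library(dplyr)

# Hypothetical toy sample covering every vote code
votes_sample <- data.frame(
  rcid  = c(46, 46, 46, 46, 46),
  vote  = c(1, 2, 3, 8, 9),
  ccode = c(2L, 20L, 31L, 40L, 41L)
)

# vote <= 3 keeps Yes (1), Abstain (2) and No (3)
# and drops Not present (8) and Not a member (9)
kept <- votes_sample %>%
  filter(vote <= 3)

kept$vote
# returns c(1, 2, 3)

# An equivalent, more explicit condition
kept_explicit <- votes_sample %>%
  filter(vote %in% c(1, 2, 3))
```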
mutate() adds a new variable (column) computed from existing ones. Here, year converts the session number into a calendar year.
```r
votes %>%
  mutate(year = session + 1945)
```

```
# A tibble: 508,929 × 5
    rcid session  vote ccode  year
   <dbl>   <dbl> <dbl> <int> <dbl>
 1    46       2     1     2  1947
 2    46       2     1    20  1947
 3    46       2     9    31  1947
 4    46       2     1    40  1947
 5    46       2     1    41  1947
 6    46       2     1    42  1947
 7    46       2     9    51  1947
 8    46       2     9    52  1947
 9    46       2     9    53  1947
10    46       2     9    54  1947
# ... with 508,919 more rows
```
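Because both verbs take a data frame and return one, they can be chained with `%>%`. The sketch below, on a hypothetical three-row sample (the `rcid` values are made up), first drops the non-vote codes and then adds `year`; sessions 1 and 3 map to 1946 and 1948 under the `session + 1945` rule.

```r
library(dplyr)

# Hypothetical toy sample spanning sessions 1 to 3
votes_sample <- data.frame(
  rcid    = c(46, 47, 48),
  session = c(1, 2, 3),
  vote    = c(1, 9, 3),
  ccode   = c(2L, 20L, 31L)
)

# Drop codes 8 and 9, then add the calendar year
votes_processed <- votes_sample %>%
  filter(vote <= 3) %>%
  mutate(year = session + 1945)

votes_processed$year
# returns c(1946, 1948)
```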