Categorical Data in the Tidyverse
Emily Robinson
Data Scientist
# A tibble: 16,716 x 228
GenderSelect Country Age EmploymentStatus
<chr> <chr> <int> <chr>
1 Non-binary, gender... NA NA Employed full-time
2 Female United ... 30 Not employed, but lo...
3 Male Canada 28 Not employed, but lo...
4 Male United ... 56 Independent contract...
5 Male Taiwan 38 Employed full-time
6 Male Brazil 46 Employed full-time
7 Male United ... 35 Employed full-time
8 Female India 22 Employed full-time
9 Female Austral... 43 Employed full-time
10 Male Russia 33 Employed full-time
# ... with 16,706 more rows, and 224 more variables:
# StudentStatus <chr>, LearningDataScience <chr>,
# CodeWriter <chr>, CareerSwitcher <chr>, ...
is.character(multipleChoiceResponses$LearningDataScienceTime)
[1] TRUE
multipleChoiceResponses %>%
  mutate(across(where(is.character), as.factor))
# A tibble: 16,716 x 228
GenderSelect Country Age EmploymentStatus
<fct> <fct> <int> <fct>
1 Non-binary, gender... NA NA Employed full-time
2 Female United ... 30 Not employed, but lo...
3 Male Canada 28 Not employed, but lo...
4 Male United ... 56 Independent contract...
# ... with 16,710 more rows, and 224 more variables:
# StudentStatus <fct>, LearningDataScience <fct>,
# CodeWriter <fct>, CareerSwitcher <fct>, ...
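The conversion above can be tried on a small scale. This is a minimal sketch, assuming dplyr 1.0 or later is installed; `toy` and `toy_factors` are hypothetical names, and the three rows are made-up stand-ins for multipleChoiceResponses, not survey data.

```r
library(dplyr)

# Hypothetical toy data standing in for multipleChoiceResponses
toy <- tibble(
  GenderSelect = c("Female", "Male", "Male"),
  Age = c(30L, 28L, 56L),
  EmploymentStatus = c("Employed full-time", "Not employed", "Employed full-time")
)

# Convert every character column to a factor; other columns are left alone
toy_factors <- toy %>%
  mutate(across(where(is.character), as.factor))

# Inspect the resulting column classes
sapply(toy_factors, class)
```

Only the columns for which is.character() returns TRUE are passed to as.factor(), so the integer Age column keeps its type.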
nlevels()
nlevels(multipleChoiceResponses$LearningDataScienceTime)
[1] 6
levels()
levels(multipleChoiceResponses$LearningDataScienceTime)
[1] "< 1 year" "1-2 years" "10-15 years" "15+ years"
[5] "3-5 years" "5-10 years"
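The level order above is sorted as text rather than by duration, which is why "10-15 years" appears before "3-5 years". A minimal base R sketch of the same behavior follows; `time_spent` and `time_factor` are hypothetical names, and the values are made up to mirror the levels shown. The exact position of "< 1 year" may depend on the locale's sort order.

```r
# Hypothetical answers; values mirror some of the levels shown above
time_spent <- c("3-5 years", "< 1 year", "10-15 years", "1-2 years", "< 1 year")

# as.factor() uses the sorted unique values as levels
time_factor <- as.factor(time_spent)

nlevels(time_factor)  # number of distinct levels
levels(time_factor)   # levels in sorted text order, not by duration
```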
multipleChoiceResponses %>%
  summarize(across(where(is.factor), nlevels))
# A tibble: 1 x 215
GenderSelect Country EmploymentStatus StudentStatus
<int> <int> <int> <int>
1 4 52 7 2
# ... with 211 more variables: LearningDataScience <int>,
# CodeWriter <int>, CareerSwitcher <int>,
multipleChoiceResponses %>%
  select(everything())
multipleChoiceResponses %>%
  pivot_longer(everything())
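One way the pieces above might be combined is to count the levels per factor and then reshape the one-row result into one row per variable. This is a sketch, not necessarily the author's pipeline: it assumes dplyr and tidyr are installed, `toy` and `level_counts` are hypothetical names, and the column names "variable" and "num_levels" are illustrative choices.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data with two factor columns
toy <- tibble(
  GenderSelect = factor(c("Female", "Male", "Male")),
  EmploymentStatus = factor(c("Employed full-time", "Not employed", "Retired"))
)

level_counts <- toy %>%
  # One row, one column per factor, holding its number of levels
  summarize(across(where(is.factor), nlevels)) %>%
  # Reshape to long format: one row per original column
  pivot_longer(everything(), names_to = "variable", values_to = "num_levels")

level_counts
```

The long format is easier to sort or filter than a one-row tibble with hundreds of columns.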