Categorical Data in the Tidyverse
Emily Robinson
Data Scientist
  WorkChallengeFrequencyExplaining WorkChallengeFrequencyIntegration
  <chr>                            <chr>
1 Often                            Often
2 Most of the time                 Most of the time
  work_challenge frequency
  <chr>          <chr>
1 Explaining     Often
2 Explaining     Most of the time
3 Integration    Often
4 Integration    Most of the time
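The wide-to-long reshape illustrated by the two tables above can be tried on a toy data frame. This is a minimal sketch, assuming dplyr, tidyr, and stringr are installed; the `wide` and `long` objects are hypothetical stand-ins and not part of the survey data. The `arrange()` step is an addition for this sketch, used to group rows by challenge.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Hypothetical two-row stand-in for the survey's wide columns
wide <- tibble(
  WorkChallengeFrequencyExplaining  = c("Often", "Most of the time"),
  WorkChallengeFrequencyIntegration = c("Often", "Most of the time")
)

long <- wide %>%
  # One row per (respondent, column) pair
  pivot_longer(everything(), names_to = "work_challenge", values_to = "frequency") %>%
  # Drop the shared prefix so only the challenge name remains
  mutate(work_challenge = str_remove(work_challenge, "WorkChallengeFrequency")) %>%
  # pivot_longer() goes row by row; sort to group rows by challenge
  arrange(work_challenge)

long
```

Two columns and two rows in the wide form become four rows and two columns in the long form.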
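library(tidyverse)  # loads dplyr, tidyr, and stringr
# Stack the WorkChallengeFrequency* columns into name/value pairs.
# pivot_longer() works row by row, so row order may differ from the output shown.
multipleChoiceResponses %>%
  select(contains("WorkChallengeFrequency")) %>%
  pivot_longer(everything(), names_to = "work_challenge", values_to = "frequency")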
# A tibble: 367,752 x 2
  work_challenge                 frequency
  <chr>                          <chr>
1 WorkChallengeFrequencyPolitics Rarely
2 WorkChallengeFrequencyPolitics NA
3 WorkChallengeFrequencyPolitics NA
4 WorkChallengeFrequencyPolitics Often
5 WorkChallengeFrequencyPolitics Often
6 WorkChallengeFrequencyPolitics NA
7 WorkChallengeFrequencyPolitics NA
8 WorkChallengeFrequencyPolitics NA
# Same reshape, then strip the shared prefix so only the challenge name remains
work_challenges <- multipleChoiceResponses %>%
  select(contains("WorkChallengeFrequency")) %>%
  pivot_longer(everything(), names_to = "work_challenge", values_to = "frequency") %>%
  mutate(work_challenge = str_remove(work_challenge, "WorkChallengeFrequency"))
# A tibble: 367,752 x 2
  work_challenge frequency
  <chr>          <chr>
1 Politics       Rarely
2 Politics       NA
3 Politics       NA
4 Politics       Often
5 Politics       Often
6 Politics       NA
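work_challenges %>%
  # Drop unanswered items
  filter(!is.na(frequency)) %>%
  # 1 if the challenge arises "Often" or "Most of the time", otherwise 0
  mutate(frequency = if_else(
    frequency %in% c("Most of the time", "Often"),
    1, 0)) %>%
  group_by(work_challenge) %>%
  # Mean of a 0/1 indicator = share of non-missing responses coded 1
  summarize(perc_problem = mean(frequency))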
# A tibble: 22 x 2
  work_challenge  perc_problem
  <chr>                  <dbl>
1 Clarity               0.0930
2 DataAccess            0.0923
3 DataFunds             0.0367
4 Deployment            0.0265
5 DirtyData             0.176
6 DomainExpertise       0.0573
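Because `frequency` is recoded to 0 or 1, its group mean is the proportion of non-missing answers coded 1, so `perc_problem` ranges from 0 to 1 rather than 0 to 100. A small sketch on made-up data shows the arithmetic; the `toy` and `toy_summary` objects are hypothetical and only assume dplyr is installed.

```r
library(dplyr)

# Hypothetical responses: 4 for Politics (one missing), 2 for DirtyData
toy <- tibble(
  work_challenge = c("Politics", "Politics", "Politics", "Politics",
                     "DirtyData", "DirtyData"),
  frequency      = c("Rarely", "Often", NA, "Most of the time",
                     "Sometimes", "Often")
)

toy_summary <- toy %>%
  filter(!is.na(frequency)) %>%                       # drop the missing answer
  mutate(frequency = if_else(
    frequency %in% c("Most of the time", "Often"),
    1, 0)) %>%                                        # recode to a 0/1 indicator
  group_by(work_challenge) %>%
  summarize(perc_problem = mean(frequency))           # share of 1s per challenge

toy_summary
```

With these values, Politics has two of three non-missing answers coded 1 (about 0.667) and DirtyData has one of two (0.5).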