Case study introduction

Categorical Data in the Tidyverse

Emily Robinson

Data Scientist

A bar graph titled "hell is other people in a pressurized metal tube" with the caption "percentage of 874 air-passenger respondents who said action is very or somewhat rude." The longest bar is "knowingly bring unruly children" at 82% and the shortest is "ask to switch seats for family" at 17%; there are nine bars in total. The survey was run by FiveThirtyEight on SurveyMonkey.
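
Each bar in a chart like this is a proportion: the number of respondents who answered "very rude" or "somewhat rude" divided by the number who answered the question at all. The sketch below shows that calculation. The answer labels and the five responses are hypothetical and are not taken from the FiveThirtyEight data.

```r
library(dplyr)
library(tibble)

# Hypothetical answers to one etiquette question (labels assumed for illustration)
responses <- tibble(
  rude_recline = c("No, not rude at all", "Yes, somewhat rude",
                   "Yes, very rude", "No, not rude at all", NA)
)

rude_share <- responses %>%
  filter(!is.na(rude_recline)) %>%   # keep only people who answered
  summarise(perc_rude = mean(rude_recline %in%
                               c("Yes, somewhat rude", "Yes, very rude")))

rude_share
# 2 of the 4 non-missing answers count as rude, so perc_rude is 0.5
```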

Original dataset

# A tibble: 1,040 x 27
  RespondentID travel_amount   do_recline   
         <dbl> <chr>           <chr>        
1  3436139758. Once a year or… NA           
2  3434278696. Once a year or… About half t…
3  3434275578. Once a year or… Usually      
4  3434268208. Once a year or… Always       
# ... with 24 more variables: height <chr>,
#   children_sub_18 <chr>,
#   middle_arm_rest_three <chr>,
#   middle_arm_rest_two <chr>,
#   window_shade_control <chr>,
#   rude_move_seats <chr>, rude_talk <chr>,
#   times_get_up <chr>,
#   recliner_obligation <chr>,
#   rude_recline <chr>,
#   eliminate_recline <chr>,
#   rude_switch_seats_friend <chr>,

Tools recap

wide_data
# A tibble: 2 x 3
  favorite_fruit favorite_vegetable disliked_dessert
  <chr>          <chr>              <chr>           
1 apple          carrot             cookie          
2 orange         cauliflower        cake            
wide_data %>%
   mutate(across(where(is.character), as.factor))
# A tibble: 2 x 3
  favorite_fruit favorite_vegetable disliked_dessert
  <fct>          <fct>              <fct>           
1 apple          carrot             cookie          
2 orange         cauliflower        cake
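
The self-contained sketch below rebuilds the two-row `wide_data` tibble printed above and applies the same `mutate(across(where(is.character), as.factor))` call. It then checks that every column really became a factor. It also shows a default worth knowing: `as.factor()` orders levels alphabetically rather than by first appearance.

```r
library(dplyr)
library(tibble)

# Rebuild the wide_data tibble shown above
wide_data <- tibble(
  favorite_fruit     = c("apple", "orange"),
  favorite_vegetable = c("carrot", "cauliflower"),
  disliked_dessert   = c("cookie", "cake")
)

# Convert every character column to a factor in one call
factor_data <- wide_data %>%
  mutate(across(where(is.character), as.factor))

# Every column should now report TRUE
sapply(factor_data, is.factor)

# Levels are sorted alphabetically by default
levels(factor_data$disliked_dessert)
# [1] "cake"   "cookie"
```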

tidyr pivot_longer()

wide_data %>%
   pivot_longer(everything(), names_to = "column", values_to = "value")
# A tibble: 6 x 2
  column             value      
  <chr>              <chr>      
1 favorite_fruit     apple      
2 favorite_fruit     orange     
3 favorite_vegetable carrot     
4 favorite_vegetable cauliflower
5 disliked_dessert   cookie     
6 disliked_dessert   cake       
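
The long format pays off when you summarize. After `pivot_longer()`, a single grouped operation covers what were three separate columns. This is presumably the reason for reshaping a survey with many similar question columns. The sketch below counts rows per original column.

```r
library(dplyr)
library(tidyr)
library(tibble)

wide_data <- tibble(
  favorite_fruit     = c("apple", "orange"),
  favorite_vegetable = c("carrot", "cauliflower"),
  disliked_dessert   = c("cookie", "cake")
)

# Stack all three columns into column/value pairs
long_data <- wide_data %>%
  pivot_longer(everything(), names_to = "column", values_to = "value")

# One count() call now summarizes every original column at once
column_counts <- long_data %>%
  count(column)

column_counts
```

Each original column contributes two rows, so every `n` in the result is 2.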

Select helper functions

wide_data %>%
   select(contains("favorite"))
# A tibble: 2 x 2
  favorite_fruit favorite_vegetable
  <chr>          <chr>             
1 apple          carrot            
2 orange         cauliflower       
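
`contains()` is one of several tidyselect helpers. `starts_with()` and `ends_with()` match the beginning or end of a column name. The same helpers can also be passed to `pivot_longer()` to choose which columns to stack. In the survey data, for instance, `contains("rude")` would pick out columns such as `rude_recline` and `rude_talk` shown earlier. The sketch below uses the toy `wide_data` only.

```r
library(dplyr)
library(tidyr)
library(tibble)

wide_data <- tibble(
  favorite_fruit     = c("apple", "orange"),
  favorite_vegetable = c("carrot", "cauliflower"),
  disliked_dessert   = c("cookie", "cake")
)

# Match on the start or end of column names
fav_cols     <- wide_data %>% select(starts_with("favorite"))
dessert_cols <- wide_data %>% select(ends_with("dessert"))

# Helpers also work in pivot_longer()'s cols argument:
# only the "favorite" columns are stacked, disliked_dessert is kept as an ID column
long_favorites <- wide_data %>%
  pivot_longer(contains("favorite"), names_to = "column", values_to = "value")

names(fav_cols)       # "favorite_fruit" "favorite_vegetable"
names(dessert_cols)   # "disliked_dessert"
long_favorites        # 4 rows: disliked_dessert, column, value
```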

Let's practice!
