Data preparation and regex

Categorical Data in the Tidyverse

Emily Robinson

Data Scientist

Handling long names

gathered_data %>%
   distinct(response_var)
# A tibble: 9 x 1
  response_var
  <chr>
1 Is it rude to move to an unsold seat on a plane?
2 Generally speaking, is it rude to say more than a few words to the stranger…
3 Is it rude to recline your seat on a plane?
4 Is it rude to ask someone to switch seats with you in order to be closer to…
5 Is it rude to ask someone to switch seats with you in order to be closer to…
6 Is it rude to wake a passenger up if you are trying to go to the bathroom?
7 Is it rude to wake a passenger up if you are trying to walk around?
8 In general, is it rude to bring a baby on a plane?
9 In general, is it rude to knowingly bring unruly children on a plane?

Regex

str_detect("happy", ".")
[1] TRUE
str_detect("happy", "h.")
[1] TRUE
str_detect("happy", "y.")
[1] FALSE
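
In a regex, "." matches any single character, which is why "y." finds nothing in "happy": no character follows the final "y". The sketch below restates that and shows one way to match a literal period instead. The inputs "happy!" and "happy." are made up for illustration and are not from the course data.

```r
library(stringr)

# "." stands for exactly one character, so "y." needs something after the "y"
str_detect("happy", "y.")    # FALSE: nothing follows the final "y"
str_detect("happy!", "y.")   # TRUE: "!" follows the "y"

# To match a literal period, escape it or wrap the pattern in fixed()
str_detect("happy", "\\.")         # FALSE: "happy" contains no period
str_detect("happy.", fixed("."))   # TRUE: the string ends with a period
```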

Regex

string <- "Statistics is the best"
str_remove(string, ".*the ")
[1] "best"
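
The pattern ".*the " reads as "any characters, then 'the' and a space". Because ".*" is greedy, it runs up to the last "the " in the string. The sketch below illustrates that and then applies the same idea to two of the long question names shown earlier. The string "the cat and the dog" and the pattern ".*rude to " are assumptions chosen for illustration, not taken from the slides.

```r
library(stringr)

# Hypothetical string with two occurrences of "the "
string2 <- "the cat and the dog"

# Greedy ".*" matches through the LAST "the "
greedy <- str_remove(string2, ".*the ")    # "dog"

# Lazy ".*?" stops at the FIRST "the "
lazy <- str_remove(string2, ".*?the ")     # "cat and the dog"

# Shortening long names: drop everything up to and including "rude to "
questions <- c(
  "Is it rude to recline your seat on a plane?",
  "In general, is it rude to bring a baby on a plane?"
)
short <- str_remove(questions, ".*rude to ")
short
# [1] "recline your seat on a plane?" "bring a baby on a plane?"
```

In a pipeline, the same call could sit inside mutate() on response_var, assuming dplyr is loaded and every question contains "rude to ".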

Let's practice!
