tidyr's extract

Intermediate Regular Expressions in R

Angelo Zehr

Data Journalist

Functions used so far

  • str_match
  • str_replace
  • str_match_all
  • str_replace_all
  • ...

Where regular expressions and data frames meet:

extract(
    data,
    col,
    into,
    regex = "([[:alnum:]]+)",
    remove = TRUE,
    convert = FALSE,
    ...
)

The arguments of extract

extract(
    data,
    col,
    into,
    regex = "([[:alnum:]]+)",
    remove = TRUE,
    convert = FALSE,
    ...
)
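  • data: the data frame to work on

  • col: the column that holds the text to split

  • into: the names of the new columns, one per capture group in regex

  • regex: a regular expression whose capture groups define the values to extract

  • remove: if TRUE, drop the input column from the result

  • convert: if TRUE, convert the new columns to a suitable type (for example, numbers)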
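
A minimal sketch of how these arguments work together. The `films` data frame, its `line` values, and the column names `title` and `year` are hypothetical and only serve as an illustration.

```r
library(tidyr)

# Hypothetical input: one text column holding a title and a year
films <- data.frame(
  line = c("Space Saga (2019)", "Quiet Drama (2021)"),
  stringsAsFactors = FALSE
)

films_split <- extract(
  films,
  col = line,                        # column to split
  into = c("title", "year"),         # one name per capture group
  regex = "^(.*) \\((\\d{4})\\)$",   # group 1: title, group 2: four-digit year
  remove = TRUE,                     # drop the original `line` column
  convert = TRUE                     # let R convert "2019" to an integer
)

films_split
```

With `convert = FALSE` (the default), `year` would stay a character column.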

Movies data frame

What we can do with str_match

screenshot of a table

screens_per_movie %<>%
  mutate(
    is_3d = str_match(line, "3D")
  )

What the result of str_match looks like

screenshot of a table

screens_per_movie %<>%
  mutate(
    is_3d = str_match(line, "3D")
  )
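
As a small illustration of that result: `str_match` returns a character matrix, not a plain vector. The sample strings below are hypothetical stand-ins for the `line` column.

```r
library(stringr)

# Hypothetical lines, standing in for screens_per_movie$line
lines <- c("Space Saga 3D: 120", "Quiet Drama: 45")

m <- str_match(lines, "3D")

# One row per input string; with no capture groups in the pattern
# there is a single column holding the full match (NA if no match)
is.matrix(m)
dim(m)
m
```

Inside `mutate`, this matrix becomes a matrix column, which is probably not what we want for a simple flag.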

str_match can only match one piece of information

screenshot of a table

This is what extract can do for us

screenshot of a table

extract(
  screens_per_movie,
  col = "line",
  into = c("is_3d", "screens"),
  regex = "(3D).*?(\\d+)$",
  remove = FALSE
)

The result of extract

screenshot of a table

extract(
  screens_per_movie,
  col = "line",
  into = c("is_3d", "screens"),
  regex = "(3D).*?(\\d+)$",
  remove = FALSE
)
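
To try the call without the course data, here is a sketch with a hypothetical `screens_per_movie`. The movie names and screen counts are made up, and the real `line` column may be formatted differently.

```r
library(tidyr)

# Hypothetical stand-in for the movies data frame
screens_per_movie <- data.frame(
  line = c("Space Saga 3D: 120", "Quiet Drama: 45", "Robot Heist 3D: 80"),
  stringsAsFactors = FALSE
)

result <- extract(
  screens_per_movie,
  col = "line",
  into = c("is_3d", "screens"),
  regex = "(3D).*?(\\d+)$",   # group 1: the 3D marker, group 2: trailing digits
  remove = FALSE              # keep the original `line` column
)

result
```

Note that the whole pattern has to match: the line without "3D" ends up with NA in both new columns, not just in `is_3d`. Since `convert` is left at FALSE, `screens` is a character column.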

Let's practice!
