Welcome

Intermediate Regular Expressions in R

Angelo Zehr

Data Journalist

Where you might have left off

From Rebus to writing custom expressions

Does "cat" start with "c"?

The rebus way:

str_detect("cat", pattern = START %R% "c")

Regular expression:

str_detect("cat", pattern = "^c")

Prerequisites: stringr

str_detect(string, pattern)
str_match(string, pattern)
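The two functions differ in what they return. A minimal sketch, assuming stringr is installed and using a made-up two-word input: str_detect() gives one TRUE/FALSE per string, while str_match() returns a character matrix whose first column holds the matched text (NA where nothing matched).

```r
library(stringr)

# Illustrative input, not part of the course data
words <- c("cat", "dog")

# str_detect() answers yes/no for each string
detected <- str_detect(words, pattern = "^c")

# str_match() returns a character matrix; column 1 is the full match
matched <- str_match(words, pattern = "^c")

print(detected)
print(matched[, 1])
```

str_detect() is the natural choice for filtering, while str_match() is useful when the matched text itself is needed.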

What regular expressions will help you achieve

blob of text with highlight

Our first dataset

movie_titles <- c(
  "Karate Kid",
  "The Twilight Saga: Eclipse",
  "Knight & Day",
  "Shrek Forever After (3D)",
  "Marmaduke.",
  "Predators",
  "StreetDance (3D)",
  "Robin Hood",
  "Micmacs A Tire-Larigot",
  "Sex And the City 2",
...
movie_titles[
  str_detect(
    movie_titles,
    pattern = "^K"
  )
]
"Karate Kid",
"Knight & Day",
...
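
The filter above can be run end to end with a shortened vector. This is only a sketch: it uses a handful of the titles shown on the slide (the full dataset is longer and is not reproduced here), and the variable names starts_with_k and same_result are made up for illustration. str_subset() is a stringr shorthand that should give the same result as indexing with str_detect().

```r
library(stringr)

# A few of the titles from the slide; the real dataset has more entries
movie_titles <- c(
  "Karate Kid",
  "Knight & Day",
  "Marmaduke.",
  "Predators",
  "Robin Hood"
)

# Keep only the titles that start with a capital "K"
starts_with_k <- movie_titles[str_detect(movie_titles, pattern = "^K")]

# Same filter written with str_subset()
same_result <- str_subset(movie_titles, pattern = "^K")

print(starts_with_k)
```

Note that "^K" is case sensitive, so a title starting with a lowercase "k" would not be kept.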

Special characters in regular expressions

Special character Meaning
^ Caret: Marks the beginning of a line or string
$ Dollar Sign: Marks the end of a line or string
. Period: Matches any single character: a letter, a digit or a white space (but not a line break, by default)
\\. Two backslashes and a period: Matches a literal period by escaping its special meaning

For example

Code Result
str_match("Book", "^.") Will match "B"
str_match("Book", ".$") Will match "k"
str_match("Book", "\\.") No match
str_match("Book.", "\\.") Will match "."
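
The four rows above can be checked directly. A small sketch, assuming stringr is installed: str_match() returns a matrix, so [1, 1] is used here to pull out the matched text, and the variable names are only for illustration. The last line adds one contrast that is not in the table: an unescaped period matches the first character it finds, not the literal period.

```r
library(stringr)

first_char <- str_match("Book", "^.")[1, 1]    # any character at the start
last_char  <- str_match("Book", ".$")[1, 1]    # any character at the end
no_period  <- str_match("Book", "\\.")[1, 1]   # literal period, none present
period     <- str_match("Book.", "\\.")[1, 1]  # literal period is found

# Without the escape, "." matches the first character instead
unescaped  <- str_match("Book.", ".")[1, 1]

print(c(first_char, last_char, no_period, period, unescaped))
```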

Let's practice!