Intermediate Regular Expressions in R
Angelo Zehr
Data Journalist
Does "cat" start with "c"?
The rebus way:
str_detect("cat", pattern = START %R% "c")
Regular expression:
str_detect("cat", pattern = "^c")
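The `^` anchor is what turns the pattern into a "starts with" test. The sketch below assumes only that the stringr package is installed, and uses a made-up string ("scat") to show the difference an anchor makes:

```r
library(stringr)

# Without an anchor, "c" may appear anywhere in the string
str_detect("scat", pattern = "c")   # TRUE: "c" is the second letter
# With ^, the "c" must be the very first character
str_detect("scat", pattern = "^c")  # FALSE
str_detect("cat", pattern = "^c")   # TRUE
```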
str_detect(string, pattern)
str_match(string, pattern)
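The two functions answer different questions: `str_detect()` returns a yes/no answer per string, while `str_match()` returns the text that matched. A small sketch on a hypothetical `animals` vector, assuming stringr is installed:

```r
library(stringr)

animals <- c("cat", "dog", "cow")  # hypothetical example data

# One logical value per element: does it start with "c"?
starts_with_c <- str_detect(animals, pattern = "^c")
starts_with_c  # TRUE FALSE TRUE

# A character matrix with one row per element; column 1 holds the
# full match, and NA marks elements where the pattern was not found
first_char_c <- str_match(animals, pattern = "^c")
first_char_c[, 1]  # "c" NA "c"
```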
movie_titles <- c(
  "Karate Kid",
  "The Twilight Saga: Eclipse",
  "Knight & Day",
  "Shrek Forever After (3D)",
  "Marmaduke.",
  "Predators",
  "StreetDance (3D)",
  "Robin Hood",
  "Micmacs A Tire-Larigot",
  "Sex And the City 2"
  # ...
)
movie_titles[
str_detect(
movie_titles,
pattern = "^K"
)
]
Result: only the titles that start with "K":
"Karate Kid", "Knight & Day", ...
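stringr also has `str_subset()`, which combines the detect-and-index step into one call. The sketch below uses a short hypothetical `titles` vector built from a few entries of the list above, since the full vector is elided on the slide:

```r
library(stringr)

# A few titles taken from the list above (the full vector is elided)
titles <- c("Karate Kid", "Knight & Day", "Predators", "Robin Hood")

# Logical indexing, as on the slide
titles[str_detect(titles, pattern = "^K")]

# str_subset() keeps the matching elements in a single call
str_subset(titles, pattern = "^K")
```

Both calls return `"Karate Kid"` and `"Knight & Day"`; the indexing form is still useful when the logical vector is needed for something else.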
Special character | Meaning |
---|---|
^ | Caret: marks the beginning of a line or string |
$ | Dollar sign: marks the end of a line or string |
. | Period: matches any single character, such as a letter, number or white space (but not a newline) |
\\. | Two backslashes: escape the period so it matches a literal "." (the regex needs one backslash, and the R string needs a second one to hold it) |
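The special characters can be tried on a few titles from the movie list. A minimal sketch, assuming stringr is installed; the `titles` vector is a hypothetical subset chosen for illustration:

```r
library(stringr)

titles <- c("Karate Kid", "Robin Hood", "Predators", "Marmaduke.")

# $ anchors at the end: titles whose last character is "d"
str_detect(titles, pattern = "d$")    # TRUE TRUE FALSE FALSE

# An unescaped period matches any character, so every non-empty title matches
str_detect(titles, pattern = ".")     # TRUE TRUE TRUE TRUE

# An escaped period anchored with $: titles that end in a literal "."
str_detect(titles, pattern = "\\.$")  # FALSE FALSE FALSE TRUE
```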
Code | Result |
---|---|
str_match("Book", "^.") | Matches "B" |
str_match("Book", ".$") | Matches "k" |
str_match("Book", "\\.") | No match (NA) |
str_match("Book.", "\\.") | Matches "." |
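Because `str_match()` is vectorised, the last two rows of the table can be checked in one call, and `is.na()` flags the strings without a match. A short sketch, assuming stringr is installed; `words` is a hypothetical vector:

```r
library(stringr)

words <- c("Book", "Book.")

# One row per input string; column 1 holds the full match or NA
periods <- str_match(words, pattern = "\\.")[, 1]
periods          # NA "."
is.na(periods)   # TRUE FALSE: "Book" contains no literal period

# Dropping the escape changes the meaning: "." matches the first character
str_match(words, pattern = ".")[, 1]  # "B" "B"
```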