Character classes and repetitions

Intermediate Regular Expressions in R

Angelo Zehr

Data Journalist

Available character classes

Character class          Example
\\d or [:digit:]         0, 1, 2, 3, …
\\w or [:word:]          a, b, c, …, 1, 2, 3, …, _
[A-Za-z] or [:alpha:]    A, B, C, …, a, b, c, …
[aeiou]                  either a, e, i, o or u
\\s or [:space:]         " ", tabs or line breaks
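
As a minimal sketch of the table above, the snippet below compares the shorthand classes with their bracket-style counterparts. It assumes the stringr package is installed; the input string "Room 12B" and all variable names are made up for illustration.

```r
library(stringr)

x <- "Room 12B"  # hypothetical input, not from the slides

# Shorthand class and bracket-style class should pick the same characters here.
digits_short <- str_extract_all(x, "\\d")[[1]]
digits_posix <- str_extract_all(x, "[:digit:]")[[1]]

letters_range <- str_extract_all(x, "[A-Za-z]")[[1]]
letters_posix <- str_extract_all(x, "[:alpha:]")[[1]]

print(digits_short)   # "1" "2"
print(letters_range)  # "R" "o" "o" "m" "B"

# In base R functions such as grepl(), the bracket-style class has to be
# wrapped in an extra pair of brackets.
has_digit_base <- grepl("[[:digit:]]", x)
print(has_digit_base)  # TRUE
```

For plain ASCII text like this the two spellings should agree; [:alpha:] can also match letters outside A-Z and a-z, so the results may differ on accented input.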

A concrete example

str_match_all()               Result
"Hi John_35", "\\d"           "3", "5"
"Hi John_35", "\\w"           "H", "i", "J", "o", "h", "n", "_", "3", "5"
"Hi John_35", "[A-Za-z]"      "H", "i", "J", "o", "h", "n"
"Hi John_35", "[aeiou]"       "i", "o"
"Hi John_35", "\\s"           " "
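
The rows of the table can be reproduced with a short script. This is only a sketch, assuming stringr is installed; the variable names are illustrative. Note that str_match_all() returns a list with one character matrix per input string, so the matches are pulled out of the first list element and its first column.

```r
library(stringr)

text <- "Hi John_35"

# [[1]] selects the matrix for the first (and only) input string,
# [, 1] selects the column holding the full matches.
digit_matches <- str_match_all(text, "\\d")[[1]][, 1]
vowel_matches <- str_match_all(text, "[aeiou]")[[1]][, 1]
space_matches <- str_match_all(text, "\\s")[[1]][, 1]

print(digit_matches)  # "3" "5"
print(vowel_matches)  # "i" "o"
print(space_matches)  # " "
```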

Repetitions

Syntax      Meaning
\\w{2}      exactly 2 times
\\w{2,3}    at least 2 times, at most 3 times
\\w{2,}     at least 2 times, no maximum
\\w+        1 or more times
\\w*        0 or more times
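
To see how the quantifiers differ, the following sketch applies each one to the same string. It assumes stringr is installed; the test string and variable names are hypothetical. Matching is greedy and proceeds left to right, which is why a run of four letters is split into two matches by \\w{2}.

```r
library(stringr)

x <- "a bb ccc dddd"  # hypothetical input with word runs of length 1 to 4

exactly_two  <- str_extract_all(x, "\\w{2}")[[1]]
two_to_three <- str_extract_all(x, "\\w{2,3}")[[1]]
two_or_more  <- str_extract_all(x, "\\w{2,}")[[1]]
one_or_more  <- str_extract_all(x, "\\w+")[[1]]

print(exactly_two)   # "bb" "cc" "dd" "dd"
print(two_to_three)  # "bb" "ccc" "ddd"
print(two_or_more)   # "bb" "ccc" "dddd"
print(one_or_more)   # "a" "bb" "ccc" "dddd"

# * also accepts zero repetitions, so it matches where + does not.
star_matches <- str_detect("!", "\\w*")
plus_matches <- str_detect("!", "\\w+")
print(star_matches)  # TRUE
print(plus_matches)  # FALSE
```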

Inversion of character classes

Original                            Negation
\\d       match digits              \\D        match all but digits
\\w       match word characters     \\W        match all but word characters
\\s       match whitespace          \\S        match all but whitespace
[a-zA-Z]  match letters             [^a-zA-Z]  match all but letters
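
A short sketch of the negated classes, reusing the string "Hi John_35" from the earlier slide. It assumes stringr is installed; the variable names are illustrative.

```r
library(stringr)

text <- "Hi John_35"

non_digits  <- str_extract_all(text, "\\D")[[1]]
non_word    <- str_extract_all(text, "\\W")[[1]]
non_letters <- str_extract_all(text, "[^a-zA-Z]")[[1]]

print(non_digits)   # "H" "i" " " "J" "o" "h" "n" "_"
print(non_word)     # " "
print(non_letters)  # " " "_" "3" "5"

# A common use of a negated class: drop everything that is not a digit.
digits_only <- str_remove_all(text, "\\D")
print(digits_only)  # "35"
```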

Custom pattern with classes

str_match_all("Toy Story 3", "[\\d\\s]")

Result:

     [,1]
[1,] " " 
[2,] " " 
[3,] "3"
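
Custom classes can also be combined with the repetitions and the negation shown earlier. The sketch below assumes stringr is installed and uses str_extract_all() instead of str_match_all() so that the results are plain character vectors; the variable names are illustrative.

```r
library(stringr)

title <- "Toy Story 3"

# One or more digits or whitespace characters in a row.
digit_or_space_runs <- str_extract_all(title, "[\\d\\s]+")[[1]]

# Negated custom class: runs of characters that are neither digit nor whitespace.
other_runs <- str_extract_all(title, "[^\\d\\s]+")[[1]]

print(digit_or_space_runs)  # " " " 3"
print(other_runs)           # "Toy" "Story"
```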

Let's practice!