Intermediate Regular Expressions in R
Angelo Zehr
Data Journalist
| Character Class | Example |
|---|---|
| `\\d` or `[:digit:]` | 0, 1, 2, 3,… |
| `\\w` or `[:word:]` | a, b, c…, 1, 2, 3…, _ |
| `[A-Za-z]` or `[:alpha:]` | A, B, C,…, a, b, c,… |
| `[aeiou]` | either a, e, i, o or u |
| `\\s` or `[:space:]` | " ", tabs or line breaks |
| str_match_all() | Result |
|---|---|
| `"Hi John_35", "\\d"` | "3", "5" |
| `"Hi John_35", "\\w"` | "H", "i", "J", "o", "h", "n", "_", "3", "5" |
| `"Hi John_35", "[A-Za-z]"` | "H", "i", "J", "o", "h", "n" |
| `"Hi John_35", "[aeiou]"` | "i", "o" |
| `"Hi John_35", "\\s"` | " " |
| Syntax | Meaning |
|---|---|
| `\\w{2}` | exactly 2 times |
| `\\w{2,3}` | minimum 2 times, maximum 3 times |
| `\\w{2,}` | minimum 2 times, but no maximum |
| `\\w+` | 1 or more repetitions |
| `\\w*` | 0, 1 or more repetitions |
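The quantifiers in the table can be tried on the example string used earlier. This is a minimal sketch, assuming the stringr package is installed; it uses `str_extract_all()` rather than `str_match_all()` only because it returns the matches as a plain character vector, and the variable names are illustrative.

```r
library(stringr)

text <- "Hi John_35"

# Exactly 2 word characters per match, scanning from left to right.
# The trailing "5" is left over because it cannot form a pair.
exactly_two <- str_extract_all(text, "\\w{2}")[[1]]
print(exactly_two)   # "Hi" "Jo" "hn" "_3"

# Between 2 and 3 word characters. Quantifiers are greedy, so 3 characters
# are taken whenever they are available.
two_to_three <- str_extract_all(text, "\\w{2,3}")[[1]]
print(two_to_three)  # "Hi" "Joh" "n_3"

# One or more word characters: every unbroken run becomes a single match.
one_or_more <- str_extract_all(text, "\\w+")[[1]]
print(one_or_more)   # "Hi" "John_35"
```

Because `\\w` also covers digits and the underscore, `"John_35"` counts as one run for `\\w+`.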
| Original | Negation |
|---|---|
| `\\d` match digits | `\\D` match all but digits |
| `\\w` match word characters | `\\W` match all but word characters |
| `\\s` match whitespace | `\\S` match all but whitespace |
| `[a-zA-Z]` match letters | `[^a-zA-Z]` match all but letters |
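The negated classes can be compared side by side on a short string. The following is a small sketch, assuming stringr is installed; the variable names are made up for illustration.

```r
library(stringr)

title <- "Toy Story 3"

# \\D matches every character that is not a digit, spaces included.
non_digits <- str_extract_all(title, "\\D")[[1]]
print(length(non_digits))  # 10

# \\S+ matches runs of non-whitespace characters, which splits the title
# into its space-separated parts.
parts <- str_extract_all(title, "\\S+")[[1]]
print(parts)               # "Toy" "Story" "3"

# [^a-zA-Z] matches anything outside the ASCII letters: here the two
# spaces and the digit.
non_letters <- str_extract_all(title, "[^a-zA-Z]")[[1]]
print(non_letters)         # " " " " "3"
```

Note that the caret only negates when it is the first character inside the square brackets.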
str_match_all("Toy Story 3", "[\\d\\s]")
Result:
[,1]
[1,] " "
[2,] " "
[3,] "3"
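The result above is printed as a matrix because `str_match_all()` returns a list holding one character matrix per input string. A minimal sketch, assuming stringr is installed, of how the matches could be pulled out as a plain vector (the variable names are illustrative):

```r
library(stringr)

title <- "Toy Story 3"

# A custom class combining digits and whitespace.
matches <- str_match_all(title, "[\\d\\s]")

# The list has one element per input string; column 1 of each matrix
# contains the full matches.
flat <- matches[[1]][, 1]
print(flat)  # " " " " "3"

# str_extract_all() gives the same matches directly as a character vector.
extracted <- str_extract_all(title, "[\\d\\s]")[[1]]
print(extracted)
```

The matrix form becomes useful once the pattern contains capture groups, since each group then gets its own column.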