Introduction to Natural Language Processing in Python
Katharine Jarmul
Founder, kjamistan
→ Find all web links in a document
→ Parse email addresses
→ Remove/replace unwanted characters
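Each of the three tasks above can be sketched with Python's `re` module. The sample text and the URL and email patterns below are simplified assumptions chosen for illustration. Real-world links and addresses need stricter patterns.

```python
import re

# Hypothetical sample text for illustration
text = "Write to ana@example.com or see https://example.com/docs and http://test.org for details"

# Simplified, illustrative patterns (assumptions, not production-grade)
url_pattern = r'https?://\S+'
email_pattern = r'[\w.+-]+@[\w-]+\.[\w.-]+'

# Find all web links in a document
links = re.findall(url_pattern, text)

# Parse email addresses
emails = re.findall(email_pattern, text)

# Remove unwanted characters (here: anything that is not a word character or whitespace)
cleaned = re.sub(r'[^\w\s]', '', "Hello, world!!! #nlp")

print(links)
print(emails)
print(cleaned)
```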
import re
re.match('abc', 'abcdef')
<_sre.SRE_Match object; span=(0, 3), match='abc'>
word_regex = r'\w+'
re.match(word_regex, 'hi there!')
<_sre.SRE_Match object; span=(0, 2), match='hi'>
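A short sketch of how the match objects shown above might be used. It also shows that `re.match` only looks at the beginning of the string and returns `None` otherwise, while `re.search` scans the whole string. The variable names are illustrative.

```python
import re

# A successful match returns a match object
m = re.match(r'\w+', 'hi there!')
first_word = m.group()   # the matched text
span = m.span()          # (start, end) positions of the match

# re.match only tries at position 0, so this returns None
no_match = re.match(r'abc', 'xabcdef')

# re.search scans the string and finds 'abc' starting at index 1
found = re.search(r'abc', 'xabcdef')
found_span = found.span()

print(first_word, span, no_match, found_span)
```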
| pattern | matches | example |
|---|---|---|
| \w+ | word | 'Magic' |
| \d | digit | 9 |
| \s | space | ' ' |
| .* | wildcard | 'username74' |
| + or * | greedy match | 'aaaaaa' |
| \S | not space | 'no_spaces' |
| [a-z] | lowercase group | 'abcdefg' |
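The patterns in the table can be checked against their example strings, as sketched below. Two assumptions: `\S` and `[a-z]` each match a single character, so a `+` is appended to cover the whole example, and `a+` stands in for the generic "+ or *" row. The lazy variant `a+?` is included only for contrast with greedy matching.

```python
import re

# (pattern, example) pairs adapted from the table
examples = [
    (r'\w+', 'Magic'),         # word
    (r'\d', '9'),              # digit
    (r'\s', ' '),              # space
    (r'.*', 'username74'),     # wildcard
    (r'a+', 'aaaaaa'),         # greedy match
    (r'\S+', 'no_spaces'),     # not space
    (r'[a-z]+', 'abcdefg'),    # lowercase group
]

# Match each pattern against its example and record the matched text
results = {pattern: re.match(pattern, text).group() for pattern, text in examples}

# Greedy a+ consumes every 'a'; lazy a+? stops after the first one
greedy = re.match(r'a+', 'aaaaaa').group()
lazy = re.match(r'a+?', 'aaaaaa').group()

for pattern, matched in results.items():
    print(pattern, repr(matched))
```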
re module
→ split: split a string on regex
→ findall: find all patterns in a string
→ search: search for a pattern anywhere in a string
→ match: match a pattern starting at the beginning of a string
re.split(r'\s+', 'Split on spaces.')
['Split', 'on', 'spaces.']
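The four functions can be compared side by side on one input. The sample sentence is a made-up assumption. The point to notice is that `match` fails where `search` succeeds, because the digits are not at the start of the string.

```python
import re

# Hypothetical sample sentence for illustration
text = 'Split on spaces. Find 3 words and 42 digits.'

# split: break the string wherever the pattern matches
words = re.split(r'\s+', 'Split on spaces.')

# findall: return every non-overlapping match as a list of strings
numbers = re.findall(r'\d+', text)

# search: return the first match found anywhere in the string
first_number = re.search(r'\d+', text).group()

# match: only succeeds if the pattern matches at the start of the string
digits_at_start = re.match(r'\d+', text)
word_at_start = re.match(r'\w+', text).group()

print(words, numbers, first_number, digits_at_start, word_at_start)
```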