Introduction to Natural Language Processing in Python
Katharine Jarmul
Founder, kjamistan
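Regular expressions are strings with a special syntax that let you match patterns in other strings. Applications include:
→ Find all web links in a document
→ Parse email addresses
→ Remove/replace unwanted characters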
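The three applications above can be sketched with the standard `re` module. This is a minimal illustration: the sample text is hypothetical, and the URL and email patterns are deliberately loose (they do not implement the full URL or RFC 5322 email grammar).

```python
import re

# Hypothetical sample text for illustration
text = (
    "Contact us at support@example.com or visit https://example.com/docs "
    "for more info!!!"
)

# Find all web links: http or https followed by non-whitespace
# (a simple pattern, not a full URL grammar)
links = re.findall(r"https?://\S+", text)

# Parse email addresses: a loose pattern, not RFC 5322 compliant
emails = re.findall(r"[\w.+-]+@[\w-]+\.[\w.-]+", text)

# Remove unwanted characters: delete runs of "!"
cleaned = re.sub(r"!+", "", text)

print(links)
print(emails)
print(cleaned)
```

`re.findall` returns every non-overlapping match as a list of strings, while `re.sub` replaces each match (here with an empty string).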
import re
re.match('abc', 'abcdef')
<re.Match object; span=(0, 3), match='abc'>
word_regex = r'\w+'
re.match(word_regex, 'hi there!')
<re.Match object; span=(0, 2), match='hi'>
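The match object returned above can be inspected further. A short sketch using the standard match-object methods `group()` and `span()`, plus the case where `re.match` finds nothing because the pattern does not match at the start of the string:

```python
import re

# re.match only tries to match at the very start of the string
m = re.match(r"\w+", "hi there!")
print(m.group())  # prints: hi
print(m.span())   # prints: (0, 2)

# The string starts with "!", which is not a word character,
# so re.match returns None instead of a match object
no_match = re.match(r"\w+", "!hi there")
print(no_match)   # prints: None
```

Because `re.match` can return `None`, it is worth checking the result before calling `.group()` on it.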
pattern | matches | example |
---|---|---|
\w+ | word | 'Magic' |
\d | digit | 9 |
\s | whitespace | ' ' |
.* | wildcard (any characters except newline) | 'username74' |
+ or * | greedy match | 'aaaaaa' |
\S | non-whitespace | 'no_spaces' |
[a-z] | lowercase group | 'abcdefg' |
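Each row of the table can be tried out with `re.findall` and `re.match`. The sample sentence is a hypothetical example built from the table's own example strings. Note that single-character classes such as `\d`, `\s`, `\S` and `[a-z]` match one character at a time; adding `+` matches runs. The lazy `+?` form is an extra not listed in the table, shown only for contrast with greedy matching.

```python
import re

# Hypothetical sample built from the table's example strings
sample = "Magic username74 has 9 lives, no_spaces here"

# \w+ : runs of word characters (letters, digits, underscore)
words = re.findall(r"\w+", sample)
# \d : a single digit
digits = re.findall(r"\d", sample)
# \s : a single whitespace character
spaces = re.findall(r"\s", sample)
# \S+ : runs of non-whitespace characters (punctuation included)
non_space_runs = re.findall(r"\S+", sample)
# [a-z]+ : runs of lowercase ASCII letters ([a-z] alone matches one letter)
lower_runs = re.findall(r"[a-z]+", sample)

# Greedy quantifiers consume as many characters as they can...
greedy = re.match(r"a+", "aaaaaa").group()
# ...while a trailing ? makes the quantifier lazy (matches as few as possible)
lazy = re.match(r"a+?", "aaaaaa").group()
# .* : any run of characters except newline
wildcard = re.match(r".*", "username74").group()

print(words)
print(digits)
print(non_space_runs)
print(lower_runs)
print(greedy, lazy, wildcard)
```

Notice that `[a-z]+` yields `'agic'` from `'Magic'`, since the capital `M` falls outside the lowercase range.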
Python's re module:
→ split: split a string on a regex
→ findall: find all matches of a pattern in a string
→ search: search for a pattern anywhere in a string
→ match: match a pattern at the beginning of a string
re.split(r'\s+', 'Split on spaces.')
['Split', 'on', 'spaces.']
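The four `re` functions listed above can be compared on one string. A minimal sketch with a hypothetical sample sentence; the key contrast is that `re.search` scans the whole string, while `re.match` only succeeds if the pattern matches at position 0.

```python
import re

# Hypothetical sample text
text = "Split on spaces. Find 2 numbers like 42."

# split: break the string wherever one or more whitespace characters occur
parts = re.split(r"\s+", text)

# findall: every non-overlapping run of digits
numbers = re.findall(r"\d+", text)

# search: the first run of digits anywhere in the string
first_number = re.search(r"\d+", text)

# match: only checks the start of the string, which begins with "S"
match_at_start = re.match(r"\d+", text)

print(parts)
print(numbers)
print(first_number.group())  # prints: 2
print(match_at_start)        # prints: None
```

When you need a pattern to cover the entire string rather than just its beginning, `re.fullmatch` is the stricter alternative to `re.match`.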