Regular Expressions in Python
Maria Eugenia Inzaugarat
Data Scientist
Two types of matching methods: greedy and non-greedy (lazy)
Standard quantifiers are greedy by default: *, +, ?, {num, num}
Greedy: match as many characters as possible
Return the longest match
import re
re.match(r"\d+", "12345bcada")
<_sre.SRE_Match object; span=(0, 5), match='12345'>
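As a minimal sketch of the point above, the snippet below applies each of the four greedy quantifiers to the same sample string. The variable names are illustrative only; every pattern takes as many digits as its bounds allow.

```python
import re

text = "12345bcada"  # same sample string as the example above

# Each greedy quantifier consumes as many digits as its bounds allow.
star = re.match(r"\d*", text).group()          # zero or more
plus = re.match(r"\d+", text).group()          # one or more
optional = re.match(r"\d?", text).group()      # zero or one
bounded = re.match(r"\d{2,4}", text).group()   # between two and four

print(star, plus, optional, bounded)  # 12345 12345 1 1234
```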
Backtracks when too many characters are matched
Gives up characters one at a time
import re
re.match(r".*hello", "xhelloxxxxxx")
<_sre.SRE_Match object; span=(0, 6), match='xhello'>
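To make the backtracking visible, here is a small sketch that wraps .* in a capturing group. The input string is a hypothetical one with two occurrences of "hello", not taken from the slides. The greedy .* first consumes the whole string, then gives characters back until "hello" can match, so it stops at the last occurrence.

```python
import re

# Hypothetical input with two occurrences of "hello".
text = "xhelloxxhelloxx"

# The group shows what .* still holds once backtracking has finished.
m = re.match(r"(.*)hello", text)

print(m.group(0))  # xhelloxxhello
print(m.group(1))  # xhelloxx
print(m.span())    # (0, 13)
```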
Lazy (non-greedy): match as few characters as needed
Append ? to greedy quantifiers
import re
re.match(r"\d+?", "12345bcada")
<_sre.SRE_Match object; span=(0, 1), match='1'>
Backtracks when too few characters are matched
Expands one character at a time
import re
re.match(r".*?hello", "xhelloxxxxxx")
<_sre.SRE_Match object; span=(0, 6), match='xhello'>
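In the example above, the lazy pattern returns the same match as its greedy counterpart because "hello" occurs only once in the string. The sketch below uses a hypothetical string with two occurrences so that the difference shows up: the lazy pattern stops at the first "hello", the greedy one at the last.

```python
import re

# Hypothetical input with two occurrences of "hello".
text = "xhelloxxhelloxx"

greedy = re.match(r".*hello", text)   # consumes everything, then backtracks
lazy = re.match(r".*?hello", text)    # starts empty, then expands

print(greedy.group())  # xhelloxxhello
print(lazy.group())    # xhello
```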