Greedy vs. non-greedy matching

Regular Expressions in Python

Maria Eugenia Inzaugarat

Data Scientist

Greedy vs. non-greedy matching

  • Two types of matching methods:

    • Greedy
    • Non-greedy or lazy
  • Standard quantifiers are greedy by default: *, +, ?, {num,num}
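As a minimal sketch of this default, each of the four quantifiers can be tried on the same input. The sample string "aaaa" and the variable names are illustrative assumptions, not part of the original slides.

```python
import re

text = "aaaa"  # hypothetical sample input

# Each pattern is applied at the start of the string with re.match.
star = re.match(r"a*", text).group()         # zero or more
plus = re.match(r"a+", text).group()         # one or more
optional = re.match(r"a?", text).group()     # zero or one
bounded = re.match(r"a{2,3}", text).group()  # between two and three

print(star, plus, optional, bounded)  # aaaa aaaa a aaa
```

In every case the quantifier takes as many repetitions as its limits allow.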

Greedy matching

  • Greedy: match as many characters as possible

  • Returns the longest match

import re
re.match(r"\d+", "12345bcada")
<_sre.SRE_Match object; span=(0, 5), match='12345'>
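The match object can be inspected directly instead of reading its repr. The sketch below reuses the slide's string; the bounded pattern `\d{2,4}` and the variable names are added here as assumptions for illustration.

```python
import re

text = "12345bcada"

# \d+ keeps consuming digits until it reaches the first non-digit.
m = re.match(r"\d+", text)
digits = m.group()  # the matched text
span = m.span()     # (start, end) positions of the match

# A bounded quantifier is greedy too: it takes the maximum it is allowed.
capped = re.match(r"\d{2,4}", text).group()

print(digits, span, capped)  # 12345 (0, 5) 1234
```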

Greedy matching

  • Backtracks when too many characters are matched

  • Gives up characters one at a time

import re
re.match(r".*hello", "xhelloxxxxxx")
<_sre.SRE_Match object; span=(0, 6), match='xhello'>
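One way to see where the backtracking stops is to put the two parts of the pattern in capture groups. This is a sketch only: the groups and the second input string (with two occurrences of "hello") are assumptions added for illustration.

```python
import re

pattern = r"(.*)(hello)"

# .* first consumes the whole string, then gives characters back
# one at a time until "hello" can match.
one = re.match(pattern, "xhelloxxxxxx")
prefix_one = one.group(1)

# With two occurrences, backtracking stops at the LAST "hello",
# because that is the first position reached when giving characters back.
two = re.match(pattern, "xhelloxxhello")
prefix_two = two.group(1)
whole_two = two.group()

print(repr(prefix_one), repr(prefix_two), repr(whole_two))
# 'x' 'xhelloxx' 'xhelloxxhello'
```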

Non-greedy matching

  • Lazy: match as few characters as needed
  • Returns the shortest match
  • Append ? to greedy quantifiers

import re
re.match(r"\d+?", "12345bcada")
<_sre.SRE_Match object; span=(0, 1), match='1'>
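The same suffix works on every quantifier. The sketch below applies the lazy forms to the slide's string; the extra patterns and variable names are illustrative assumptions. Note that a lazy quantifier whose minimum is zero can match the empty string.

```python
import re

text = "12345bcada"

lazy_plus = re.match(r"\d+?", text).group()         # at least one digit
lazy_star = re.match(r"\d*?", text).group()         # zero digits is enough
lazy_optional = re.match(r"\d??", text).group()     # zero digits is enough
lazy_bounded = re.match(r"\d{2,4}?", text).group()  # the minimum of two digits

print(repr(lazy_plus), repr(lazy_star), repr(lazy_optional), repr(lazy_bounded))
# '1' '' '' '12'
```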

Non-greedy matching

  • Backtracks when too few characters are matched

  • Expands one character at a time

import re
re.match(r".*?hello", "xhelloxxxxxx")
<_sre.SRE_Match object; span=(0, 6), match='xhello'>
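With a single "hello" in the input, the greedy and lazy patterns return the same match. The difference shows up when the closing part of the pattern occurs more than once. The inputs below are assumptions chosen for illustration and do not come from the slides.

```python
import re

# Two occurrences of "hello": lazy stops at the first, greedy at the last.
text = "xhelloxxhello"
lazy_hello = re.match(r".*?hello", text).group()
greedy_hello = re.match(r".*hello", text).group()

# A common case: matching individual tags in a hypothetical HTML snippet.
html = "<b>bold</b> text"
greedy_tags = re.findall(r"<.+>", html)
lazy_tags = re.findall(r"<.+?>", html)

print(lazy_hello, greedy_hello)  # xhello xhelloxxhello
print(greedy_tags)               # ['<b>bold</b>']
print(lazy_tags)                 # ['<b>', '</b>']
```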

Let's practice!
