Grouping and capturing

Regular Expressions in Python

Maria Eugenia Inzaugarat

Data Scientist

Group characters

 


re.findall(r'[A-Za-z]+\s\w+\s\d+\s\w+', text)
['Clary has 2 friends', 'Susan has 3 brothers', 'John has 4 sisters']
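
The call above relies on a `text` variable that the slide does not show. A minimal sketch follows, using a sample string that is an assumption, chosen only so that it reproduces the output above:

```python
import re

# Hypothetical sample text; the original slide does not show it.
text = ("Clary has 2 friends who she spends time with. "
        "Susan has 3 brothers while John has 4 sisters.")

# The pattern has no parentheses, so findall returns each full match.
matches = re.findall(r'[A-Za-z]+\s\w+\s\d+\s\w+', text)
print(matches)
```

Without groups, every element of the result is the whole matched substring, which is why the names cannot yet be pulled out on their own.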

Capturing groups

 

  • Use parentheses to group and capture characters together

 


re.findall(r'([A-Za-z]+)\s\w+\s\d+\s\w+', text)
['Clary', 'Susan', 'John']
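
When the pattern contains exactly one capturing group, `re.findall` returns only the captured text, not the full match. The sketch below compares the two views with `re.finditer`; the sample `text` is an assumption, not the string used in the slides:

```python
import re

# Hypothetical sample text; not from the original slides.
text = "Clary has 2 friends. Susan has 3 brothers. John has 4 sisters."
pattern = r'([A-Za-z]+)\s\w+\s\d+\s\w+'

# findall returns only what group 1 captured.
names = re.findall(pattern, text)

# Each match object still holds the full match as group 0.
full_matches = [m.group(0) for m in re.finditer(pattern, text)]

print(names)
print(full_matches)
```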

re.findall(r'([A-Za-z]+)\s\w+\s(\d+)\s(\w+)', text)
[('Clary', '2', 'friends'),
 ('Susan', '3', 'brothers'),
 ('John', '4', 'sisters')]
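
With several capturing groups, each result is a tuple holding one string per group, in the order the groups open. A short sketch of unpacking those tuples, using an assumed sample `text`:

```python
import re

# Hypothetical sample text; not from the original slides.
text = "Clary has 2 friends. Susan has 3 brothers. John has 4 sisters."

rows = re.findall(r'([A-Za-z]+)\s\w+\s(\d+)\s(\w+)', text)

# Unpack each tuple into the three captured pieces.
summary = [f"{name}: {count} {relation}" for name, count, relation in rows]
print(summary)
```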

  • Match a specific subpattern in a pattern
  • Use it for further processing

  • Organize the data
pets = re.findall(r'([A-Za-z]+)\s\w+\s(\d+)\s(\w+)', "Clary has 2 dogs but John has 3 cats")

pets[0][0]
'Clary'
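
One possible way to process the captured pieces further is sketched below. The names `pets_by_owner` and `total_pets` are illustrative and do not come from the slides:

```python
import re

pets = re.findall(r'([A-Za-z]+)\s\w+\s(\d+)\s(\w+)',
                  "Clary has 2 dogs but John has 3 cats")

# Captured digits are strings, so convert them before doing arithmetic.
pets_by_owner = {owner: (int(count), animal) for owner, count, animal in pets}
total_pets = sum(count for count, _ in pets_by_owner.values())

print(pets_by_owner)
print(total_pets)
```

Keying the data by owner makes lookups such as `pets_by_owner['John']` direct, instead of indexing into a list of tuples.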

  • A quantifier applies only to the character immediately to its left

    • r"apple+": + applies to e and not to apple

  • Use parentheses to apply a quantifier to the entire group
re.search(r"(\d[A-Za-z])+", "My user name is 3e4r5fg")
<re.Match object; span=(16, 22), match='3e4r5f'>
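
The difference between quantifying one character and quantifying a group can be checked directly. In this sketch the apple strings are made-up test inputs, while the user-name example is the one from the slide:

```python
import re

# "+" binds only to the final "e".
one_char = re.search(r"apple+", "appleee").group(0)

# The same pattern stops after the first "apple" when the word repeats.
no_group = re.search(r"apple+", "appleapple pie").group(0)

# Parentheses make "+" repeat the whole word.
grouped = re.search(r"(apple)+", "appleapple pie").group(0)

# Slide example: the group (digit + letter) repeats three times.
m = re.search(r"(\d[A-Za-z])+", "My user name is 3e4r5fg")
print(one_char, no_group, grouped)
print(m.group(0), m.group(1), m.span())
```

Note that `m.group(0)` is the full match, while `m.group(1)` holds only the last repetition of the group.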

  • Capture a repeated group (\d+) vs. repeat a capturing group (\d)+
my_string = "My lucky numbers are 8755 and 33"
re.findall(r"(\d)+", my_string)
['5', '3']

 

re.findall(r"(\d+)", my_string)
['8755', '33']
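
To see why the two patterns return different lists, it can help to compare the full match (group 0) with the captured text (group 1) for each. A small sketch using `re.finditer`, where the `results` dictionary is an illustrative name:

```python
import re

my_string = "My lucky numbers are 8755 and 33"

results = {}
for pattern in (r"(\d)+", r"(\d+)"):
    # Store (full match, captured group) for every match of the pattern.
    results[pattern] = [(m.group(0), m.group(1))
                        for m in re.finditer(pattern, my_string)]

print(results)
```

Both patterns match the same substrings. With `(\d)+` the group is overwritten on every repetition, so only the last digit is kept; with `(\d+)` the group spans the whole run of digits.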

Let's practice!
