Regex metacharacters

Regular Expressions in Python

Maria Eugenia Inzaugarat

Data Scientist

Looking for patterns

Two different operations to find a match: re.search() scans the whole string, while re.match() only tries at the beginning:

import re

re.search(r"\d{4}", "4506 people attend the show")
<_sre.SRE_Match object; span=(0, 4), match='4506'>

 

re.search(r"\d+", "Yesterday, I saw 3 shows")
<_sre.SRE_Match object; span=(17, 18), match='3'>

 

re.match(r"\d{4}", "4506 people attend the show")
<_sre.SRE_Match object; span=(0, 4), match='4506'>

 

re.match(r"\d+","Yesterday, I saw 3 shows")
None
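
Because both functions return None when nothing matches, it helps to test the result before reading from it. A minimal sketch of that check, reusing the string from the slide (the variable names are illustrative, not from the course):

```python
import re

text = "Yesterday, I saw 3 shows"

# re.search scans the whole string and stops at the first match.
found = re.search(r"\d+", text)
# re.match only tries at position 0, so it finds nothing here.
at_start = re.match(r"\d+", text)

# Guard against None before calling .group() on the result.
search_result = found.group() if found else None
match_result = at_start.group() if at_start else None

print(search_result)  # 3
print(match_result)   # None
```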

Special characters

  • Match any character (except newline): .

 

my_links = "Just check out this link: www.amazingpics.com. It has amazing photos!"

re.findall(r"www.+com", my_links)
['www.amazingpics.com']
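
The "except newline" part is easy to overlook. A small sketch of what it means in practice, assuming a hypothetical address that was broken across two lines; re.DOTALL is the standard flag that lets "." match a newline as well:

```python
import re

my_links = "Just check out this link: www.amazingpics.com. It has amazing photos!"

# "." matches any single character except a newline; "+" repeats it one or more times.
one_line = re.findall(r"www.+com", my_links)

# Hypothetical input: the same address with a line break in the middle.
broken = "www.amazing\npics.com"
no_match = re.findall(r"www.+com", broken)

# With re.DOTALL, "." is allowed to match the newline too.
dotall = re.findall(r"www.+com", broken, flags=re.DOTALL)

print(one_line)  # ['www.amazingpics.com']
print(no_match)  # []
print(dotall)    # ['www.amazing\npics.com']
```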

Special characters

  • Start of the string: ^
my_string = "the 80s music was much better than the 90s"
re.findall(r"the\s\d+s", my_string)
['the 80s', 'the 90s']

 

re.findall(r"^the\s\d+s", my_string)
['the 80s']

Special characters

  • End of the string: $
my_string = "the 80s music hits were much better than the 90s"
re.findall(r"the\s\d+s$", my_string)
['the 90s']
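
The two anchors can be compared side by side on one string. The sketch below builds the anchored patterns from a shared base pattern; combining ^ and $ in a single pattern is an extra illustration, not something shown on the slides:

```python
import re

my_string = "the 80s music hits were much better than the 90s"
pattern = r"the\s\d+s"

anywhere = re.findall(pattern, my_string)        # no anchor: every occurrence
at_start = re.findall("^" + pattern, my_string)  # must begin at the start of the string
at_end = re.findall(pattern + "$", my_string)    # must finish at the end of the string
# Anchoring both ends requires the whole string to be a single match.
whole = re.findall("^" + pattern + "$", my_string)

print(anywhere)  # ['the 80s', 'the 90s']
print(at_start)  # ['the 80s']
print(at_end)    # ['the 90s']
print(whole)     # []
```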

Special characters

  • Escape special characters: \
my_string = "I love the music of Mr.Go. However, the sound was too loud."
print(re.split(r".\s", my_string))
['', 'lov', 'th', 'musi', 'o', 'Mr.Go', 'However', 'th', 'soun', 'wa', 'to', 'loud.']

 

print(re.split(r"\.\s", my_string))
['I love the music of Mr.Go', 'However, the sound was too loud.']
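
When the text to match comes from a variable, escaping by hand is error-prone. One possible approach is re.escape(), which adds the backslashes automatically. A short sketch, where the address used as the literal is only an example value:

```python
import re

literal = "www.hola.com"

# Used as a pattern directly, each "." is a wildcard and accepts any character.
loose = re.fullmatch(literal, "wwwxholaxcom")

# re.escape() backslash-escapes the metacharacters so they match literally.
escaped = re.escape(literal)
strict_wrong = re.fullmatch(escaped, "wwwxholaxcom")
strict_right = re.fullmatch(escaped, "www.hola.com")

print(loose is not None)         # True
print(strict_wrong is not None)  # False
print(strict_right is not None)  # True
```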

OR operator

  • Character: |
my_string = "Elephants are the world's largest land animal! I would love to see an elephant one day"
re.findall(r"Elephant|elephant", my_string)
['Elephant', 'elephant']
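
The | operator applies to everything on either side of it, so alternating over just one letter needs a group. A hedged sketch of the difference; the non-capturing group (?:...) is one way to limit the scope and is not part of the original slide:

```python
import re

my_string = "Elephants are the world's largest land animal! I would love to see an elephant one day"

# Whole words on each side of "|", as on the slide.
full = re.findall(r"Elephant|elephant", my_string)

# A non-capturing group restricts "|" to the first letter only.
grouped = re.findall(r"(?:E|e)lephant", my_string)

# Without the group, the pattern means "E" or "elephant".
ungrouped = re.findall(r"E|elephant", my_string)

print(full)       # ['Elephant', 'elephant']
print(grouped)    # ['Elephant', 'elephant']
print(ungrouped)  # ['E', 'elephant']
```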

OR operator

  • Set of characters: [ ]
my_string = "Yesterday I spent my afternoon with my friends: MaryJohn2 Clary3"
re.findall(r"[a-zA-Z]+\d", my_string)
['MaryJohn2', 'Clary3']
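
Each range inside the brackets contributes its own characters, so leaving one out changes what is matched. A small sketch using the same string, with a lowercase-only variant added for comparison:

```python
import re

my_string = "Yesterday I spent my afternoon with my friends: MaryJohn2 Clary3"

# a-z and A-Z together: any run of letters followed by a digit.
both_cases = re.findall(r"[a-zA-Z]+\d", my_string)

# a-z alone: the run cannot include a capital letter, so it starts after the last one.
lower_only = re.findall(r"[a-z]+\d", my_string)

print(both_cases)  # ['MaryJohn2', 'Clary3']
print(lower_only)  # ['ohn2', 'lary3']
```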

OR operator

  • Set of characters: [ ]
my_string = "My&name&is#John Smith. I%live$in#London."
re.sub(r"[#$%&]", " ", my_string)
'My name is John Smith. I live in London.'
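
Inside a set, characters such as $ lose their special meaning and are matched literally. The sketch below repeats the substitution and then, on a hypothetical messier input, adds + after the set so that a run of symbols becomes a single space:

```python
import re

my_string = "My&name&is#John Smith. I%live$in#London."

# "$" inside [ ] is a literal dollar sign, not the end-of-string anchor.
cleaned = re.sub(r"[#$%&]", " ", my_string)

# Hypothetical input with repeated symbols; "+" collapses each run into one space.
messy = "My&&name##is$%John"
collapsed = re.sub(r"[#$%&]+", " ", messy)

print(cleaned)    # My name is John Smith. I live in London.
print(collapsed)  # My name is John
```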

OR operator

  • Set of characters: [ ]
    • ^ as the first character inside [ ] negates the set: it matches any character not listed

 

my_links = "Bad website: www.99.com. Favorite site: www.hola.com"
re.findall(r"www[^0-9]+com", my_links)
['www.hola.com']
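
The position of ^ decides its meaning: first inside a set it negates, anywhere else inside a set it is an ordinary character. A brief sketch, where the "2^10" test string is made up for illustration:

```python
import re

my_links = "Bad website: www.99.com. Favorite site: www.hola.com"

# [^0-9]+ accepts one or more characters that are not digits.
no_digits = re.findall(r"www[^0-9]+com", my_links)

# "^" first in the set: negation. "^" later in the set: a literal caret.
not_digit = re.findall(r"[^0-9]", "2^10")
literal_caret = re.findall(r"[0-9^]", "2^10")

print(no_digits)      # ['www.hola.com']
print(not_digit)      # ['^']
print(literal_caret)  # ['2', '^', '1', '0']
```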

Let's practice!
