How to write regular expressions in Python?

Practicing Coding Interview Questions in Python

Kirill Smirnov

Data Science Consultant, Altran

Definition

Regular expression - a sequence of ordinary and special characters (metacharacters) that defines a pattern to search for in a text.

 

Pattern: cat

Text: "I have a cat. My cat likes to eat a lot. It also catches mice."

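As a minimal sketch of the example above, the pattern `cat` can be searched with `re.findall()`. The variable names are illustrative only.

```python
import re

text = "I have a cat. My cat likes to eat a lot. It also catches mice."

# A pattern made only of ordinary characters matches that exact substring.
matches = re.findall(r"cat", text)

print(matches)       # ['cat', 'cat', 'cat']
print(len(matches))  # 3 (the last hit is inside "catches")
```

Note that the third match comes from the word "catches": the pattern looks for the character sequence, not for whole words.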

Complex patterns

Example:

john.smith@mailbox.com is the e-mail of John. He often writes to his boss at boss@company.com. But the messages get forwarded to his secretary at info@company.com.


Special characters

Letters and digits match themselves:

  • a $\rightarrow$ a
  • A $\rightarrow$ A
  • 1 $\rightarrow$ 1

The dot matches any single character (except a newline, by default):

  • . $\rightarrow$ any character

. $\rightarrow$ 'a', '1', '"', ' ', ...

An escaped dot matches a literal dot:

  • \. $\rightarrow$ .
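A short sketch of the difference between an unescaped and an escaped dot. The sample string is made up for illustration.

```python
import re

sample = "a.b"

# An unescaped dot matches any single character except a newline (by default).
any_char = re.findall(r".", sample)

# An escaped dot matches only a literal '.'.
literal_dot = re.findall(r"\.", sample)

# Without the re.DOTALL flag, '.' does not match '\n'.
newline_match = re.findall(r".", "\n")

print(any_char)       # ['a', '.', 'b']
print(literal_dot)    # ['.']
print(newline_match)  # []
```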

Special characters

The following metacharacters consist of \ followed by a letter:

  • \w $\rightarrow$ any alphanumeric character or underscore

\w $\rightarrow$ '1', 'a', '_', ...

  • \d $\rightarrow$ any digit

\d $\rightarrow$ '1', '2', '3', ...

  • \s $\rightarrow$ any whitespace character

\s $\rightarrow$ ' ', '\t', ...
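The three classes above can be tried on a small, made-up string. This is only a sketch; the sample text is an assumption chosen to contain one character of each kind.

```python
import re

sample = "id_7 ok\t!"

# \w: letters, digits, and the underscore
word_chars = re.findall(r"\w", sample)

# \d: digits only
digits = re.findall(r"\d", sample)

# \s: whitespace such as spaces and tabs
spaces = re.findall(r"\s", sample)

print(word_chars)  # ['i', 'd', '_', '7', 'o', 'k']
print(digits)      # ['7']
print(spaces)      # [' ', '\t']
```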


Square brackets

Several characters or ranges can be enclosed in square brackets; the set matches any one of them:

  • [aAbB] $\rightarrow$ a, A, b, B
  • [a-z] $\rightarrow$ a, b, c, ...
  • [A-Z] $\rightarrow$ A, B, C, ...
  • [0-9] $\rightarrow$ 0, 1, 2, ...
  • [A-Za-z] $\rightarrow$ A, B, C, ..., a, b, c, ...
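The character sets listed above can be checked against a short, hypothetical sample string:

```python
import re

sample = "aB3-z"

lower = re.findall(r"[a-z]", sample)       # lowercase letters
upper = re.findall(r"[A-Z]", sample)       # uppercase letters
digit = re.findall(r"[0-9]", sample)       # digits
letters = re.findall(r"[A-Za-z]", sample)  # letters of either case
explicit = re.findall(r"[aAbB]", sample)   # only the listed characters

print(lower)     # ['a', 'z']
print(upper)     # ['B']
print(digit)     # ['3']
print(letters)   # ['a', 'B', 'z']
print(explicit)  # ['a', 'B']
```

Each set matches exactly one character at a time; the hyphen in the sample is matched by none of these sets.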

Repetitions

  • * $\rightarrow$ the preceding character occurs zero or more times

a* $\rightarrow$ '', 'a', 'aa', ...

  • + $\rightarrow$ the preceding character occurs one or more times

a+ $\rightarrow$ 'a', 'aa', 'aaa', ...

  • ? $\rightarrow$ the preceding character occurs zero times or once

a? $\rightarrow$ '', 'a'

  • {n,m} $\rightarrow$ the preceding character occurs from n to m times (no space after the comma)

a{2,4} $\rightarrow$ 'aa', 'aaa', 'aaaa'
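The quantifiers can be checked with `re.fullmatch()`, which succeeds only if the whole string fits the pattern. The helper `matches_whole` is a hypothetical name introduced for this sketch. One detail worth knowing: Python's `re` expects the range quantifier without a space, as in `a{2,4}`; written with a space, the braces are treated as literal text.

```python
import re

def matches_whole(pattern, s):
    """Return True if the pattern matches the entire string s."""
    return re.fullmatch(pattern, s) is not None

print(matches_whole(r"a*", ""))          # True: zero repetitions allowed
print(matches_whole(r"a+", ""))          # False: at least one 'a' required
print(matches_whole(r"a?", "aa"))        # False: at most one 'a' allowed
print(matches_whole(r"a{2,4}", "aaa"))   # True: between 2 and 4 repetitions
print(matches_whole(r"a{2,4}", "a"))     # False: too few
print(matches_whole(r"a{2,4}", "aaaaa")) # False: too many

# With a space after the comma, the braces are read as literal characters.
print(matches_whole(r"a{2, 4}", "aaa"))      # False
print(matches_whole(r"a{2, 4}", "a{2, 4}"))  # True
```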


Regular expression for an e-mail

Example:

john.smith@mailbox.com is the e-mail of John. He often writes to his boss at boss@company.com. But the messages get forwarded to his secretary at info@company.com.

[\w\.]+@[a-z]+\.[a-z]+

[\w\.]+ $\rightarrow$ john.smith, boss, info

at least one letter, digit, underscore, or dot character

@ $\rightarrow$ @

[a-z]+ $\rightarrow$ mailbox, company

at least one lowercase letter

\. $\rightarrow$ .

[a-z]+ $\rightarrow$ com

at least one lowercase letter
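The piece-by-piece breakdown can be written directly into the pattern with the `re.VERBOSE` flag, which ignores whitespace and allows comments inside the pattern. This is a sketch only: the name `email_pattern` and the test addresses are assumptions chosen to be consistent with the pieces listed above, and the pattern is a teaching example rather than a complete e-mail validator.

```python
import re

email_pattern = re.compile(r"""
    [\w\.]+   # local part: letters, digits, underscores, or dots
    @         # literal at sign
    [a-z]+    # domain name: lowercase letters only
    \.        # literal dot
    [a-z]+    # top-level domain: lowercase letters only
""", re.VERBOSE)

# fullmatch() succeeds only if the whole string fits the pattern.
ok = email_pattern.fullmatch("john.smith@mailbox.com") is not None
rejected = email_pattern.fullmatch("John@Mailbox.com") is not None

print(ok)        # True
print(rejected)  # False: the domain part contains an uppercase letter
```

Because the domain parts use `[a-z]+`, addresses with uppercase letters, digits, hyphens, or subdomains after the `@` are not fully matched by this pattern.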


re package

import re

pattern = re.compile(r'[\w\.]+@[a-z]+\.[a-z]+')
text = 'john.smith@mailbox.com is the e-mail of '\
'John. He often writes to his boss at '\
'boss@company.com. But the messages get forwarded '\
'to his secretary at info@company.com.'

re.finditer()

result = re.finditer(pattern, text)
print(result)
<callable_iterator object at 0x7f5dff81af98>
for match in result:
    print(match)
<_sre.SRE_Match object; span=(0, 22), match='john.smith@mailbox.com'>
<_sre.SRE_Match object; span=(77, 93), match='boss@company.com'>
<_sre.SRE_Match object; span=(146, 162), match='info@company.com'>

re.finditer()

result = re.finditer(pattern, text)
print(result)
<callable_iterator object ...>
for match in result:
    print(match.group())
    print(match.start())
    print(match.end())
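A self-contained sketch of what `group()`, `start()`, and `end()` return for each match. The short `text` here is a made-up sample, not the one from the slides, and `found` is an illustrative name.

```python
import re

pattern = re.compile(r'[\w\.]+@[a-z]+\.[a-z]+')
text = 'Write to boss@company.com or info@company.com.'

# Collect the matched substring with its start and end offsets.
found = [(m.group(), m.start(), m.end()) for m in pattern.finditer(text)]

for substring, start, end in found:
    # The offsets slice the original text back to the matched substring.
    print(substring, start, end, text[start:end] == substring)
# boss@company.com 9 25 True
# info@company.com 29 45 True
```

The end offset is exclusive, so `text[start:end]` reproduces the match exactly.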

re.findall()

substrings = re.findall(pattern, text)
print(substrings)
['john.smith@mailbox.com', 'boss@company.com', 'info@company.com']

re.split()

split_list = re.split(pattern, text)
print(split_list)
['',
 ' is the e-mail of John. He often writes to his boss at ',
 '. But the messages get forwarded to his secretary at ',
 '.']
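A compiled pattern also exposes `findall()` and `split()` as methods, which give the same results as the module-level functions. The sketch below uses made-up sample strings and also shows why the list above starts with an empty string: when a match sits at the very beginning of the text, the piece before it is empty.

```python
import re

pattern = re.compile(r'[\w\.]+@[a-z]+\.[a-z]+')
text = 'Ask info@company.com today.'

# Method form and function form return the same result.
same_findall = pattern.findall(text) == re.findall(pattern, text)
pieces = pattern.split(text)

# A match at position 0 leaves an empty string as the first piece.
leading = pattern.split('info@company.com today.')

print(same_findall)  # True
print(pieces)        # ['Ask ', ' today.']
print(leading)       # ['', ' today.']
```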

Let's practice!
