Practicing Coding Interview Questions in Python
Kirill Smirnov
Data Science Consultant, Altran
Regular expression - a sequence of special characters (metacharacters) defining a pattern to search in a text.
cat
"I have a cat. My cat likes to eat a lot. It also catches mice."
Example:
john.smith@mailbox.com is the e-mail of John. He often writes to his boss at boss@company.com. But the messages get forwarded to his secretary at info@company.com.
Simple characters and numbers are mapped onto themselves:
a $\rightarrow$ a
A $\rightarrow$ A
1 $\rightarrow$ 1
Dot maps to anything:
. $\rightarrow$ any character (except a newline, by default): 'a', '1', '"', ' ', ...
\. $\rightarrow$ . (a literal dot)
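The difference between the plain dot and the escaped dot can be checked with a minimal sketch. The sample string below is made up for illustration and is not part of the original slides.

```python
import re

# Hypothetical sample string, chosen only to illustrate the two patterns.
sample = "a.b axb a1b"

# Unescaped dot: any single character (except a newline) may sit between a and b.
any_char = re.findall(r"a.b", sample)

# Escaped dot: only a literal period matches.
literal_dot = re.findall(r"a\.b", sample)

print(any_char)     # ['a.b', 'axb', 'a1b']
print(literal_dot)  # ['a.b']
```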
The following metacharacters consist of \ followed by a letter:
\w $\rightarrow$ any alphanumeric character or underscore: '1', 'a', '_', ...
\d $\rightarrow$ any digit: '1', '2', '3', ...
\s $\rightarrow$ any whitespace character: ' ', '\t', ...
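As a quick check of the three character classes, the sketch below applies each one to a short string. The string is a hypothetical example, not taken from the slides.

```python
import re

# Hypothetical sample string containing letters, digits, an underscore,
# three spaces and one tab character.
sample = "Room_7 opens at 9\tam"

digits = re.findall(r"\d", sample)       # every digit
word_chars = re.findall(r"\w", sample)   # letters, digits and underscore
whitespace = re.findall(r"\s", sample)   # spaces and the tab

print(digits)               # ['7', '9']
print("".join(word_chars))  # Room_7opensat9am
print(whitespace)           # [' ', ' ', ' ', '\t']
```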
Several characters can be enclosed in square brackets to match any one of them:
[aAbB] $\rightarrow$ a, A, b, B
[a-z] $\rightarrow$ a, b, c, ...
[A-Z] $\rightarrow$ A, B, C, ...
[0-9] $\rightarrow$ 0, 1, 2, ...
[A-Za-z] $\rightarrow$ A, B, C, ..., a, b, c, ...
Quantifiers state how many times the preceding character repeats:
* $\rightarrow$ the character is absent or repeats any number of times
a* $\rightarrow$ '', 'a', 'aa', ...
+ $\rightarrow$ the character is present at least once
a+ $\rightarrow$ 'a', 'aa', 'aaa', ...
? $\rightarrow$ the character is present once or not at all
a? $\rightarrow$ '', 'a'
{n,m} $\rightarrow$ the character is present from n to m times (written without a space after the comma)
a{2,4} $\rightarrow$ 'aa', 'aaa', 'aaaa'
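The sets and quantifiers above can be tried out with a short sketch. The helper `fully_matches` and the candidate strings are assumptions introduced here for illustration; `re.fullmatch` is used so that a candidate counts only when the whole string fits the pattern.

```python
import re

# A character set matches exactly one character out of the listed options.
set_hits = re.findall(r"[aAbB]", "A bad Cab")
digit_hits = re.findall(r"[0-9]", "room 42b")
print(set_hits)    # ['A', 'b', 'a', 'a', 'b']
print(digit_hits)  # ['4', '2']


def fully_matches(pattern, candidates):
    """Hypothetical helper: keep the candidates that the pattern matches in full."""
    return [c for c in candidates if re.fullmatch(pattern, c) is not None]


candidates = ["", "a", "aa", "aaa", "aaaa", "aaaaa"]

star = fully_matches(r"a*", candidates)         # zero or more
plus = fully_matches(r"a+", candidates)         # one or more
optional = fully_matches(r"a?", candidates)     # zero or one
bounded = fully_matches(r"a{2,4}", candidates)  # between two and four

print(star)      # ['', 'a', 'aa', 'aaa', 'aaaa', 'aaaaa']
print(plus)      # ['a', 'aa', 'aaa', 'aaaa', 'aaaaa']
print(optional)  # ['', 'a']
print(bounded)   # ['aa', 'aaa', 'aaaa']
```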
Example:
john.smith@mailbox.com is the e-mail of John. He often writes to his boss at boss@company.com. But the messages get forwarded to his secretary at info@company.com.
[\w\.]+@[a-z]+\.[a-z]+
Reading the pattern from left to right:
[\w\.]+ $\rightarrow$ john.smith, boss, info (at least one letter, digit, underscore, or dot character)
@ $\rightarrow$ @
[a-z]+ $\rightarrow$ mailbox, company (at least one lowercase letter)
\. $\rightarrow$ .
[a-z]+ $\rightarrow$ com (at least one lowercase letter)
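One way to keep this breakdown next to the pattern itself is Python's `re.VERBOSE` flag, which allows whitespace and comments inside the pattern. The sketch below is an illustration only: the name `email_pattern` and the sample sentence are assumptions, and the pattern is the simplified one from the slides, not a full e-mail validator (it rejects uppercase letters and hyphens in the domain, for example).

```python
import re

# The same five parts as above, one per line, with comments.
email_pattern = re.compile(r"""
    [\w\.]+   # user name: one or more letters, digits, underscores, or dots
    @         # a literal at sign
    [a-z]+    # domain name: one or more lowercase letters
    \.        # a literal dot
    [a-z]+    # top-level domain: one or more lowercase letters
""", re.VERBOSE)

# Hypothetical sample text, made up for this illustration.
sample = "Write to jane_doe@example.org or sales.team@example.com today."

found = email_pattern.findall(sample)
print(found)  # ['jane_doe@example.org', 'sales.team@example.com']
```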
import re
pattern = re.compile(r'[\w\.]+@[a-z]+\.[a-z]+')
text = 'john.smith@mailbox.com is the e-mail of '\
    'John. He often writes to his boss at '\
    'boss@company.com. But the messages get forwarded '\
    'to his secretary at info@company.com.'
result = re.finditer(pattern, text)
print(result)
<callable_iterator object at 0x7f5dff81af98>
for match in result:
    print(match)
<_sre.SRE_Match object; span=(0, 22), match='john.smith@mailbox.com'>
<_sre.SRE_Match object; span=(77, 93), match='boss@company.com'>
<_sre.SRE_Match object; span=(146, 162), match='info@company.com'>
result = re.finditer(pattern, text)
print(result)
<callable_iterator object ...>
for match in result:
    print(match.group())
    print(match.start())
    print(match.end())
john.smith@mailbox.com
0
22
boss@company.com
77
93
info@company.com
146
162
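The start and end positions are ordinary string indices, so slicing the text with them gives back the matched substring. A minimal sketch, using a made-up sample sentence instead of the text from the slides:

```python
import re

pattern = re.compile(r"[\w\.]+@[a-z]+\.[a-z]+")

# Hypothetical sample text, made up for this illustration.
sample = "Ask help@example.com first, then admin@example.org."

spans = []
for match in pattern.finditer(sample):
    start, end = match.span()  # same values as match.start() and match.end()
    # Slicing with the span reproduces the matched substring.
    assert sample[start:end] == match.group()
    spans.append((start, end, match.group()))

print(spans)
# [(4, 20, 'help@example.com'), (33, 50, 'admin@example.org')]
```

Note that the end index is exclusive, as in normal Python slicing.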
substrings = re.findall(pattern, text)
print(substrings)
['john.smith@mailbox.com', 'boss@company.com', 'info@company.com']
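One detail of `re.findall` that often comes up: when the pattern contains capturing groups, the function returns the text of the groups and not the whole match. The sketch below shows this under the assumption of a made-up sample sentence; the grouped patterns are variations added for illustration.

```python
import re

# Hypothetical sample text, made up for this illustration.
sample = "Ask help@example.com first, then admin@example.org."

# No groups: the whole match is returned.
whole = re.findall(r"[\w\.]+@[a-z]+\.[a-z]+", sample)

# One capturing group: only the text of that group is returned.
users = re.findall(r"([\w\.]+)@[a-z]+\.[a-z]+", sample)

# Several capturing groups: one tuple per match.
parts = re.findall(r"([\w\.]+)@([a-z]+\.[a-z]+)", sample)

print(whole)  # ['help@example.com', 'admin@example.org']
print(users)  # ['help', 'admin']
print(parts)  # [('help', 'example.com'), ('admin', 'example.org')]
```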
split_list = re.split(pattern, text)
print(split_list)
['',
' is the e-mail of John. He often writes to his boss at ',
'. But the messages get forwarded to his secretary at ',
'.']
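`re.split` drops the matched separators by default. Two variations are sketched below on a made-up sample sentence: wrapping the pattern in a capturing group keeps the separators in the result, and `maxsplit` limits the number of splits.

```python
import re

# Hypothetical sample text, made up for this illustration.
sample = "Ask help@example.com first, then admin@example.org."
pattern = r"[\w\.]+@[a-z]+\.[a-z]+"

# Default behaviour: the matched addresses are removed.
plain = re.split(pattern, sample)

# A capturing group around the pattern keeps the separators in the list.
kept = re.split("(" + pattern + ")", sample)

# maxsplit=1 splits at the first match only.
first_only = re.split(pattern, sample, maxsplit=1)

print(plain)       # ['Ask ', ' first, then ', '.']
print(kept)        # ['Ask ', 'help@example.com', ' first, then ', 'admin@example.org', '.']
print(first_only)  # ['Ask ', ' first, then admin@example.org.']
```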