Practicing Coding Interview Questions in Python
Kirill Smirnov
Data Science Consultant, Altran
Regular expression - a sequence of special characters (metacharacters) defining a pattern to search in a text.
cat
"I have a cat. My cat likes to eat a lot. It also catches mice."
Example:
john.smith@mailbox.com is the e-mail of John. He often writes to his boss at boss@company.com. But the messages get forwarded to his secretary at info@company.com.
Simple characters and numbers are mapped onto themselves:
a $\rightarrow$ a
A $\rightarrow$ A
1 $\rightarrow$ 1
Dot maps to anything:
. $\rightarrow$ any character (except a newline by default)
. $\rightarrow$ 'a', '1', '"', ' ', ...
\. $\rightarrow$ .
The following metacharacters consist of \ followed by a letter:
\w $\rightarrow$ any alphanumeric character or underscore
\w $\rightarrow$ '1', 'a', '_', ...
\d $\rightarrow$ any digit
\d $\rightarrow$ '1', '2', '3', ...
\s $\rightarrow$ any whitespace character
\s $\rightarrow$ ' ', '\t', ...
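The mappings above can be checked with a short sketch using Python's `re` module. The string `sample` and all variable names are hypothetical and chosen only for illustration; `cat_text` reuses the sentence from the first example.

```python
import re

cat_text = "I have a cat. My cat likes to eat a lot. It also catches mice."
sample = "Room 42_b\tis free."  # hypothetical sample text

literal = re.findall(r"cat", cat_text)    # plain characters match themselves
digits = re.findall(r"\d", sample)        # every digit
word_chars = re.findall(r"\w", "a_1 !")   # letters, digits, underscore
spaces = re.findall(r"\s", sample)        # spaces and tabs
any_char = re.findall(r".", "a1 ")        # any character except a newline
literal_dot = re.findall(r"\.", sample)   # only the dot itself

print(literal)      # ['cat', 'cat', 'cat']
print(digits)       # ['4', '2']
print(word_chars)   # ['a', '_', '1']
print(spaces)       # [' ', '\t', ' ']
print(any_char)     # ['a', '1', ' ']
print(literal_dot)  # ['.']
```

Note that the literal pattern `cat` also matches inside the longer word "catches", which is why three matches are returned.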
Several characters can be enclosed in square brackets to match any one of them:
[aAbB] $\rightarrow$ a, A, b, B
[a-z] $\rightarrow$ a, b, c, ...
[A-Z] $\rightarrow$ A, B, C, ...
[0-9] $\rightarrow$ 0, 1, 2, ...
[A-Za-z] $\rightarrow$ A, B, C, ..., a, b, c, ...
The following metacharacters specify how often the preceding character repeats:
* $\rightarrow$ the character is absent or repeats any number of times
a* $\rightarrow$ '', 'a', 'aa', ...
+ $\rightarrow$ the character is present at least once
a+ $\rightarrow$ 'a', 'aa', 'aaa', ...
? $\rightarrow$ the character is present once or not at all
a? $\rightarrow$ '', 'a'
{n,m} $\rightarrow$ the character is present from n to m times (written without a space after the comma)
a{2,4} $\rightarrow$ 'aa', 'aaa', 'aaaa'
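A small sketch can confirm how bracketed sets and repetition metacharacters behave. The helper `matches_fully` is a hypothetical name introduced here; it relies on `re.fullmatch`, which succeeds only when the entire string fits the pattern.

```python
import re

def matches_fully(pattern, candidate):
    """Return True if the whole candidate string matches the pattern."""
    return re.fullmatch(pattern, candidate) is not None

print(matches_fully(r"a*", ""))              # True: zero repetitions allowed
print(matches_fully(r"a+", ""))              # False: at least one 'a' needed
print(matches_fully(r"a?", "aa"))            # False: at most one 'a'
print(matches_fully(r"a{2,4}", "aaa"))       # True: between 2 and 4 times
print(matches_fully(r"a{2,4}", "a"))         # False: too few repetitions
print(matches_fully(r"[A-Za-z]+", "Regex"))  # True: letters only
print(matches_fully(r"[0-9]+", "12a"))       # False: 'a' is not a digit

# In Python's re module a space inside the braces disables the quantifier,
# so the braces are then matched as literal text.
print(matches_fully(r"a{2, 4}", "aaa"))      # False
```

The last call illustrates why the range is best written as `{2,4}` rather than `{2, 4}` when working in Python.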
Example:
john.smith@mailbox.com is the e-mail of John. He often writes to his boss at boss@company.com. But the messages get forwarded to his secretary at info@company.com.
[\w\.]+@[a-z]+\.[a-z]+
[\w\.]+ $\rightarrow$ john.smith, boss, info
at least one letter, digit, underscore, or dot character
@ $\rightarrow$ @
[a-z]+ $\rightarrow$ mailbox, company
at least one lowercase letter
\. $\rightarrow$ .
[a-z]+ $\rightarrow$ com
at least one lowercase letter
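The piece-by-piece reading of the pattern can also be written directly into the code. The sketch below uses the `re.VERBOSE` flag, which lets a pattern span several lines with comments; the name `email_pattern` and the test strings are illustrative assumptions, and the pattern is the simplified one from the slides rather than a complete e-mail validator.

```python
import re

email_pattern = re.compile(r"""
    [\w\.]+   # local part: letters, digits, underscores, or dots
    @         # the literal @ sign
    [a-z]+    # domain name: lowercase letters
    \.        # a literal dot
    [a-z]+    # top-level domain: lowercase letters
""", re.VERBOSE)

print(email_pattern.fullmatch("john.smith@mailbox.com") is not None)  # True
print(email_pattern.fullmatch("boss@company") is not None)            # False
print(email_pattern.fullmatch("Info@Company.com") is not None)        # False
```

The last address fails because `[a-z]` covers lowercase letters only, so the capital "C" in the domain is rejected.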
import re
# Compile the e-mail pattern once so that it can be reused below
pattern = re.compile(r'[\w\.]+@[a-z]+\.[a-z]+')
text = 'john.smith@mailbox.com is the e-mail of '\
       'John. He often writes to his boss at '\
       'boss@company.com. But the messages get forwarded '\
       'to his secretary at info@company.com.'
result = re.finditer(pattern, text)
print(result)
<callable_iterator object at 0x7f5dff81af98>
for match in result:
    print(match)
<_sre.SRE_Match object; span=(0, 22), match='john.smith@mailbox.com'>
<_sre.SRE_Match object; span=(77, 93), match='boss@company.com'>
<_sre.SRE_Match object; span=(146, 162), match='info@company.com'>
result = re.finditer(pattern, text)
print(result)
<callable_iterator object ...>
for match in result:
    print(match.group())
    print(match.start())
    print(match.end())
john.smith@mailbox.com
0
22
boss@company.com
77
93
info@company.com
146
162
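Two details of `re.finditer()` are worth a quick sketch: the returned iterator can be consumed only once, which is presumably why `result` is created again before the second loop, and the positions reported by a match are ordinary slice indices into the text. The shortened `text` below is a hypothetical stand-in for the longer example.

```python
import re

pattern = re.compile(r'[\w\.]+@[a-z]+\.[a-z]+')
text = 'Write to boss@company.com or info@company.com.'  # hypothetical text

result = re.finditer(pattern, text)
first_pass = [m.group() for m in result]
second_pass = [m.group() for m in result]  # the iterator is already exhausted

print(first_pass)   # ['boss@company.com', 'info@company.com']
print(second_pass)  # []

# span() returns (start, end); slicing the text with it recovers the match
match = pattern.search(text)
start, end = match.span()
print(start, end)                        # 9 25
print(text[start:end] == match.group())  # True
```

If the matches are needed more than once, storing them in a list (as `first_pass` does) avoids calling `re.finditer()` again.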
substrings = re.findall(pattern, text)
print(substrings)
['john.smith@mailbox.com', 'boss@company.com', 'info@company.com']
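One behavior of `re.findall()` that often comes up in practice: when the pattern contains capturing groups (parentheses), the function returns the groups instead of the whole matches. The grouped pattern and the shortened `text` below are assumptions added for illustration and do not appear in the slides.

```python
import re

text = 'Write to boss@company.com or info@company.com.'  # hypothetical text

# Without groups: a list of the full matches
whole = re.findall(r'[\w\.]+@[a-z]+\.[a-z]+', text)
print(whole)    # ['boss@company.com', 'info@company.com']

# With two groups: a list of (user, domain) tuples
grouped = re.findall(r'([\w\.]+)@([a-z]+\.[a-z]+)', text)
print(grouped)  # [('boss', 'company.com'), ('info', 'company.com')]
```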
split_list = re.split(pattern, text)
print(split_list)
['',
' is the e-mail of John. He often writes to his boss at ',
'. But the messages get forwarded to his secretary at ',
'.']
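The empty string at the start of the list appears because the text begins with a match, so nothing precedes the first split point. The sketch below reproduces this on a shorter, hypothetical `text` and also shows the `maxsplit` keyword argument, which limits the number of splits.

```python
import re

pattern = re.compile(r'[\w\.]+@[a-z]+\.[a-z]+')
text = 'boss@company.com wrote to info@company.com today.'  # hypothetical text

all_parts = re.split(pattern, text)
print(all_parts)   # ['', ' wrote to ', ' today.']

# Split only at the first match; the rest of the text stays intact
first_only = re.split(pattern, text, maxsplit=1)
print(first_only)  # ['', ' wrote to info@company.com today.']
```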