Introduction to regular expressions

Regular Expressions in Python

Maria Eugenia Inzaugarat

Data Scientist

What is a regular expression?

REGular EXpression or regex:

A string that combines normal characters and special metacharacters to describe a pattern for finding text, or positions, within a text

  • Normal characters match themselves (for example, st matches the text "st")
  • Metacharacters represent types of characters (\d for a digit, \s for whitespace, \w for a word character) or ideas such as repetition ({3,10} means "between 3 and 10 times")
  • Pattern: a sequence of characters that maps to words or punctuation
  • Pattern matching usage:

    • Find and replace text
    • Validate strings
  • Very powerful and fast
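As a minimal sketch of the points above, the snippet below contrasts a pattern made only of normal characters with one that uses metacharacters, then uses a pattern to validate a string. The sample text, the username rule (3 to 10 word characters), and the helper name is_valid_username are assumptions made up for illustration; re.findall and re.fullmatch are standard functions of Python's re module.

```python
import re

text = "Order 66 ships to 4 stores on 12 streets"

# Normal characters match themselves: "st" finds every literal "st".
literal_matches = re.findall(r"st", text)
print(literal_matches)    # ['st', 'st']

# \d matches one digit; {2} repeats it, so \d{2} matches two digits in a row.
two_digit_matches = re.findall(r"\d{2}", text)
print(two_digit_matches)  # ['66', '12']


def is_valid_username(candidate):
    """Hypothetical rule: the whole string must be 3 to 10 word characters."""
    return re.fullmatch(r"\w{3,10}", candidate) is not None


print(is_valid_username("anna_99"))    # True
print(is_valid_username("ab"))         # False (too short)
print(is_valid_username("has space"))  # False (a space is not a word character)
```

re.fullmatch is used for validation here because it only succeeds when the entire string fits the pattern, whereas re.findall would also report partial matches.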

The re module

import re
  • Find all matches of a pattern:

re.findall(r"#movies", "Love #movies! I had fun yesterday going to the #movies")
['#movies', '#movies']
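A short sketch of two details that are easy to miss with re.findall: the length of its result gives a quick match count, and it returns an empty list (not None) when nothing matches. The text is the same one used above; the variable names and the "#series" pattern are illustrative.

```python
import re

tweet = "Love #movies! I had fun yesterday going to the #movies"

# findall returns a list of every non-overlapping match, in order.
matches = re.findall(r"#movies", tweet)
print(len(matches))  # 2

# When the pattern is absent, the result is an empty list.
no_matches = re.findall(r"#series", tweet)
print(no_matches)    # []
```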

  • Split string at each match:

re.split(r"!", "Nice Place to eat! I'll come back! Excellent meat!")
['Nice Place to eat', " I'll come back", ' Excellent meat', '']
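The trailing empty string in that result appears because the text ends with the separator. The sketch below shows one way to clean the pieces up, and how the maxsplit argument of re.split limits the number of splits. The variable names are illustrative.

```python
import re

review = "Nice Place to eat! I'll come back! Excellent meat!"

# Splitting on "!" leaves a trailing '' and leading spaces on some pieces.
parts = re.split(r"!", review)

# Strip whitespace and drop empty pieces.
sentences = [part.strip() for part in parts if part.strip()]
print(sentences)  # ['Nice Place to eat', "I'll come back", 'Excellent meat']

# maxsplit=1 splits only at the first match and keeps the rest intact.
first, rest = re.split(r"!", review, maxsplit=1)
print(first)      # Nice Place to eat
```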

  • Replace one or many matches with a string:

re.sub(r"yellow", "nice", "I have a yellow car and a yellow house in a yellow neighborhood")
'I have a nice car and a nice house in a nice neighborhood'
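By default re.sub replaces every match. To replace only one (or the first few), the count argument can be used, as in this sketch; re.subn is the variant that also reports how many replacements were made. The variable names are illustrative.

```python
import re

sentence = "I have a yellow car and a yellow house in a yellow neighborhood"

# count=1 replaces only the first match.
first_only = re.sub(r"yellow", "nice", sentence, count=1)
print(first_only)
# I have a nice car and a yellow house in a yellow neighborhood

# subn returns the new string together with the number of replacements.
replaced, n_replacements = re.subn(r"yellow", "nice", sentence)
print(n_replacements)  # 3
```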

Supported metacharacters

  • \d matches a digit; \D matches any non-digit character:

re.findall(r"User\d", "The winners are: User9, UserN, User8")
['User9', 'User8']

re.findall(r"User\D", "The winners are: User9, UserN, User8")
['UserN']

  • \w matches a word character (a letter, digit, or underscore); \W matches any non-word character:

re.findall(r"User\w", "The winners are: User9, UserN, User8")
['User9', 'UserN', 'User8']

re.findall(r"\W\d", "This skirt is on sale, only $5 today!")
['$5']

  • \s matches a whitespace character; \S matches any non-whitespace character:

re.findall(r"Data\sScience", "I enjoy learning Data Science")
['Data Science']

re.sub(r"ice\Scream", "ice cream", "I really like ice-cream")
'I really like ice cream'
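To compare the six metacharacters used above side by side, this sketch runs each one against the same short string. The sample string is made up for illustration. Note that the underscore counts as a word character, so \w matches it and \W does not.

```python
import re

sample = "A1 b_2!"

# Each lowercase class has an uppercase counterpart that matches the opposite set.
results = {
    pattern: re.findall(pattern, sample)
    for pattern in (r"\d", r"\D", r"\w", r"\W", r"\s", r"\S")
}

for pattern, found in results.items():
    print(pattern, found)
# \d ['1', '2']
# \D ['A', ' ', 'b', '_', '!']
# \w ['A', '1', 'b', '_', '2']
# \W [' ', '!']
# \s [' ']
# \S ['A', '1', 'b', '_', '2', '!']
```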

Let's practice!
