Introduction to string manipulation

Regular Expressions in Python

Maria Eugenia Inzaugarat

Data Scientist

You will learn

  • String manipulation

    • e.g. replace and find specific substrings
  • String formatting

    • e.g. interpolating a string in a template
  • Basic and advanced regular expressions

    • e.g. finding complex patterns in a string

Why it is important

  • Clean a dataset to prepare it for text mining or sentiment analysis

  • Process email content to feed a machine learning algorithm that decides whether an email is spam

  • Parse and extract specific data from a website to build a database


Strings

  • Sequence of characters

  • Enclosed in matching single or double quotes

my_string = "This is a string"
my_string2 = 'This is also a string'
my_string = 'And this? It's the wrong string'  # SyntaxError: the apostrophe closes the string early
my_string = "And this? It's the correct string"
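The quoting rule above can be sketched as follows. The variable names are illustrative and do not come from the slides. A string that contains an apostrophe can be wrapped in double quotes, the apostrophe can be escaped with a backslash, or triple quotes can be used when both kinds of quote appear.

```python
# Illustrative names, not from the slides.
double_quoted = "It's the correct string"        # double quotes let the apostrophe through
escaped = 'It\'s the correct string'             # a backslash escapes the apostrophe
triple_quoted = '''It's the "correct" string'''  # triple quotes allow both kinds of quote

print(double_quoted == escaped)
print(triple_quoted)
```

Both of the first two forms produce the same string; which one to use is mostly a matter of readability.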

More strings

  • Length
my_string = "Awesome day"

len(my_string)
11
  • Convert to string
str(123)
'123'

Concatenation

  • Concatenate: + operator
my_string1 = "Awesome day"
my_string2 = "for biking"
print(my_string1+" "+my_string2)
Awesome day for biking
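A minimal sketch of how concatenation and str() work together. The + operator only joins strings, so a number has to be converted first. The value ride_km is hypothetical, and str.join() is shown as a common alternative rather than something the slides cover.

```python
my_string1 = "Awesome day"
ride_km = 25  # hypothetical value, not from the slides

# "+" only joins strings, so the integer is converted with str() first.
message = my_string1 + " for a " + str(ride_km) + " km ride"
print(message)

# str.join() places the separator between every piece in the list.
joined = " ".join([my_string1, "for", "biking"])
print(joined)
```

Without the str() call, the first expression would raise a TypeError because Python does not implicitly convert an int to a str.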

Indexing

  • Bracket notation

my_string = "Awesome day"
print(my_string[3])
s

print(my_string[-1])
y
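The indexing rules above can be sketched a little further. Indices start at 0, a negative index counts from the end, and an index past the end raises an IndexError. The variable names are illustrative.

```python
my_string = "Awesome day"

first_char = my_string[0]                  # indices start at 0
last_char = my_string[-1]                  # -1 is the last character
same_last = my_string[len(my_string) - 1]  # equivalent positive index

# An index beyond the last position raises IndexError.
try:
    my_string[len(my_string)]
    out_of_range = False
except IndexError:
    out_of_range = True

print(first_char, last_char, same_last, out_of_range)
```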

Slicing

  • Bracket notation

my_string = "Awesome day"
print(my_string[0:3])
Awe

print(my_string[:5])
print(my_string[5:])
Aweso
me day
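A short sketch of two slicing properties that follow from the examples above: the stop index is excluded, so the two halves of a split at the same position rejoin into the original string, and a slice that runs past the end is clipped instead of raising an error. The variable names are illustrative.

```python
my_string = "Awesome day"

head = my_string[:5]   # characters at indices 0 to 4
tail = my_string[5:]   # characters from index 5 to the end

# Because the stop index is excluded, the two halves never overlap.
rejoined = head + tail

# Unlike indexing, slicing past the end is clipped rather than an error.
clipped = my_string[8:100]

print(rejoined == my_string)
print(clipped)
```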

Stride

  • Specifying stride

my_string = "Awesome day"
print(my_string[0:6:2])
Aeo

print(my_string[::-1])
yad emosewA
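As a sketch of where the stride is useful: leaving out start and stop applies the step to the whole string, and a step of -1 reverses it. The helper is_palindrome is a hypothetical example and not part of the slides.

```python
my_string = "Awesome day"

# A step of 2 over the whole string keeps indices 0, 2, 4, 6, 8, 10.
every_other = my_string[::2]

# A step of -1 walks the string backwards.
reversed_string = my_string[::-1]


def is_palindrome(text):
    """Hypothetical helper: True if text reads the same in both directions."""
    return text == text[::-1]


print(every_other)
print(reversed_string)
print(is_palindrome("level"), is_palindrome(my_string))
```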

Let's practice!
