Introduction to regular expressions

Introduction to Natural Language Processing in Python

Katharine Jarmul

Founder, kjamistan

What is Natural Language Processing?

Field of study focused on making sense of language
- Using statistics and computers
You will learn the basics of NLP
- Topic identification
- Text classification
NLP applications include:
- Chatbots
- Translation
- Sentiment analysis
- ... and many more!

What exactly are regular expressions?

Strings with a special syntax
Allow us to match patterns in other strings
Applications of regular expressions:

→ Find all web links in a document

→ Parse email addresses

→ Remove/replace unwanted characters
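The three applications above can be sketched with Python's re module. The sample text and the patterns below are illustrative assumptions and are deliberately simplified; real-world links and email addresses usually need stricter patterns.

```python
import re

# Hypothetical sample text, invented for illustration.
text = "Contact jane@example.com or visit https://example.com/docs today!!!"

# Find all web links: 'http' or 'https', then '://', then non-whitespace characters.
links = re.findall(r'https?://\S+', text)

# Parse email addresses (simplified pattern): characters around an '@' and a dot.
emails = re.findall(r'[\w.+-]+@[\w-]+\.[\w.-]+', text)

# Remove unwanted characters: drop runs of exclamation marks.
cleaned = re.sub(r'!+', '', text)

print(links)    # ['https://example.com/docs']
print(emails)   # ['jane@example.com']
print(cleaned)  # Contact jane@example.com or visit https://example.com/docs today
```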

import re

re.match('abc', 'abcdef')

<_sre.SRE_Match object; span=(0, 3), match='abc'>

word_regex = r'\w+'
re.match(word_regex, 'hi there!')

<_sre.SRE_Match object; span=(0, 2), match='hi'>
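A match object is more useful once its contents are extracted. The following is a minimal sketch of reading the matched text and its position, and of what happens when the pattern does not match; the variable names are illustrative.

```python
import re

match = re.match(r'\w+', 'hi there!')

# A successful match returns a match object exposing the text and its position.
print(match.group())  # hi
print(match.span())   # (0, 2)

# re.match only looks at the start of the string, so a digit pattern finds nothing here.
no_match = re.match(r'\d+', 'hi there!')
print(no_match)       # None
```

Because a failed match returns None, it is common to check the result before calling group().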

Common regex patterns

pattern	matches	example
\w+	word	'Magic'
\d	digit	'9'
\s	whitespace	' '
.*	wildcard (any characters, zero or more)	'username74'
+ or *	greedy match (one or more / zero or more)	'aaaaaa'
\S	non-whitespace	'no_spaces'
[a-z]	lowercase letter (character class)	'abcdefg'
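To see how the patterns in the table behave, here is a small sketch that applies each one to a sample string. The sample string is an assumption made up for illustration.

```python
import re

# Hypothetical sample string, invented for illustration.
text = "Magic username74 has 9 lives"

words = re.findall(r'\w+', text)      # runs of word characters
digits = re.findall(r'\d', text)      # single digits
spaces = re.findall(r'\s', text)      # single whitespace characters
chunks = re.findall(r'\S+', text)     # runs of non-whitespace characters
lower = re.findall(r'[a-z]+', text)   # runs of lowercase letters

# '.*' is greedy: it consumes as many characters as it can (up to a newline).
everything = re.match(r'.*', text).group()

# 'a+' is greedy as well: it takes every consecutive 'a'.
greedy = re.match(r'a+', 'aaaaaa').group()

print(words)   # ['Magic', 'username74', 'has', '9', 'lives']
print(digits)  # ['7', '4', '9']
print(lower)   # ['agic', 'username', 'has', 'lives']
```

Note that `[a-z]+` skips the capital 'M' and the digits, which is why 'Magic' comes back as 'agic'.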

Python's re module

re module
split: split a string on regex
findall: find all patterns in a string
search: search for the first occurrence of a pattern anywhere in a string
match: match a pattern at the beginning of a string

Pattern first, and the string second
split and findall return a list; search and match return a match object, or None if nothing matches
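The differences between the four functions are easiest to see side by side. This is a minimal sketch on a made-up sentence; the sentence and variable names are assumptions for illustration.

```python
import re

# Hypothetical sample sentence, invented for illustration.
text = "NLP is fun. Regex is useful!"

# split: break the string wherever the pattern matches; returns a list.
tokens = re.split(r'\s+', text)

# findall: return every non-overlapping match as a list of strings.
capitalized = re.findall(r'[A-Z]\w+', text)

# search: scan the whole string and return the first match object (or None).
found = re.search(r'fun', text)

# match: only succeeds if the pattern matches at the start of the string.
at_start = re.match(r'fun', text)

print(tokens)        # ['NLP', 'is', 'fun.', 'Regex', 'is', 'useful!']
print(capitalized)   # ['NLP', 'Regex']
print(found.span())  # (7, 10)
print(at_start)      # None
```

In every call the pattern comes first and the string second, as noted above.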

re.split(r'\s+', 'Split on spaces.')

['Split', 'on', 'spaces.']

Let's practice!
