Introduction to regular expressions

Introduction to Natural Language Processing in Python

Katharine Jarmul

Founder, kjamistan

What is Natural Language Processing?

  • Field of study focused on making sense of language
    • Using statistics and computers
  • You will learn the basics of NLP
    • Topic identification
    • Text classification
  • NLP applications include:
    • Chatbots
    • Translation
    • Sentiment analysis
    • ... and many more!

What exactly are regular expressions?

  • Strings with a special syntax
  • Allow us to match patterns in other strings
  • Applications of regular expressions:
    • Find all web links in a document
    • Parse email addresses
    • Remove/replace unwanted characters

import re

re.match('abc', 'abcdef')
<_sre.SRE_Match object; span=(0, 3), match='abc'>

word_regex = r'\w+'
re.match(word_regex, 'hi there!')
<_sre.SRE_Match object; span=(0, 2), match='hi'>
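The three applications listed earlier can be sketched with the same re module. This is a minimal illustration, not a production approach: the sample text is made up, and the link and email patterns are deliberately simple assumptions that would miss many real-world cases.

```python
import re

# Hypothetical sample text, invented for this illustration.
text = "Contact ana@example.com or visit https://example.com/docs and http://test.org today!!!"

# Find all web links: 'http' or 'https', then '://', then a run of non-space characters.
links = re.findall(r'https?://\S+', text)

# Parse email addresses with a deliberately simple pattern (not a full validator).
emails = re.findall(r'[\w.]+@[\w.]+', text)

# Remove unwanted characters: drop anything that is not a word character or whitespace.
cleaned = re.sub(r'[^\w\s]', '', "Hello, world!!!")

print(links)    # ['https://example.com/docs', 'http://test.org']
print(emails)   # ['ana@example.com']
print(cleaned)  # Hello world
```

Raw strings (the r prefix) keep Python from interpreting backslashes before the regex engine sees them.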

Common regex patterns

pattern   matches          example
\w+       word             'Magic'
\d        digit            9
\s        space            ' '
.*        wildcard         'username74'
+ or *    greedy match     'aaaaaa'
\S        not space        'no_spaces'
[a-z]     lowercase group  'abcdefg'
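To see what each pattern in the table actually matches, here is a small sketch that runs them against short, made-up strings. The variable names are illustrative only. Note that [a-z] on its own matches a single lowercase letter, so a run such as 'abcdefg' needs [a-z]+. The non-greedy form +? is not in the table and is shown only for contrast with greedy matching.

```python
import re

# re.findall returns every non-overlapping match as a list of strings.
words = re.findall(r'\w+', 'Magic 9 ball')         # ['Magic', '9', 'ball']
digits = re.findall(r'\d', 'username74')           # ['7', '4']
spaces = re.findall(r'\s', 'a b\tc')               # [' ', '\t']
non_space = re.findall(r'\S+', 'no_spaces here')   # ['no_spaces', 'here']
one_lower = re.findall(r'[a-z]', 'abC')            # ['a', 'b']
run_lower = re.findall(r'[a-z]+', 'abcdefg HI')    # ['abcdefg']

# .* matches any run of characters except a newline.
anything = re.match(r'.*', 'username74').group()   # 'username74'

# Greedy: + consumes as many characters as it can.
greedy = re.match(r'a+', 'aaaaaa').group()         # 'aaaaaa'
# Non-greedy (for contrast): +? stops as early as possible.
lazy = re.match(r'a+?', 'aaaaaa').group()          # 'a'
```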

Python's re module

  • re module
    • split: split a string on a regex
    • findall: find all matches of a pattern in a string
    • search: find the first match of a pattern anywhere in a string
    • match: match a pattern at the beginning of a string
  • Pass the pattern first and the string second
  • Returns a list, a match object, or None, depending on the function

re.split(r'\s+', 'Split on spaces.')
['Split', 'on', 'spaces.']
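The four functions differ mainly in where they look and what they return. The sketch below compares them on one made-up sentence; the variable names are illustrative assumptions.

```python
import re

# Hypothetical sentence, invented for this comparison.
sentence = 'Split on spaces. Then find 2 or 3 numbers.'

# split returns a list of the pieces between matches.
parts = re.split(r'\s+', 'Split on spaces.')   # ['Split', 'on', 'spaces.']

# findall returns a list of every match.
numbers = re.findall(r'\d+', sentence)         # ['2', '3']

# search scans the whole string and returns the first match object (or None).
found = re.search(r'\d+', sentence)
first_number = found.group()                   # '2'

# match only tries at the beginning of the string.
no_digits_at_start = re.match(r'\d+', sentence)      # None: the string starts with 'S'
first_word = re.match(r'\w+', sentence).group()      # 'Split'
```

Because search and match return None when nothing matches, it is safer to check the result before calling .group() on it.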

Let's practice!
