Pattern matching

Cleaning Data in SQL Server Databases

Miriam Antona

Software Engineer

Pattern matching - Introduction

SELECT * FROM series
| id  | name            | contact_number | ... |
|-----|-----------------|----------------|-----|
| 1   | Adventure Time  | 555-906-8845   | ... |
| 2   | Dexter          | 555-156-8845   | ... |
| 3   | Futurama        | 555-210-9951   | ... |
| 4   | Game of Thrones | 555-543-6641   | ... |
| ... | ...             | ...            | ... |
  • Valid numbers: ###-###-####
    • first and fourth digits between 2 and 9
    • remaining digits between 0 and 9
  • Invalid number: 555-156-8845 (the fourth digit is 1)

Pattern matching - Introduction

  • SQL Server doesn't provide a full-blown set of regular expressions.
  • SQL Server can match patterns using LIKE.
  • To get a full-blown set of regular expressions -> create and install extensions.

Pattern matching - LIKE

  • Determines if a string matches a specified pattern
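| Wildcard character | Description                                                | Example                                                                                  |
|--------------------|------------------------------------------------------------|------------------------------------------------------------------------------------------|
| %                  | Any string of zero or more characters                      | WHERE contact_number LIKE '555-%'                                                        |
| _ (underscore)     | Any single character                                       | WHERE contact_number LIKE '___-___-____'                                                 |
| []                 | Any single character within the specified range or set     | WHERE contact_number LIKE '[2-9][0-9][0-9]-[2-9][0-9][0-9]-[0-9][0-9][0-9][0-9]'         |
| [^]                | Any single character not within the specified range or set | WHERE contact_number LIKE '[^2-9]'                                                       |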
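
The wildcards in the table can be compared side by side with a small self-contained query. This is only a sketch: the three contact numbers are taken from the example result sets in these slides, the inline VALUES list stands in for the series table, and the column aliases are made up for illustration. The last pattern appends % to [^2-9] so that it tests only the first character of a longer string.

```sql
-- Sketch: evaluate each wildcard against a few sample values.
-- The VALUES list replaces the series table so the query runs on its own.
SELECT
    v.contact_number,
    -- %: '555-' followed by any string of zero or more characters
    CASE WHEN v.contact_number LIKE '555-%' THEN 1 ELSE 0 END AS starts_with_555,
    -- _: exactly 3, 3 and 4 characters of any kind, separated by dashes
    CASE WHEN v.contact_number LIKE '___-___-____' THEN 1 ELSE 0 END AS has_phone_shape,
    -- [^]: first character is NOT in the range 2-9
    CASE WHEN v.contact_number LIKE '[^2-9]%' THEN 1 ELSE 0 END AS first_char_not_2_to_9
FROM (VALUES
    ('555-906-8845'),
    ('555-abc-6641'),
    ('000-930-1274')
) AS v(contact_number);
```

Note that '555-abc-6641' passes the underscore pattern, because _ accepts any character, not only digits. That is why the bracket ranges are needed to validate phone numbers.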

Pattern matching - example with %

SELECT name, contact_number
FROM series
WHERE contact_number LIKE '555%'
| name            | contact_number |
|-----------------|----------------|
| Adventure Time  | 555-906-8845   |
| Dexter          | 555-156-8845   |
| Futurama        | 555-210-9951   |
| Game of Thrones | 555-abc-6641   |
| ...             | ...            |

Pattern matching - example with %

SELECT 
    name, 
    contact_number
FROM series
WHERE contact_number NOT LIKE '555%'
| name            | contact_number |
|-----------------|----------------|
| The Good Doctor | 000-930-1274   |

Pattern matching - example with [] (brackets)

SELECT 
    name, 
    contact_number
FROM series
WHERE contact_number LIKE '[2-9][0-9][0-9]-[2-9][0-9][0-9]-[0-9][0-9][0-9][0-9]'
| name           | contact_number |
|----------------|----------------|
| Adventure Time | 555-906-8845   |
| Futurama       | 555-210-9951   |
| Homeland       | 555-985-6314   |
| Westworld      | 555-456-1234   |
| ...            | ...            |

Pattern matching - example with [] (brackets)

SELECT 
    name, 
    contact_number
FROM series
WHERE contact_number NOT LIKE '[2-9][0-9][0-9]-[2-9][0-9][0-9]-[0-9][0-9][0-9][0-9]'
| name            | contact_number |
|-----------------|----------------|
| Dexter          | 555-156-8845   |
| Game of Thrones | 555-abc-6641   |
| The Good Doctor | 000-930-1274   |
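
The LIKE and NOT LIKE queries can also be folded into one pass that labels every row instead of filtering. The following is a minimal sketch, not part of the original slides: the inline VALUES list reuses rows from the result sets above in place of the series table, and the number_status column name is hypothetical.

```sql
-- Sketch: label each contact number as valid or invalid in one query.
SELECT
    s.name,
    s.contact_number,
    CASE
        -- first and fourth digits 2-9, all remaining digits 0-9
        WHEN s.contact_number LIKE '[2-9][0-9][0-9]-[2-9][0-9][0-9]-[0-9][0-9][0-9][0-9]'
            THEN 'valid'
        ELSE 'invalid'
    END AS number_status
FROM (VALUES
    ('Adventure Time',  '555-906-8845'),
    ('Dexter',          '555-156-8845'),
    ('Game of Thrones', '555-abc-6641'),
    ('The Good Doctor', '000-930-1274')
) AS s(name, contact_number);
```

One design point worth keeping in mind: a NULL contact_number satisfies neither LIKE nor NOT LIKE, so a NOT LIKE filter silently skips such rows, whereas the ELSE branch here would label them 'invalid'.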

Let's practice!
