Cleaning Data in PostgreSQL Databases
Darryl Reeves, Ph.D.
Industry Assistant Professor, New York University
camis | name | inspection_date | score | nta | ...
---------+---------------------------------+-----------------+-------+------+-----
... | ... | ... | ... | ... | ...
41659848 | LA BRISA DEL CIBAO | 01/30/2018 | 20 | QN26 | ...
40961447 | MESON SEVILLA RESTAURANT | 03/19/2019 | 50 | MN15 | ...
50063071 | WA BAR | 05/23/2018 | 15 | MN17 | ...
50034992 | EMPANADAS MONUMENTAL | 06/21/2019 | 17 | MN35 | ...
50095871 | ALPHONSO'S PIZZERIA & TRATTORIA | 01/16/2020 | 10 | MN28 | ...
41104041 | THE SPARROW TAVERN | 09/17/2019 | 13 | QN72 | ...
50016937 | BURGER KING | 09/14/2018 | 12 | QN55 | ...
50066469 | DARBAR'S CHICKEN & RIBS | 08/07/2017 | 11 | QN55 | ...
41195691 | F & J PINE RESTAURANT | 05/02/2019 | 26 | BX49 | ...
50015706 | EL RINCONCITO DE LOS SABORES | 12/18/2019 | 20 | QN35 | ...
... | ... | ... | .. | ... | ...
Valid Prefixes to NTA Code
SELECT * FROM restaurant_inspection WHERE nta LIKE 'QN544';
SELECT * FROM restaurant_inspection WHERE nta = 'QN544';
Pattern Matching Characters
%
- matches any sequence of zero or more characters
_
(underscore) - matches a single character
SELECT
*
FROM
restaurant_inspection
WHERE nta LIKE 'QN%';
SELECT
*
FROM
restaurant_inspection
WHERE nta LIKE 'QN%'
AND nta NOT LIKE 'QN__';
LIKE
is limitedSELECT
camis, name, inspection_date, score, nta
FROM
restaurant_inspection
WHERE nta SIMILAR TO 'QN%' AND nta NOT SIMILAR TO 'QN__';
Metacharacter | Usage | Example RE | Example Match |
---|---|---|---|
\d |
matches a digit (0-9) | \d\d\d |
'345' |
? |
matches 0 or 1 of previous character | x\d? |
'x5' |
+ |
matches one or more of previous character | \d+ |
'10' |
* |
matches any character 0 or more times | \d* |
'3081' |
[] |
matches any character inside of the brackets | [a-z] |
'f' |
SELECT
camis, name, inspection_date, score, nta
FROM
restaurant_inspection
WHERE nta SIMILAR TO 'QN%' AND nta NOT SIMILAR TO 'QN__';
SELECT
camis, name, inspection_date, score, nta
FROM
restaurant_inspection
WHERE
nta NOT SIMILAR TO '[A-Z][A-Z]\d\d';
camis | name | inspection_date | score | nta
---------+--------------------+-----------------+-------+------
41659848 | LA BRISA DEL CIBAO | 01/30/2018 | 20 | Q26
41104041 | THE SPARROW TAVERN | 09/17/2019 | 13 | QN723
Cleaning Data in PostgreSQL Databases