Cleaning Data in PostgreSQL Databases
Darryl Reeves, Ph.D.
Industry Assistant Professor, New York University
name | grade | inspection_type | census_tract | ...
--------------------------------+-------+-----------------------------------------+--------------+-----
... | ... | ... | ... | ...
EMPANADAS MONUMENTAL | B | Cycle Inspection / Re-inspection | 26900 | ...
ALPHONSO'S PIZZERIA & TRATTORIA | A | Cycle Inspection / Initial Inspection | 202 | ...
THE SPARROW TAVERN | A | Cycle Inspection / Initial Inspection | 12500 | ...
BURGER KING | A | Cycle Inspection / Re-inspection | 86400 | ...
ASTORIA PIZZA | B | Cycle Inspection / Re-inspection | 6300 | ...
... | ... | ... | ... | ...
name | grade | inspection_type | census_tract | ...
--------------------------------+-------+-----------------------------------------+--------------+-----
... | ... | ... | ... | ...
EMPANADAS MONUMENTAL | B | Cycle Inspection / Re-inspection | 26900 | ...
ALPHONSO'S PIZZERIA & TRATTORIA | A | Cycle Inspection / Initial Inspection | 202 | ...
THE SPARROW TAVERN | A | Cycle Inspection / Initial Inspection | 12500 | ...
BURGER KING | A | Cycle Inspection / Re-inspection | 86400 | ...
ASTORIA PIZZA | B | Cycle Inspection / Re-inspection | 6300 | ...
... | ... | ... | ... | ...
name
inspection_type
census_tract
values have a uniform length name | grade | inspection_type | census_tract | ...
--------------------------------+-------+-----------------------------------------+--------------+-----
... | ... | ... | ... | ...
EMPANADAS MONUMENTAL | B | Cycle Inspection / Re-inspection | 26900 | ...
ALPHONSO'S PIZZERIA & TRATTORIA | A | Cycle Inspection / Initial Inspection | 202 | ...
THE SPARROW TAVERN | A | Cycle Inspection / Initial Inspection | 12500 | ...
BURGER KING | A | Cycle Inspection / Re-inspection | 86400 | ...
ASTORIA PIZZA | B | Cycle Inspection / Re-inspection | 6300 | ...
... | ... | ... | ... | ...
name | grade | inspection_type | census_tract | ...
--------------------------------+-------+---------------------------------------+--------------+-----
... | ... | ... | ... | ...
Empanadas Monumental | B | Cycle Inspection / Re-inspection | 026900 | ...
Alphonso'S Pizzeria & Trattoria | A | Cycle Inspection / Initial Inspection | 000202 | ...
The Sparrow Tavern | A | Cycle Inspection / Initial Inspection | 012500 | ...
Burger King | A | Cycle Inspection / Re-inspection | 086400 | ...
Astoria Pizza | B | Cycle Inspection / Re-inspection | 006300 | ...
... | ... | ... | ... | ...
INITCAP(input_string)
- fixing capitalization
SELECT INITCAP('HELLO FRIEND!');
Hello Friend!
REPLACE(input_string, to_replace, replacement)
- replacing one text value with another
SELECT REPLACE('180 Main Street', 'Street', 'St');
180 Main St
LPAD(input_string, length [, fill_value])
- prepending text values to a string
SELECT LPAD('123', 7, 'X');
XXXX123
SELECT
INITCAP(name) as name,
grade,
REPLACE(inspection_type, ' / ', ' / ') as inspection_type,
LPAD(census_tract, 6, '0') as census_tract
FROM
restaurant_inspection;
name | grade | inspection_type | census_tract | ...
--------------------------------+-------+---------------------------------------+--------------+-----
... | ... | ... | ... | ...
Empanadas Monumental | B | Cycle Inspection / Re-inspection | 026900 | ...
Alphonso'S Pizzeria & Trattoria | A | Cycle Inspection / Initial Inspection | 000202 | ...
The Sparrow Tavern | A | Cycle Inspection / Initial Inspection | 012500 | ...
Burger King | A | Cycle Inspection / Re-inspection | 086400 | ...
Astoria Pizza | B | Cycle Inspection / Re-inspection | 006300 | ...
... | ... | ... | ... | ...
Cleaning Data in PostgreSQL Databases