Cleaning Data in PostgreSQL Databases
Darryl Reeves, Ph.D.
Industry Assistant Professor, New York University
camis | name | inspection_date | grade_date | ...
---------+----------------------+-----------------+------------+----
... | ... | ... | ... | ...
50034992 | EMPANADAS MONUMENTAL | 06/21/2019 | 06/21/2019 | ...
50095871 | ALPHONSO'S PIZZERIA | 01/16/2020 | 01/16/2020 | ...
41104041 | THE SPARROW TAVERN | 09/17/2019 | 09/17/2019 | ...
50016937 | BURGER KING | 09/14/2018 | 09/14/2018 | ...
50033304 | ASTORIA PIZZA | 12/18/2019 | 12/18/2019 | ...
... | ... | ... | ... | ...
DATE
functionality unavailable for TEXT
columnDATE(string_date)
string_date
to DATE
valuesDATE('2019-12-01')
→ DATE
valueSELECT
camis,
name,
DATE(inspection_date) AS inspection_date,
DATE(grade_date) AS grade_date
FROM
restaurant_inspection;
camis | name | inspection_date | grade_date | ...
---------+---------------------------------+-----------------+------------+-----
... | ... | ... | ... | ...
50034992 | EMPANADAS MONUMENTAL | 2019-06-21 | 2019-06-21 | ...
50095871 | ALPHONSO'S PIZZERIA & TRATTORIA | 2020-01-16 | 2020-01-16 | ...
41104041 | THE SPARROW TAVERN | 2019-09-17 | 2019-09-17 | ...
50016937 | BURGER KING | 2018-09-14 | 2018-09-14 | ...
50033304 | ASTORIA PIZZA | 2019-12-18 | 2019-12-18 | ...
... | ... | ... | ... | ...
TO_DATE(date_string, format_string)
→ DATE
valueDATE('Wednesday, June 10th, 2014')
→ ERRORTO_DATE('Wednesday, June 10th, 2014', 'Day, Month DDth, YYYY')
→ DATE
value camis | name | inspection_date | grade_date | ...
--------+----------------------------------+-----------------+------------+-----
... | ... | ... | ... | ...
41659848 | LA BRISA DEL CIBAO | 2018-01-30 | - | ...
40961447 | MESON SEVILLA RESTAURANT | 2019-03-19 | - | ...
50063071 | WA BAR | 2018-05-23 | - | ...
50034992 | EMPANADAS MONUMENTAL | 2019-06-21 | 2019-06-21 | ...
50095871 | ALPHONSO'S PIZZERIA & TRATTORIA | 2020-01-16 | 2020-01-16 | ...
... | ... | ... | ... | ...
NULLIF(value1, value2)
SELECT
NULLIF(grade_date, '-')
FROM
restaurant_inspection;
TO_CHAR('2012-04-03', YYYY-DD-MM)
TO_CHAR(date_value, format_string)
→ string valueSELECT
camis,
name,
TO_CHAR(
inspection_date::date,
'MM/DD/YY'
) AS inspection_date
FROM
restaurant_inspection;
camis | name | inspection_date | ...
---------+---------------------------------+-----------------+-----
... | ... | ... | ...
41659848 | LA BRISA DEL CIBAO | 01/30/2018 | ...
40961447 | MESON SEVILLA RESTAURANT | 03/19/2019 | ...
50063071 | WA BAR | 05/23/2018 | ...
50034992 | EMPANADAS MONUMENTAL | 06/21/2019 | ...
50095871 | ALPHONSO'S PIZZERIA & TRATTORIA | 01/16/2020 | ...
... | ... | ... | ...
camis | name | inspection_date | ...
---------+---------------------------------+-----------------+-----
... | ... | ... | ...
41659848 | LA BRISA DEL CIBAO | 01/30/20 | ...
40961447 | MESON SEVILLA RESTAURANT | 03/19/20 | ...
50063071 | WA BAR | 05/23/20 | ...
50034992 | EMPANADAS MONUMENTAL | 06/21/20 | ...
50095871 | ALPHONSO'S PIZZERIA & TRATTORIA | 01/16/20 | ...
... | ... | ... | ...
TO_DATE(date_string, format)
TO_CHAR(date_value, format)
YYYY
TO_DATE('2012', 'YYYY')
DATE
MM
TO_DATE('09/2012', 'MM/YYYY')
DATE
DD
TO_DATE('09/03/2012', 'MM/DD/YYYY')
DATE
Day
TO_DATE('Sunday, the 10th', 'Day, the DDth')
DATE
YYYY
TO_CHAR('2012-09-03'::date, 'YYYY')
2012
MM
TO_CHAR('09/03/2012'::date, 'MM/YYYY')
09/2012
DD
TO_CHAR('09/03/2012'::date, 'MM/DD/YYYY')
09/03/2012
Day
TO_CHAR('09/03/2012'::date, 'Day, the DDth')
Monday, the 03rd
Cleaning Data in PostgreSQL Databases