Cleaning Data in PostgreSQL Databases
Darryl Reeves, Ph.D.
Industry Assistant Professor, New York University
camis | name | inspection_datetime | inspection_type | ...
---------+-----------------------+----------------------+---------------------------------------+-----
... | ... | ... | ... | ...
50000458 | BEVERLEY PIZZA & CAFE | 2019-07-08 14:26 | Cycle Inspection / Initial Inspection | ...
50002521 | JADE PALACE | 2018-05-14 12:35 | Cycle Inspection / Initial Inspection | ...
40389732 | GIANDO | 2017-07-10 13:39 | Cycle Inspection / Re-inspection | ...
50044246 | FLEET BAKERY | 2019-10-29 15:40 | Cycle Inspection / Re-inspection | ...
50038120 | SHUN WON FLUSHING | 2018-07-17 16:20 | Cycle Inspection / Re-inspection | ...
... | ... | ... | ... | ...
inspection_datetime
: TIMESTAMP
column
Convert strings to TIMESTAMP
TO_TIMESTAMP(ts_string, format_string)
→ TIMESTAMP
SELECT
camis,
name,
TO_TIMESTAMP(inspection_datetime, 'YYYY-MM-DD HH24:MI'),
inspection_type
FROM
restaurant_inspection;
TO_TIMESTAMP(ts_string, format)
TO_CHAR(ts_value, format)
TO_DATE()
patterns (YYYY
, MM
, Day
, ...) usableTO_TIMESTAMP(ts_string, format)
Pattern | TO_TIMESTAMP() Example |
---|---|
HH24 |
TO_TIMESTAMP('23', 'HH24') → TIMESTAMP |
HH12 |
TO_TIMESTAMP('01', 'HH12') → TIMESTAMP |
MI |
TO_TIMESTAMP('18:13', 'HH24:MI') → TIMESTAMP |
SS |
TO_TIMESTAMP('33:20', 'MI:SS') → TIMESTAMP |
PM or AM |
TO_TIMESTAMP('5:35AM', 'HH12:MIPM') → TIMESTAMP |
EXTRACT(time_unit FROM time_value)
time_value
- DATE
or TIMESTAMP
SELECT
camis,
name,
inspection_datetime,
EXTRACT('year' FROM inspection_datetime) AS year,
inspection_type
FROM
restaurant_inspection;
camis | name | inspection_datetime | year | inspection_type | ...
---------+-----------------------+----------------------------+------+---------------------------------------+-----
... | ... | ... | ... | ... | ...
50000458 | BEVERLEY PIZZA & CAFE | 2019-07-08 06:37:46.658905 | 2019 | Cycle Inspection / Initial Inspection | ...
50002521 | JADE PALACE | 2018-05-14 03:47:24.474573 | 2018 | Cycle Inspection / Initial Inspection | ...
40389732 | GIANDO | 2017-07-10 03:59:12.864428 | 2017 | Cycle Inspection / Re-inspection | ...
50044246 | FLEET BAKERY | 2019-10-29 02:06:33.614964 | 2019 | Cycle Inspection / Re-inspection | ...
50038120 | SHUN WON FLUSHING | 2018-07-17 01:15:04.15666 | 2018 | Cycle Inspection / Re-inspection | ...
... | ... | ... | ... | ... | ...
Time Unit | EXTRACT() Example |
---|---|
year |
EXTRACT('year' FROM '2020-07-20 16:42:21'::timestamp) → 2020 |
month |
EXTRACT('month' FROM '2020-07-20 16:42:21'::timestamp) → 7 |
day |
EXTRACT('day' FROM '2020-07-20 16:42:21'::timestamp) → 20 |
hour |
EXTRACT('hour' FROM '2020-07-20 16:42:21'::timestamp) → 16 |
minute |
EXTRACT('minute' FROM '2020-07-20 16:42:21'::timestamp) → 42 |
second |
EXTRACT('second' FROM '2020-07-20 16:42:21'::timestamp) → 21 |
Cleaning Data in PostgreSQL Databases