Cleaning Data in PostgreSQL Databases
Darryl Reeves, Ph.D.
Assistant Professor, Long Island University - Brooklyn
camis | name | inspection_date | score | inspection_type | ...
---------+---------------------------------+-----------------+-------+---------------------------------------+-----
... | ... | ... | ... | ... | ...
41659848 | LA BRISA DEL CIBAO | 01/30/2018 | 20 | Cycle Inspection / Initial Inspection | ...
40961447 | MESON SEVILLA RESTAURANT | 03/19/2019 | 50 | Cycle Inspection / Initial Inspection | ...
50063071 | WA BAR | 05/23/2018 | 15 | Cycle Inspection / Initial Inspection | ...
50034992 | EMPANADAS MONUMENTAL | 06/21/2019 | 17 | Cycle Inspection / Re-inspection | ...
50095871 | ALPHONSO'S PIZZERIA & TRATTORIA | 01/16/2020 | 10 | Cycle Inspection / Initial Inspection | ...
... | ... | ... | ... | ... | ...
camis | name | inspection_date | score | inspection_type | ...
---------+------------------------------+-----------------+-------+-------------------------------------------------+-----
... | ... | ... | ... | ... | ...
41104041 | THE SPARROW TAVERN | 09/17/2019 | 13 | Cycle Inspection / Initial Inspection | ...
50016937 | BURGER KING | 09/14/2018 | 12 | Cycle Inspection / Re-inspection | ...
50066469 | DARBAR'S CHICKEN & RIBS | 08/07/2017 | 11 | Pre-permit (Operational) / Reopening Inspection | ...
41195691 | F & J PINE RESTAURANT | 05/02/2019 | 26 | Cycle Inspection / Initial Inspection | ...
50015706 | EL RINCONCITO DE LOS SABORES | 12/18/2019 | A | Cycle Inspection / Initial Inspection | ...
... | ... | ... | ... | ... | ...
SELECT
camis,
name,
inspection_date,
score
FROM
restaurant_inspection
WHERE
score NOT SIMILAR TO '\d+';
SELECT
camis,
name,
inspection_date,
score
FROM
restaurant_inspection
WHERE
score NOT SIMILAR TO '\d{1}' AND
score NOT SIMILAR TO '\d{2}' AND
score NOT SIMILAR TO '\d{3}';
ALTER TABLE restaurant_inspection
ALTER COLUMN score TYPE SMALLINT USING score::smallint;
SMALLINT
: values from -32,768 to 32,767USING
clause specifies conversion of previous valuesMetacharacter | Usage | Example RE | Example Match |
---|---|---|---|
\d |
matches a digit (0-9) | \d\d\d |
'345' |
? |
matches 0 or 1 of previous character | x\d? |
'x5' |
+ |
matches one or more of previous character | \d+ |
'10' |
* |
matches any character 0 or more times | \d* |
'3081' |
[] |
matches any character inside of the brackets | [a-z] |
'f' |
ALTER TABLE restaurant_inspection
ALTER COLUMN score TYPE SMALLINT USING score::smallint;
SELECT
camis,
name,
inspection_date,
score
FROM
restaurant_inspection
WHERE
score < 0;
ALTER TABLE restaurant_inspection
ALTER COLUMN score TYPE SMALLINT USING score::smallint;
SELECT
camis,
name,
inspection_date,
score
FROM
restaurant_inspection
WHERE
score <= -1;
ALTER TABLE restaurant_inspection
ALTER COLUMN score TYPE SMALLINT USING score::smallint;
SELECT
camis,
name,
inspection_date,
score
FROM
restaurant_inspection
WHERE
score < 0 OR
score > 100;
ALTER TABLE restaurant_inspection
ALTER COLUMN score TYPE SMALLINT USING score::smallint;
SELECT
camis,
name,
inspection_date,
score
FROM
restaurant_inspection
WHERE
score < 0 OR
score >= 101;
SELECT
camis, name, inspection_date, score
FROM
restaurant_inspection
WHERE
score NOT BETWEEN 0 AND 100;
camis | name | inspection_date | score
---------+-------------------------+-----------------+-------
... | ... | ... | ...
41702543 | TROPICAL GRILL | 05/14/2018 | 109
50074058 | PAD THAI | 08/01/2018 | 101
50085349 | DON CHILE MEXICAN GRILL | 12/04/2018 | 124
50092932 | ENERGY JUICE BAR | 06/24/2019 | 102
41702543 | TROPICAL GRILL | 05/14/2018 | 109
50034653 | KAI FAN ASIAN CUISINE | 12/06/2019 | -1
... | ... | ... | ...
Cleaning Data in PostgreSQL Databases