Ongeldige waarden detecteren

Data opschonen in PostgreSQL-databases

Darryl Reeves, Ph.D.

Assistant Professor, Long Island University - Brooklyn

Ongeldige gegevenswaarden

  camis   |              name               | inspection_date | score |            inspection_type            | ... 
 ---------+---------------------------------+-----------------+-------+---------------------------------------+-----
 ...      | ...                             | ...             | ...   | ...                                   | ...
 41659848 | LA BRISA DEL CIBAO              | 01/30/2018      | 20    | Cycle Inspection / Initial Inspection | ...
 40961447 | MESON SEVILLA RESTAURANT        | 03/19/2019      | 50    | Cycle Inspection / Initial Inspection | ...
 50063071 | WA BAR                          | 05/23/2018      | 15    | Cycle Inspection / Initial Inspection | ...
 50034992 | EMPANADAS MONUMENTAL            | 06/21/2019      | 17    | Cycle Inspection / Re-inspection      | ...
 50095871 | ALPHONSO'S PIZZERIA & TRATTORIA | 01/16/2020      | 10    | Cycle Inspection / Initial Inspection | ...
 ...      | ...                             | ...             | ...   | ...                                   | ...
  camis   |             name             | inspection_date | score |                 inspection_type                 | ... 
 ---------+------------------------------+-----------------+-------+-------------------------------------------------+-----
 ...      | ...                          | ...             | ...   | ...                                             | ...
 41104041 | THE SPARROW TAVERN           | 09/17/2019      | 13    | Cycle Inspection / Initial Inspection           | ...
 50016937 | BURGER KING                  | 09/14/2018      | 12    | Cycle Inspection / Re-inspection                | ...
 50066469 | DARBAR'S CHICKEN & RIBS      | 08/07/2017      | 11    | Pre-permit (Operational) / Reopening Inspection | ...
 41195691 | F & J PINE RESTAURANT        | 05/02/2019      | 26    | Cycle Inspection / Initial Inspection           | ...
 50015706 | EL RINCONCITO DE LOS SABORES | 12/18/2019      | A     | Cycle Inspection / Initial Inspection           | ...
 ...      | ...                          | ...             | ...   | ...                                             | ...
Data opschonen in PostgreSQL-databases

Ongeldige data afvangen met patroonmatching

SELECT
  camis,
  name,
  inspection_date,
  score
FROM
  restaurant_inspection
WHERE
  score NOT SIMILAR TO '\d+';
Data opschonen in PostgreSQL-databases

Ongeldige data afvangen met patroonmatching

  • Query beperkt alleen niet-cijfertekens
  • Geen beperking op lengte van waarde
SELECT
  camis,
  name,
  inspection_date,
  score
FROM
  restaurant_inspection
WHERE
  score NOT SIMILAR TO '\d{1}' AND
  score NOT SIMILAR TO '\d{2}' AND
  score NOT SIMILAR TO '\d{3}';
Data opschonen in PostgreSQL-databases

Typebeperkingen gebruiken

  • Kolom bevat gehele getallen
  • Kolom mag geen niet-gehele getallen toestaan
ALTER TABLE restaurant_inspection
ALTER COLUMN score TYPE SMALLINT USING score::smallint;
  • SMALLINT: waarden van -32.768 tot 32.767
  • USING-clausule specificeert conversie van eerdere waarden
Data opschonen in PostgreSQL-databases

Herhaling: basis van reguliere expressies

Metateken Gebruik Voorbeeld-RE Voorbeeldmatch
\d matcht een cijfer (0-9) \d\d\d '345'
? matcht 0 of 1 van vorig teken x\d? 'x5'
+ matcht één of meer van vorig teken \d+ '10'
* matcht elk teken 0 of meer keer \d* '3081'
[] matcht elk teken binnen de haakjes [a-z] 'f'
Data opschonen in PostgreSQL-databases

Typebeperkingen maken bereikcontroles mogelijk

ALTER TABLE restaurant_inspection
ALTER COLUMN score TYPE SMALLINT USING score::smallint;

SELECT
  camis,
  name,
  inspection_date,
  score
FROM
  restaurant_inspection
WHERE
  score < 0;
Data opschonen in PostgreSQL-databases

Typebeperkingen maken bereikcontroles mogelijk

ALTER TABLE restaurant_inspection
ALTER COLUMN score TYPE SMALLINT USING score::smallint;

SELECT
  camis,
  name,
  inspection_date,
  score
FROM
  restaurant_inspection
WHERE
  score <= -1;
Data opschonen in PostgreSQL-databases

Typebeperkingen maken bereikcontroles mogelijk

ALTER TABLE restaurant_inspection
ALTER COLUMN score TYPE SMALLINT USING score::smallint;

SELECT
  camis,
  name,
  inspection_date,
  score
FROM
  restaurant_inspection
WHERE
  score < 0 OR
  score > 100;
Data opschonen in PostgreSQL-databases

Typebeperkingen maken bereikcontroles mogelijk

ALTER TABLE restaurant_inspection
ALTER COLUMN score TYPE SMALLINT USING score::smallint;

SELECT
  camis,
  name,
  inspection_date,
  score
FROM
  restaurant_inspection
WHERE
  score < 0 OR
  score >= 101;
Data opschonen in PostgreSQL-databases

De BETWEEN-operator

SELECT
  camis, name, inspection_date, score
FROM
  restaurant_inspection
WHERE
  score NOT BETWEEN 0 AND 100;
  camis   |          name           | inspection_date | score 
 ---------+-------------------------+-----------------+-------
 ...      | ...                     | ...             |   ...
 41702543 | TROPICAL GRILL          | 05/14/2018      |   109
 50074058 | PAD THAI                | 08/01/2018      |   101
 50085349 | DON CHILE MEXICAN GRILL | 12/04/2018      |   124
 50092932 | ENERGY JUICE BAR        | 06/24/2019      |   102
 41702543 | TROPICAL GRILL          | 05/14/2018      |   109
 50034653 | KAI FAN ASIAN CUISINE   | 12/06/2019      |    -1
 ...      | ...                     | ...             |   ...
Data opschonen in PostgreSQL-databases

Laten we oefenen!

Data opschonen in PostgreSQL-databases

Preparing Video For Download...