Cleaning Data in PostgreSQL Databases
Darryl Reeves, Ph.D.
Industry Assistant Professor, New York University
name | boro | building | street | zip_code | ...
-----------------------------+--------+----------+------------------+----------+-----
... | ... | ... | ... | ... | ...
DARBAR'S CHICKEN & RIBS | Queens | 12609 | LIBERTY AVE | 11419 | ...
F & J PINE RESTAURANT | Bronx | 1913 | BRONXDALE AVENUE | 10462 | ...
EL RINCONCITO DE LOS SABORES | Queens | 13933 | 89TH AVE | 11435 | ...
DON NICO'S | Queens | 9014 | 161ST ST | 11432 | ...
ASTORIA PIZZA | Queens | 3204B | 30TH AVE | 11102 | ...
... | ... | ... | ... | ... | ...
Restaurant Name
Street Address
Boro, NY Zipcode
CONCAT(string1 [, string2, string3, ...])
CONCAT('data', 'cleaning', 'is', 'fun')
→ datacleaningisfun
CONCAT('data', ' ', 'cleaning', ' ', 'is', ' ', 'fun')
→ data cleaning is fun
SELECT
CONCAT(
name, E'\n',
building, ' ', street, E'\n',
boro, ', NY ', zip_code
) AS mailing_address
FROM
restaurant_inspection;
mailing_address
---------------------------
DARBAR'S CHICKEN & RIBS +
12609 LIBERTY AVE +
Queens, NY 11419
F & J PINE RESTAURANT +
1913 BRONXDALE AVENUE +
Bronx, NY 10462
EL RINCONCITO DE LOS SABORES+
13933 89TH AVE +
Queens, NY 11435
DON NICO'S +
9014 161ST ST +
Queens, NY 11432
ASTORIA PIZZA +
3204B 30TH AVE +
Queens, NY 11102
name | boro | building | street | zip_code | ...
-----------------------+-----------+----------+---------------------------+----------+-----
... | ... | ... | ... | ... | ...
IRVING FARMS | Queens | | CENTRAL TERMINAL BUILDING | 11371 | ...
DON PEPIS DELICATESSEN | Manhattan | | AMTRAK LEVEL | 10001 | ...
DUNKIN' | Queens | | CENTRAL TERMINAL BLDG | 11371 | ...
| Queens | 17111 | JAMAICA AVE | 11432 | ...
| Brooklyn | 1489 | FULTON STREET | 11216 | ...
... | ... | ... | ... | ... | ...
SELECT
CONCAT(
name, E'\n',
building, ' ', street, E'\n',
boro, ', NY ', zip_code
) AS mailing_address
FROM
restaurant_inspection;
mailing_address
---------------------------
IRVING FARMS +
CENTRAL TERMINAL BUILDING+
Queens, NY 11371
DON PEPIS DELICATESSEN +
AMTRAK LEVEL +
Manhattan, NY 10001
DUNKIN' +
CENTRAL TERMINAL BLDG +
Queens, NY 11371
+
17111 JAMAICA AVE +
Queens, NY 11432
+
1489 FULTON STREET +
Brooklyn, NY 11216
string1 || string2 [ || string3 || ...]
SELECT 'data' || ' ' || 'cleaning' || ' ' || 'is' || ' ' || 'fun';
data cleaning is fun
NULL
valued arguments → NULL
value
SELECT
name || E'\n' ||
building || ' ' || street || E'\n'
|| boro || ', NY ' || zip_code AS mailing_address
FROM
restaurant_inspection
name | mailing_address
-----------------+-----------------------
SCHNIPPERS | SCHNIPPERS +
| 570 LEXINGTON AVENUE +
| Manhattan, NY 10022
ATOMIC WINGS |
WING LING | WING LING +
| 159B EAST 170 STREET+
| Bronx, NY 10452
JUAN VALDEZ CAFE | JUAN VALDEZ CAFE +
| 140 EAST 57 STREET +
| Manhattan, NY 10022
FULTON GRAND | FULTON GRAND +
| 1011 FULTON STREET +
| Brooklyn, NY 11238
Cleaning Data in PostgreSQL Databases