The importance of data normalization

Creating PostgreSQL Databases

Darryl Reeves

Industry Assistant Professor, New York University

Example 1: redundant data

  • Data redundancy can be problematic
CREATE TABLE loan (
    borrower_id INTEGER REFERENCES borrower(id),
    bank_name VARCHAR(50) DEFAULT NULL,
    ...
);
CREATE TABLE bank ( 
    id SERIAL PRIMARY KEY, 
    name VARCHAR(50) DEFAULT NULL, 
    ... 
);
Creating PostgreSQL Databases

Example 1: redundant data

CREATE TABLE loan (
    borrower_id INTEGER REFERENCES borrower(id),
    bank_name VARCHAR(50) DEFAULT NULL,
    ...
);
CREATE TABLE bank ( 
    id SERIAL PRIMARY KEY, 
    name VARCHAR(50) DEFAULT NULL, 
    ... 
);
  • Problem 1: Different banks/same name
  • Problem 2: Name changes
Creating PostgreSQL Databases

Example 1: redundant data

CREATE TABLE loan (
    borrower_id INTEGER REFERENCES borrower(id),
    bank_id INTEGER REFERENCES bank(id),
    ...
);
  • Banks share name with distinct ids
  • Updates to bank names will only affect bank table
Creating PostgreSQL Databases

Example 2: consolidating records

applicant

id name
1 Jane Simmmons
2 Rick Demps
3 Pam Jones

borrower

id name
1 Jack Smith
2 Sara Williams
3 Jennifer Valdez
Creating PostgreSQL Databases

Example 2: consolidating records

applicant

id name
1 Jane Simmmons
2 Rick Demps
3 Pam Jones

borrower

id name
1 Jack Smith
2 Sara Williams
3 Jennifer Valdez
4 Pam Jones
Creating PostgreSQL Databases

Example 2: consolidating records

applicant

id name
1 Jane Simmmons
2 Rick Demps
3 Pam Jones

borrower

id name
1 Jack Smith
2 Sara Williams
3 Jennifer Valdez
Creating PostgreSQL Databases

Example 2: consolidating records

applicant

id name
1 Jane Simmmons
2 Rick Demps

borrower

id name
1 Jack Smith
2 Sara Williams
3 Jennifer Valdez
4 Pam Jones
Creating PostgreSQL Databases

Example 2: consolidating records

CREATE TABLE borrower (
    id SERIAL PRIMARY KEY,
    name VARCHAR(50) NOT NULL
);
Creating PostgreSQL Databases

Example 2: consolidating records

CREATE TABLE borrower (
    id SERIAL PRIMARY KEY,
    name VARCHAR(50) NOT NULL,
    approved BOOLEAN DEFAULT NULL
);
  • approved is NULL => applicant
  • approved is true => borrower
  • approved is false => denied application
Creating PostgreSQL Databases

Why normalize data?

  • Reduces data duplication
  • Increases data consistency
  • Improves data organization
Creating PostgreSQL Databases

Let's practice!

Creating PostgreSQL Databases

Preparing Video For Download...