Normal forms

Projeto de banco de dados

Lis Sulmont

Curriculum Manager

Normalization

Identify repeating groups of data and create new tables for them

A more formal definition:

The goals of normalization are to:

  • Be able to characterize the level of redundancy in a relational schema
  • Provide mechanisms for transforming schemas in order to remove redundancy
1 Database Design, 2nd Edition by Adrienne Watt
Projeto de banco de dados

Normal forms (NF)

Ordered from least to most normalized:

  • First normal form (1NF)
  • Second normal form (2NF)
  • Third normal form (3NF)
  • Elementary key normal form (EKNF)
  • Boyce-Codd normal form (BCNF)

$$

  • Fourth normal form (4NF)
  • Essential tuple normal form (ETNF)
  • Fifth normal form (5NF)
  • Domain-key Normal Form (DKNF)
  • Sixth normal form (6NF)
1 https://en.wikipedia.org/wiki/Database_normalization
Projeto de banco de dados

1NF rules

  • Each record must be unique - no duplicate rows
  • Each cell must hold one value

Initial data

| Student_id | Student_Email   | Courses_Completed                                        | 
|------------|-----------------|----------------------------------------------------------|
| 235        | [email protected]   | Introduction to Python, Intermediate Python              |
| 455        | [email protected] | Cleaning Data in R                                       | 
| 767        | [email protected] | Machine Learning Toolbox, Deep Learning in Python        |
Projeto de banco de dados

In 1NF form

| Student_id | Student_Email   | 
|------------|-----------------|
| 235        | [email protected]   | 
| 455        | [email protected] | 
| 767        | [email protected] | 
| Student_id | Completed                |
|------------|--------------------------|
| 235        | Introduction to Python   | 
| 235        | Intermediate Python      | 
| 455        | Cleaning Data in R       | 
| 767        | Machine Learning Toolbox | 
| 767        | Deep Learning in Python  | 
Projeto de banco de dados

2NF

  • Must satisfy 1NF AND
    • If primary key is one column
      • then automatically satisfies 2NF
    • If there is a composite primary key
      • then each non-key column must be dependent on all the keys

Initial data

| Student_id (PK) | Course_id (PK) | Instructor_id | Instructor    | Progress |
|-----------------|----------------|---------------|---------------|----------|
| 235             | 2001           | 560           | Nick Carchedi | .55      |
| 455             | 2345           | 658           | Ginger Grant  | .10      |
| 767             | 6584           | 999           | Chester Ismay | 1.00     |
Projeto de banco de dados

In 2NF form

| Student_id (PK) | Course_id (PK) | Percent_Completed |
|-----------------|----------------|-------------------|
| 235             | 2001           | .55               |
| 455             | 2345           | .10               |
| 767             | 6584           | 1.00              |
| Course_id (PK) | Instructor_id | Instructor    |
|----------------|---------------|---------------|
| 2001           | 560           | Nick Carchedi |
| 2345           | 658           | Ginger Grant  |
| 6584           | 999           | Chester Ismay |
Projeto de banco de dados

3NF

  • Satisfies 2NF
  • No transitive dependencies: non-key columns can't depend on other non-key columns

Initial Data

| Course_id (PK) | Instructor_id | Instructor    | Tech   |
|----------------|---------------|---------------|--------|
| 2001           | 560           | Nick Carchedi | Python |
| 2345           | 658           | Ginger Grant  | SQL    |
| 6584           | 999           | Chester Ismay | R      |
Projeto de banco de dados

In 3NF

| Course_id (PK) | Instructor    | Tech   |
|----------------|---------------|--------|
| 2001           | Nick Carchedi | Python |
| 2345           | Ginger Grant  | SQL    |
| 6584           | Chester Ismay | R      |
| Instructor_id | Instructor    | 
|---------------|---------------|
| 560           | Nick Carchedi | 
| 658           | Ginger Grant  | 
| 999           | Chester Ismay |
Projeto de banco de dados

Data anomalies

What is risked if we don't normalize enough?

1. Update anomaly

2. Insertion anomaly

3. Deletion anomaly

Projeto de banco de dados

Update anomaly

Data inconsistency caused by data redundancy when updating

| Student_ID | Student_Email   | Enrolled_in             | Taught_by           |
|------------|-----------------|-------------------------|---------------------|
| 230        | [email protected]  | Cleaning Data in R      | Maggie Matsui       |
| 367        | [email protected] | Data Visualization in R | Ronald Pearson      |
| 520        | [email protected]   | Introduction to Python  | Hugo Bowne-Anderson |
| 520        | [email protected]   | Arima Models in R       | David Stoffer       |

To update student 520's email:

  • Need to update more than one record, otherwise, there will be inconsistency
  • User updating needs to know about redundancy
Projeto de banco de dados

Insertion anomaly

Unable to add a record due to missing attributes

| Student_ID | Student_Email   | Enrolled_in             | Taught_by           |
|------------|-----------------|-------------------------|---------------------|
| 230        | [email protected]  | Cleaning Data in R      | Maggie Matsui       |
| 367        | [email protected] | Data Visualization in R | Ronald Pearson      |
| 520        | [email protected]   | Introduction to Python  | Hugo Bowne-Anderson |
| 520        | [email protected]   | Arima Models in R       | David Stoffer       |

Unable to insert a student who has signed up but not enrolled in any courses

Projeto de banco de dados

Deletion anomaly

Deletion of record(s) causes unintentional loss of data

| Student_ID | Student_Email   | Enrolled_in             | Taught_by           |
|------------|-----------------|-------------------------|---------------------|
| 230        | [email protected]  | Cleaning Data in R      | Maggie Matsui       |
| 367        | [email protected] | Data Visualization in R | Ronald Pearson      |
| 520        | [email protected]   | Introduction to Python  | Hugo Bowne-Anderson |
| 520        | [email protected]   | Arima Models in R       | David Stoffer       |

If we delete Student 230, what happens to the data on Cleaning Data in R?

Projeto de banco de dados

Data anomalies

What is risked if we don't normalize enough?

1. Update anomaly

2. Insertion anomaly

3. Deletion anomaly

The more normalized the database, the less prone it will be to data anomalies

Don't forget the downsides of normalization from the last video

Projeto de banco de dados

Let's practice!

Projeto de banco de dados

Preparing Video For Download...