Normalized and denormalized databases

Conception de la base de données

Lis Sulmont

Curriculum Manager

Back to our book store example

Denormalized: star schema

$$

Normalized: snowflake schema

$$

Conception de la base de données

Denormalized Query

Goal: get quantity of all Octavia E. Butler books sold in Vancouver in Q4 of 2018

  SELECT SUM(quantity) FROM fact_booksales
    -- Join to get city
    INNER JOIN dim_store_star on fact_booksales.store_id = dim_store_star.store_id
    -- Join to get author
    INNER JOIN dim_book_star on fact_booksales.book_id = dim_book_star.book_id
    -- Join to get year and quarter
    INNER JOIN dim_time_star on fact_booksales.time_id = dim_time_star.time_id
  WHERE 
    dim_store_star.city = 'Vancouver' AND dim_book_star.author = 'Octavia E. Butler' AND
    dim_time_star.year = 2018 AND dim_time_star.quarter = 4;
7600

Total of 3 joins

Conception de la base de données

Normalized query

SELECT
  SUM(fact_booksales.quantity)
FROM
  fact_booksales
  -- Join to get city
  INNER JOIN dim_store_sf ON fact_booksales.store_id = dim_store_sf.store_id
  INNER JOIN dim_city_sf ON dim_store_sf.city_id = dim_city_sf.city_id
  -- Join to get author
  INNER JOIN dim_book_sf ON fact_booksales.book_id = dim_book_sf.book_id
  INNER JOIN dim_author_sf ON dim_book_sf.author_id = dim_author_sf.author_id
  -- Join to get year and quarter
  INNER JOIN dim_time_sf ON fact_booksales.time_id = dim_time_sf.time_id
  INNER JOIN dim_month_sf ON dim_time_sf.month_id = dim_month_sf.month_id
  INNER JOIN dim_quarter_sf ON dim_month_sf.quarter_id =  dim_quarter_sf.quarter_id
  INNER JOIN dim_year_sf ON dim_quarter_sf.year_id = dim_year_sf.year_id
Conception de la base de données

Normalized query (continued)

WHERE
  dim_city_sf.city = `Vancouver`
  AND 
  dim_author_sf.author = `Octavia E. Butler`
  AND
  dim_year_sf.year = 2018 AND dim_quarter_sf.quarter = 4; 
sum
7600

Total of 8 joins

So, why would we want to normalize a databases?

Conception de la base de données

Normalization saves space

Denormalized databases enable data redundancy

Conception de la base de données

Normalization saves space

Normalization eliminates data redundancy

Conception de la base de données

Normalization ensures better data integrity

$$

1. Enforces data consistency

Must respect naming conventions because of referential integrity, e.g., 'California', not 'CA' or 'california'

2. Safer updating, removing, and inserting

Less data redundancy = less records to alter

3. Easier to redesign by extending

Smaller tables are easier to extend than larger tables

Conception de la base de données

Database normalization

Advantages

  • Normalization eliminates data redundancy: save on storage

  • Better data integrity: accurate and consistent data

Disadvantages

  • Complex queries require more CPU
Conception de la base de données

Remember OLTP and OLAP?

OLTP

e.g., Operational databases

Typically highly normalized

  • Write-intensive
  • Prioritize quicker and safer insertion of data

OLAP

e.g., Data warehouses

Typically less normalized

  • Read-intensive
  • Prioritize quicker queries for analytics
Conception de la base de données

Let's practice!

Conception de la base de données

Preparing Video For Download...