Normalized and denormalized databases

Database Design

Lis Sulmont

Curriculum Manager

Back to our book store example

Denormalized: star schema

$$

Normalized: snowflake schema

$$

Database Design

Denormalized Query

Goal: get quantity of all Octavia E. Butler books sold in Vancouver in Q4 of 2018

  SELECT SUM(quantity) FROM fact_booksales
    -- Join to get city
    INNER JOIN dim_store_star on fact_booksales.store_id = dim_store_star.store_id
    -- Join to get author
    INNER JOIN dim_book_star on fact_booksales.book_id = dim_book_star.book_id
    -- Join to get year and quarter
    INNER JOIN dim_time_star on fact_booksales.time_id = dim_time_star.time_id
  WHERE 
    dim_store_star.city = 'Vancouver' AND dim_book_star.author = 'Octavia E. Butler' AND
    dim_time_star.year = 2018 AND dim_time_star.quarter = 4;
7600

Total of 3 joins

Database Design

Normalized query

SELECT
  SUM(fact_booksales.quantity)
FROM
  fact_booksales
  -- Join to get city
  INNER JOIN dim_store_sf ON fact_booksales.store_id = dim_store_sf.store_id
  INNER JOIN dim_city_sf ON dim_store_sf.city_id = dim_city_sf.city_id
  -- Join to get author
  INNER JOIN dim_book_sf ON fact_booksales.book_id = dim_book_sf.book_id
  INNER JOIN dim_author_sf ON dim_book_sf.author_id = dim_author_sf.author_id
  -- Join to get year and quarter
  INNER JOIN dim_time_sf ON fact_booksales.time_id = dim_time_sf.time_id
  INNER JOIN dim_month_sf ON dim_time_sf.month_id = dim_month_sf.month_id
  INNER JOIN dim_quarter_sf ON dim_month_sf.quarter_id =  dim_quarter_sf.quarter_id
  INNER JOIN dim_year_sf ON dim_quarter_sf.year_id = dim_year_sf.year_id
Database Design

Normalized query (continued)

WHERE
  dim_city_sf.city = `Vancouver`
  AND 
  dim_author_sf.author = `Octavia E. Butler`
  AND
  dim_year_sf.year = 2018 AND dim_quarter_sf.quarter = 4; 
sum
7600

Total of 8 joins

So, why would we want to normalize a databases?

Database Design

Normalization saves space

Denormalized databases enable data redundancy

Database Design

Normalization saves space

Normalization eliminates data redundancy

Database Design

Normalization ensures better data integrity

$$

1. Enforces data consistency

Must respect naming conventions because of referential integrity, e.g., 'California', not 'CA' or 'california'

2. Safer updating, removing, and inserting

Less data redundancy = less records to alter

3. Easier to redesign by extending

Smaller tables are easier to extend than larger tables

Database Design

Database normalization

Advantages

  • Normalization eliminates data redundancy: save on storage

  • Better data integrity: accurate and consistent data

Disadvantages

  • Complex queries require more CPU
Database Design

Remember OLTP and OLAP?

OLTP

e.g., Operational databases

Typically highly normalized

  • Write-intensive
  • Prioritize quicker and safer insertion of data

OLAP

e.g., Data warehouses

Typically less normalized

  • Read-intensive
  • Prioritize quicker queries for analytics
Database Design

Let's practice!

Database Design

Preparing Video For Download...