Normalizing Relational Data

Introduction to Data Modeling in Snowflake

Nuno Rocha

Director of Engineering

Understanding unnormalized data

  • Unnormalized data (UNF): Data that might lacks a structure, be disorganized, contains repetitions and/or anomalies

All products entity

Introduction to Data Modeling in Snowflake

Understanding unnormalized data

Anomalies in the all products entity

Introduction to Data Modeling in Snowflake

Understanding unnormalized data

Banana anomaly highlighted

Introduction to Data Modeling in Snowflake

Understanding unnormalized data (3)

Manufacturer anomaly highlighted

Introduction to Data Modeling in Snowflake

Understanding unnormalized data (4)

Detail description anomaly highlighted

Introduction to Data Modeling in Snowflake

Understanding unnormalized data (5)

Products anomaly highlighted

Introduction to Data Modeling in Snowflake

Problems with unnormalized data

  • Data redundancy: Unnecessary repetition of data
  • Data anomalies: Irregularities or inconsistencies in the data

Anomalies in the all products entity

Introduction to Data Modeling in Snowflake

Identifying unnormalized data

SELECT manufacturer_id, 
       manufacturer_name, 
       location, 
       COUNT(*) AS repetitions
FROM allproducts
GROUP BY manufacturer_id, 
    manufacturer_name, 
    location
HAVING COUNT(*) > 1;

List of manufacturers with count of products repetitions

Introduction to Data Modeling in Snowflake

Identifying unnormalized data

SELECT DISTINCT category 
FROM allproducts;

List of categories

Introduction to Data Modeling in Snowflake

Identifying unnormalized data

SELECT DISTINCT product_name,
    category
FROM allproducts
WHERE category = 'L';

List of products with L category

Introduction to Data Modeling in Snowflake

Identifying unnormalized data

SELECT DISTINCT product_name,
    category
FROM allproducts
WHERE category = 'L';

List of all products that contain L

Introduction to Data Modeling in Snowflake

Normalized data

  • Normalized data: Organized data into distinct and atomic entities
  • Benefits:
    • Improves data accuracy and reliability
    • Enhances query performance
    • Optimized resource use
    • Scalability
Introduction to Data Modeling in Snowflake

Terminology and functions overview

  • Unnormalized data (UNF): Data that might lack a structure, be disorganized, or contain repetitions or anomalies
  • Normalized data: Organized data into distinct and atomic entities
  • SELECT FROM: SQL clause to retrieve data from a specific entity
  • DISTINCT: SQL clause to return unique (different) values from an attribute
  • COUNT: SQL clause that counts the number of rows that match the specified criteria
  • GROUP BY: SQL clause that groups rows with the same values by the specified attributes
  • HAVING: SQL clause used with GROUP BY to filter groups based on a condition
  • WHERE: SQL clause to filter records based on a set condition
  • AS: SQL clause used to rename a column or table with an alias
Introduction to Data Modeling in Snowflake

Functions overview

-- Querying unique values while being filtered by a specific condition
SELECT DISTINCT column_name
FROM table_name
WHERE column_name  condition  value;
-- Counting the values aggregated by a specific column while filtering the results
SELECT column_name, 
    COUNT(*) AS alias_name
FROM table_name
GROUP BY column_name
HAVING COUNT(*)  condition  value;
Introduction to Data Modeling in Snowflake

Let's practice!

Introduction to Data Modeling in Snowflake

Preparing Video For Download...