The First Norm

Introduction to Data Modeling in Snowflake

Nuno Rocha

Director of Engineering

The process of data normalization

  • Data normalization: Multi-setp process of structuring data to minimize duplication and dependencies

Faded steps after UNF

Introduction to Data Modeling in Snowflake

The process of data normalization(1)

  • Data normalization: Multi-step process of structuring data to minimize duplication and dependencies

Steps from UNF to 1NF

Introduction to Data Modeling in Snowflake

The process of data normalization (2)

  • Data normalization: Multi-step process of structuring data to minimize duplication and dependencies

Steps from UNF to 3NF

Introduction to Data Modeling in Snowflake

The process of data normalization (3)

  • Data normalization: Multi-step process of structuring data to minimize duplication and dependencies

Steps from UNF to 3NF

Introduction to Data Modeling in Snowflake

The first normal form

  • First normal form (1NF): Ensures each column in an entity holds unique atomic values

All products table with category highlighted

Introduction to Data Modeling in Snowflake

The first normal form (1)

  • UNF category data: L category is not isolated to be able to update the values

List of all products that contain L

Introduction to Data Modeling in Snowflake

The first normal form (2)

  • Validation step: Query the unique values to evaluate compliance with 1NF.
SELECT DISTINCT category 
FROM allproducts;

List of categories

Introduction to Data Modeling in Snowflake

Snowflake functions for 1NF

  • TRIM
  • LATERAL & FLATTEN
  • SPLIT
Introduction to Data Modeling in Snowflake

Snowflake functions for 1NF

  • TRIM: Removes empty spaces from the start and end of of values
SELECT TRIM(category)
FROM allproducts;

TRIM function in category

Introduction to Data Modeling in Snowflake

Snowflake functions for 1NF

  • TRIM: Removes empty spaces from the start and end of of values.
  • LATERAL & FLATTEN: Treat list of values like a table with individual items.
  • SPLIT: Separates values based on a delimiter.
SELECT TRIM(f.value)
FROM allproducts,
LATERAL FLATTEN(INPUT => SPLIT(allproducts.category, ',')) f;

LATERAL, FLATTEN and SPLIT function in category

Introduction to Data Modeling in Snowflake

Applying 1NF

  • Step 1: Create a new entity to transfer the UNF attribute values
CREATE OR REPLACE TABLE categories (
    category_id NUMBER(10,0) PRIMARY KEY,
    category VARCHAR(255)
);
Introduction to Data Modeling in Snowflake

Applying 1NF

  • Step 2.1: Fill new entity with data from the initially unnormalized entity
  • INSERT INTO: SQL command to insert new rows into a table.
INSERT INTO categories (category_id, category)
___;
Introduction to Data Modeling in Snowflake

Applying 1NF

  • Step 2.2: Select the data from allproducts, the initially unnormalized entity
INSERT INTO categories (category_id, category)
SELECT 
    ___,
    ___
FROM allproducts;
Introduction to Data Modeling in Snowflake

Applying 1NF

  • Step 2.3: Use function to split values within a specific attribute
INSERT INTO categories (category_id, category)
SELECT 
    ___, 
    TRIM(f.value)
FROM allproducts,
LATERAL FLATTEN(INPUT => SPLIT(allproducts.category, ',')) f;
Introduction to Data Modeling in Snowflake

Applying 1NF

  • Step 2.4: Select the row number to define a unique identifier
INSERT INTO categories (category_id, category)
SELECT 
    ROW_NUMBER() OVER (ORDER BY TRIM(f.value)), 
    TRIM(f.value)
FROM allproducts,
LATERAL FLATTEN(INPUT => SPLIT(allproducts.category, ',')) f;
Introduction to Data Modeling in Snowflake

Applying 1NF

  • Step 2.4: Aggregate the data to generate unique category values
INSERT INTO categories (category_id, category)
SELECT 
    ROW_NUMBER() OVER (ORDER BY TRIM(f.value)), 
    TRIM(f.value)
FROM allproducts,
LATERAL FLATTEN(INPUT => SPLIT(allproducts.category, ',')) f
GROUP BY TRIM(f.value);
Introduction to Data Modeling in Snowflake

Towards data normalization

All products data, from UNF to 1NF

Introduction to Data Modeling in Snowflake

Terminology and functions overview

  • Data normalization: Process of structuring data to minimize duplication and dependency
  • Normal forms: Guidelines to apply data normalization
  • First normal form (1NF): Ensures each column in an entity holds unique atomic values
  • INSERT INTO: SQL command to insert new rows into a table
  • TRIM: SQL function to remove spaces at the start and end of of values
  • LATERAL FLATTEN(INPUT => SPLIT()): Snowflake function to split values into a rows
  • ROW_NUMBER() OVER (ORDER BY): SQL function to generate a sequential number
  • GROUP BY: SQL clause to aggregate data that have the same values
Introduction to Data Modeling in Snowflake

Functions overview

-- Fill a entity with data from a query result
INSERT INTO table_name (column_name, other_columns)
SELECT 
    -- Generate a unique value using the row number
    ROW_NUMBER() OVER (ORDER BY TRIM(alias.value)),
    TRIM(alias.value)
FROM another_table,
-- Split a text attribute value based on a delimiter 
LATERAL FLATTEN(INPUT => SPLIT(another_table.column_name, 'delimiter_value'))alias
-- Aggregate the data to ensure uniqueness of values
GROUP BY TRIM(alias.value);
Introduction to Data Modeling in Snowflake

Let's practice!

Introduction to Data Modeling in Snowflake

Preparing Video For Download...