Data Vault

Introduction to Data Modeling in Snowflake

Nuno Rocha

Director of Engineering

Introduction to the data vault model

  • Data vault model: A modeling technique focusing on historical data tracking, characterized by using hubs, links, and satellites.

Data vault model

Introduction to Data Modeling in Snowflake

Components of data vault

  • Hubs: Represent unique business concepts using a singular business key.

Data vault model: hubs

Introduction to Data Modeling in Snowflake

Components of data vault (1)

  • Links: Capture relationships and interactions between hubs.

Data vault model: Links

Introduction to Data Modeling in Snowflake

Components of data vault (2)

  • Satellites: Store descriptive and historical details related to hubs and links.

Data vault model: Satellites

Introduction to Data Modeling in Snowflake

Building hubs

College data vault hubs

Introduction to Data Modeling in Snowflake

Building hubs (1)

  • AUTOINCREMENT: Attribute property to automatically generate unique, sequentially increasing numeric values for each new row.
CREATE OR REPLACE TABLE hub_students (
    student_key NUMBER(10,0) AUTOINCREMENT PRIMARY KEY
);
Introduction to Data Modeling in Snowflake

Building hubs (2)

  • Create a new hub with a unique numerical key generated automatically and the hub's concept id:
CREATE OR REPLACE TABLE hub_students (
    student_key NUMBER(10,0) AUTOINCREMENT PRIMARY KEY,
    student_id NUMBER(10,0)
);
1 Next, we list the business key that identifies each concept, student_id will identify each student
Introduction to Data Modeling in Snowflake

Building hubs (3)

  • Add historical tracking attributes:
CREATE OR REPLACE TABLE hub_students (
    student_key NUMBER(10,0) AUTOINCREMENT PRIMARY KEY,
    student_id NUMBER(10,0),
    load_date TIMESTAMP,
    record_source VARCHAR(255)
);
Introduction to Data Modeling in Snowflake

Building hubs (4)

  • Create new classes hub:
CREATE OR REPLACE TABLE hub_classes (
    class_key NUMBER(10,0)
          AUTOINCREMENT 
          PRIMARY KEY,
    class_id NUMBER(10,0),
    load_date TIMESTAMP,
    record_source VARCHAR(255)
);
  • Create new schools hub:
CREATE OR REPLACE TABLE hub_schools (
    school_key NUMBER(10,0) 
          AUTOINCREMENT 
          PRIMARY KEY,
    school_id NUMBER(10,0),
    load_date TIMESTAMP,
    record_source VARCHAR(255)
);
Introduction to Data Modeling in Snowflake

Building links

College data vault links

Introduction to Data Modeling in Snowflake

Building links (1)

  • Create a link entity with a unique numerical key generated automatically:
CREATE OR REPLACE TABLE link_enrollments (
    link_key NUMBER(10,0) AUTOINCREMENT PRIMARY KEY
);
Introduction to Data Modeling in Snowflake

Building links (2)

  • Add relationships to other entities:
CREATE OR REPLACE TABLE link_enrollments (
    link_key NUMBER(10,0) AUTOINCREMENT PRIMARY KEY,
    student_key NUMBER(10,0),
    class_key NUMBER(10,0),
    FOREIGN KEY (student_key) REFERENCES hub_students(student_key),
    FOREIGN KEY (class_key) REFERENCES hub_classes(class_key)
);
Introduction to Data Modeling in Snowflake

Building links (3)

  • Add historical tracking attributes:
CREATE OR REPLACE TABLE link_enrollments (
    link_key NUMBER(10,0) AUTOINCREMENT PRIMARY KEY,
    student_key NUMBER(10,0),
    class_key NUMBER(10,0),
    load_date TIMESTAMP,
    record_source VARCHAR(255),
    FOREIGN KEY (student_key) REFERENCES hub_students(student_key),
    FOREIGN KEY (class_key) REFERENCES hub_classes(class_key)
);
Introduction to Data Modeling in Snowflake

Building links (4)

  • Create new offerings link entity:
CREATE OR REPLACE TABLE link_offerings (
    link_key NUMBER(10,0) AUTOINCREMENT PRIMARY KEY,
    class_key NUMBER(10,0),
    school_key NUMBER(10,0),
    load_date TIMESTAMP,
    record_source VARCHAR(255),
    FOREIGN KEY (class_key) REFERENCES hub_classes(class_key),
    FOREIGN KEY (school_key) REFERENCES hub_schools(school_key)
);
Introduction to Data Modeling in Snowflake

Building satellites

College data vault satellites

Introduction to Data Modeling in Snowflake

Building satellites (1)

  • Create a new satellite entity listing all concept attributes:
CREATE OR REPLACE TABLE sat_student (
    name VARCHAR(255),
    email VARCHAR(255)
);
Introduction to Data Modeling in Snowflake

Building satellites (2)

  • Add historical tracking attributes:
CREATE OR REPLACE TABLE sat_student (
    name VARCHAR(255),
    email VARCHAR(255),
    load_date TIMESTAMP,
    record_source VARCHAR(255)
);
Introduction to Data Modeling in Snowflake

Building satellites (3)

  • Add a link between the satellite and its respective hub:
CREATE OR REPLACE TABLE sat_student (
    student_key NUMBER(10,0),
    name VARCHAR(255),
    email VARCHAR(255),
    load_date TIMESTAMP,
    record_source VARCHAR(255),
    FOREIGN KEY (student_key) REFERENCES hub_students(student_key)
);
Introduction to Data Modeling in Snowflake

Building satellites (4)

  • Create a new class satellite:
CREATE OR REPLACE TABLE sat_class (
    class_key NUMBER(10,0),
    class_name VARCHAR(255),
    load_date TIMESTAMP,
    record_source VARCHAR(255),
    FOREIGN KEY (class_key) 
    REFERENCES hub_classes(class_key)
);
  • Create a new class school:
CREATE OR REPLACE TABLE sat_school (
    school_key NUMBER(10,0),
    school_name VARCHAR(255),
    load_date TIMESTAMP,
    record_source VARCHAR(255),
    FOREIGN KEY (school_key) 
    REFERENCES hub_schools(school_key)
);
Introduction to Data Modeling in Snowflake

Terminology and functions overview

  • Data vault model: A modeling technique focusing on historical data tracking, characterized by using hubs, links, and satellites.
  • Hubs: Represent unique business concepts using a singular business key.
  • Links: Capture relationships and interactions between hubs.
  • Satellites: Store descriptive and historical details related to hubs and links.
  • CREATE OR REPLACE TABLE: SQL command to create or replace a table structure.
  • PRIMARY KEY: SQL clause to define a column as the unique identifier.
  • AUTOINCREMENT: Attribute property to automatically generate unique, sequentially increasing numeric values for each new row.
  • FOREIGN KEY (...) REFERENCES (...): SQL clause to create a link between two tables.
Introduction to Data Modeling in Snowflake

Functions overview

CREATE OR REPLACE TABLE table_name (
      -- Create an auto generated unique value as primary key
      unique_key column_datatype AUTOINCREMENT PRIMARY KEY,
      other_business_key column_datatype,
      foreign_column column_datatype,
      other_foreign_column column_datatype,
      -- Adding relationship with other entities
      FOREIGN KEY(foreign_column) REFERENCES foreign_table(PK_from_foreign_table),
      FOREIGN KEY(other_foreign) REFERENCES foreign_table(PK_from_other_foreign)
);
Introduction to Data Modeling in Snowflake

Let's practice!

Introduction to Data Modeling in Snowflake

Preparing Video For Download...