SCD2 with dbt snapshots

Intermediate dbt

Mike Metzger

Data Engineer

What is a snapshot?

  • A look into the changes of a dataset over time
  • Illustrate the various states of an object, such as
    • Order status
    • Production state
    • Shipping status

Order

1 Photo by micheile henderson on Unsplash
Intermediate dbt

SCD2

  • Slowly changing dimension
  • Tracks changes over time
  • dbt implements SCD2 with snapshots

Changing data

1 Photo by Luke Chesser on Unsplash
Intermediate dbt

SCD2 example

  • Order status
  • Available states:
    • Received
    • Packed
    • Shipped
id order_status last_updated
1 Shipped 2023-07-01 11:30
id order_status last_updated
1 Received 2023-07-01 10:45
1 Packed 2023-07-01 11:15
1 Shipped 2023-07-01 11:30
Intermediate dbt

SCD2 in dbt

  • dbt uses snapshots to implement SCD2
  • Can track the changes automatically
  • Adds extra columns to the output
    • dbt_valid_from
    • dbt_valid_to
id order_status last_updated dbt_valid_from dbt_valid_to
1 Received 2023-07-01 10:45 2023-07-01 10:45 2023-07-01 11:15
1 Packed 2023-07-01 11:15 2023-07-01 11:15 2023-07-01 11:30
1 Shipped 2023-07-01 11:30 2023-07-01 11:30 null
Intermediate dbt

Implementing dbt snapshots

  • SQL file, snapshots/snapshot_name.sql
{% snapshot snapshot_orders %}

{{ config(
target_schema='snapshots',
strategy='timestamp',
unique_key='id',
updated_at='last_updated'
) }}
select * from {{ source('raw', 'orders') }}
{% endsnapshot %}
Intermediate dbt

dbt snapshot

  • Run dbt snapshot
  • Create new model using the ref() command to query snapshot
    • select * from {{ ref('snapshot_orders') }}
  • Run dbt snapshot frequently to see potentially changed data
    • Schedule for automatic updates
Intermediate dbt

Let's practice!

Intermediate dbt

Preparing Video For Download...