Setting up the dbt project and loading data

Case Study: Building E-Commerce Data Models with dbt

Susan Sun

Freelance Data Scientist

Review: dbt set up and initialization

Install dbt

pip install dbt

Initialize dbt project looker_ecommerce

dbt init looker_ecommerce

Verify set up success

cd looker_ecommerce
dbt debug

dbt auto-generated file directory:

A repeat of the screen shot from previous lesson. It includes a series of file directories auto-generated by dbt init. Top level folder is called looker e-commerce.  Children folders are named analyses, macros, models, seeds, snapshots, and tests. There are also three miscellaneous files gitignore, dbt project yaml, and a readme markdown.

Case Study: Building E-Commerce Data Models with dbt

Getting familiar with the data: distribution centers

  • distribution_centers.csv is small and static, with only 10 rows

  • A sample of the raw distribution_centers.csv data file

A preview of the distribution center data file containing three rows of data and 4 rows. Column names are id, name, latitude, and longitude.

  • id: Unique identifier for each distribution center
  • name: Name of the distribution center
  • latitude: Latitude coordinate of the distribution center
  • longitude: Longitude coordinate of the distribution center
Case Study: Building E-Commerce Data Models with dbt

Getting familiar with the data: orders

orders.csv is large and constantly updating, with 125,000 rows and 9 columns

  • order_id: Unique ID for each order item
  • user_id: ID of the user who placed an order
  • status: Status of the order
  • gender: Gender of the user
  • created_at: Order created at timestamp
  • returned_at: Order returned at timestamp
  • shipped_at: Order shipped at timestamp
  • delivered_at: Order delivered at timestamp
  • num_of_items: Number of items in each order
Case Study: Building E-Commerce Data Models with dbt

Getting familiar with the data: orders

A sample of the raw orders.csv data file:

A preview of the orders data file containing two rows of data and 9 rows. Column names are order id, user id, status, gender, created at, returned at, shipped at, delivered at, and num of items.

Case Study: Building E-Commerce Data Models with dbt

Setting up raw source and seed data sources

Distribution center raw data file:

  • Data characteristics:
    • Small, flat file (csv)
    • Slow changing data

Orders raw data file:

  • Data characteristics:
    • Large dataset
    • Fast changing data
  • dbt load method:
    • dbt seed
    • Load as a one-time file
  • dbt load method:
    • dbt source
    • Connector to DuckDB
Case Study: Building E-Commerce Data Models with dbt

Setting up raw sources and seed data sources

dbt init auto-generates the basic file structure:

looker_ecommerce/
  macros/
  models/
  seeds/
  snapshots/
  tests/
  dbt_project.yml
Case Study: Building E-Commerce Data Models with dbt

Setting up raw source and seed data sources

Load distribution_center as seed:

looker_ecommerce/
  macros/
  models/
    stg_looker__distribution_centers.sql
  seeds/
    looker__distribution_centers.csv
  snapshots/
  tests/
  dbt_project.yml

In stg_looker__distribution_centers.sql:

SELECT 
  id,
  name,
  latitude,
  longitude
FROM 
 {{ref('looker__distribution_centers')}}
Case Study: Building E-Commerce Data Models with dbt

Setting up raw source and seed data sources

Load orders as source:

looker_ecommerce/
  macros/
  models/
    stg_looker__orders.sql
  seeds/
  snapshots/
  tests/
  dbt_project.yml

In stg_looker__orders.sql:

SELECT *
FROM 
{{source('looker_ecommerce', 'orders')}}
Case Study: Building E-Commerce Data Models with dbt

Documenting sources and staging models

To document sources:

looker_ecommerce/
  macros/
  models/
    _looker__sources.yml
  seeds/
  snapshots/
  tests/
  dbt_project.yml

In _looker__sources.yml:

version: 2

sources:
  - name: looker_ecommerce
    tables:
      - name: orders
Case Study: Building E-Commerce Data Models with dbt

Documenting sources and staging model

  • Sample _looker__models.yml file under the same models directory:
version: 2

models:
  - name: stg_looker__distribution_centers
    description: Distribution center name and location    

  - name: stg_looker__orders
    description: Order information such as order status
Case Study: Building E-Commerce Data Models with dbt

Sources, seeds, models, and yaml

looker_ecommerce/
  macros/
  models/
    _looker__models.yml
    _looker__sources.yml
    stg_looker__distribution_centers.sql
    stg_looker__orders.sql
  seeds/
    looker__distribution_centers.csv
  snapshots/
  tests/
  dbt_project.yml
Case Study: Building E-Commerce Data Models with dbt

Review: dbt subcommands

Loads csv files into as seed files

dbt seed

Creates or updates all models in dbt project

dbt run

Creates or updates the specified model

dbt run --select model

Runs all tests in dbt project

dbt test 

Runs tests with the specified model

dbt test --select model

Combines dbt run and dbt test in one!

dbt build
dbt build --select model
Case Study: Building E-Commerce Data Models with dbt

Review: best practice guides

models naming structure:

  • uses double underscore to separate data source name from model name
  • <data_source>__<model_name>.sql

e.g.

  • stg_looker__distribution_centers.sql
  • stg_looker__orders.sql

yaml file naming structure:

  • starts with single underscore
  • uses double underscore to separate data source name from artifact name
  • _<data_source>__<artifact_type>.yml

e.g.

  • _looker__models.yml
  • _looker__sources.yml
1 https://docs.getdbt.com/best-practices
Case Study: Building E-Commerce Data Models with dbt

Let's practice!

Case Study: Building E-Commerce Data Models with dbt

Preparing Video For Download...