Case Study: Building E-Commerce Data Models with dbt
Susan Sun
Freelance Data Scientist
Install dbt
pip install dbt
Initialize dbt project looker_ecommerce
dbt init looker_ecommerce
Verify set up success
cd looker_ecommerce
dbt debug
dbt auto-generated file directory:
distribution_centers.csv
is small and static, with only 10 rows
A sample of the raw distribution_centers.csv
data file
orders.csv
is large and constantly updating, with 125,000 rows and 9 columns
A sample of the raw orders.csv
data file:
Distribution center raw data file:
Orders raw data file:
dbt seed
dbt source
dbt init
auto-generates the basic file structure:
looker_ecommerce/
macros/
models/
seeds/
snapshots/
tests/
dbt_project.yml
Load distribution_center
as seed
:
looker_ecommerce/
macros/
models/
stg_looker__distribution_centers.sql
seeds/
looker__distribution_centers.csv
snapshots/
tests/
dbt_project.yml
In stg_looker__distribution_centers.sql
:
SELECT
id,
name,
latitude,
longitude
FROM
{{ref('looker__distribution_centers')}}
Load orders
as source
:
looker_ecommerce/
macros/
models/
stg_looker__orders.sql
seeds/
snapshots/
tests/
dbt_project.yml
In stg_looker__orders.sql
:
SELECT *
FROM
{{source('looker_ecommerce', 'orders')}}
To document sources
:
looker_ecommerce/
macros/
models/
_looker__sources.yml
seeds/
snapshots/
tests/
dbt_project.yml
In _looker__sources.yml
:
version: 2
sources:
- name: looker_ecommerce
tables:
- name: orders
_looker__models.yml
file under the same models
directory:version: 2
models:
- name: stg_looker__distribution_centers
description: Distribution center name and location
- name: stg_looker__orders
description: Order information such as order status
looker_ecommerce/
macros/
models/
_looker__models.yml
_looker__sources.yml
stg_looker__distribution_centers.sql
stg_looker__orders.sql
seeds/
looker__distribution_centers.csv
snapshots/
tests/
dbt_project.yml
Loads csv files into as seed files
dbt seed
Creates or updates all models in dbt project
dbt run
Creates or updates the specified model
dbt run --select model
Runs all tests in dbt project
dbt test
Runs tests with the specified model
dbt test --select model
Combines dbt run
and dbt test
in one!
dbt build
dbt build --select model
models naming structure:
<data_source>__<model_name>.sql
e.g.
stg_looker__distribution_centers.sql
stg_looker__orders.sql
yaml file naming structure:
_<data_source>__<artifact_type>.yml
e.g.
_looker__models.yml
_looker__sources.yml
Case Study: Building E-Commerce Data Models with dbt