dbt seeds

Introduction to dbt

Mike Metzger

Data Engineer

What are dbt seeds?

  • CSV files to be loaded into data warehouse
  • Typically rarely changing sets of data
    • List of countries
    • List of postal codes
  • NOT meant for raw data

Seeds

1 Photo by Maddi Bazzocco on Unsplash
Introduction to dbt

Why?

  • Easy to manage
  • Easy to use in various scenarios
  • Source controllable
Introduction to dbt

How are seeds defined?

  • Add CSV file to the seeds directory
  • Make sure the header is the first row
  • Import using the dbt seed command
zipcode,place,state
99553,Akutan,Alaska
99571,Cold Bay,Alaska
99583,False Pass,Alaska

bash> dbt seed
1 Postal codes provided via https://github.com/zauberware/postal-codes-json-xml-csv
Introduction to dbt

Further configuration

  • Several options
    • Which schema?
    • What database?
    • Column quoting
    • Column data types
  • Can be applied to the whole project or to individual seeds
  • Can be added to dbt_project.yml or seeds/properties.yml
Introduction to dbt

Defining datatypes

  • Available data types depend on data warehouse
  • Typical ones are available
    • Integer
    • Varchar
    • etc
  • If type is not defined, it is inferred based on the data
version: 2

seeds:
  - name: zipcodes
    config:
      column_types:
        zipcode: varchar(5)
Introduction to dbt

Tests & documentation

  • Support tests
  • Support documentation
  • Just like models and sources
version: 2

seeds:
  - name: zipcodes
    description: US zipcodes
    config:
      column_types:
        zipcode: varchar(5)
    columns:
      - name: zipcode
        tests:
          - unique
Introduction to dbt

Accessing seeds

  • Available via the {{ ref() }} command
  • Behaves as a model after initial import
select * from
  {{ ref('zipcodes') }}
Introduction to dbt

Let's practice!

Introduction to dbt

Preparing Video For Download...