Tasks and DAG Orchestration

Data Pipeline Automation in Snowflake

Emily Melhuish

Technical Curriculum Developer, Snowflake

Solution For Manual Workloads: Tasks

  • Team runs a SQL script manually each morning
  • One missed run - business loses visibility
  • Snowflake tasks automate this entirely

Example:

-- Today's manual routine (fragile)
CALL logistics.refresh_ops_dashboard();
Data Pipeline Automation in Snowflake

What Do We Need To Know About Tasks

  • Tasks run SQL on a schedule — no external CRON required
  • Execute SQL statements or call stored procedures
  • Compute and scheduling live natively in Snowflake
CREATE TASK logistics.refresh_dashboard
  WAREHOUSE = harbr_wh
  SCHEDULE = 'USING CRON 0 6 * * * UTC'
AS CALL logistics.refresh_ops_dashboard();
Data Pipeline Automation in Snowflake

Creating a Standalone Task

CREATE OR REPLACE TASK transform_delivery_summary 
  WAREHOUSE = harbr_wh 
  SCHEDULE = 'USING CRON 5 * * * * UTC' 
  AS INSERT INTO delivery_summary 

SELECT shipment_id, status, updated_at 
FROM delivery_events 
WHERE processed = FALSE; 
Data Pipeline Automation in Snowflake

Warehouse-based vs Serverless

Warehouse-based

  • Uses a named virtual warehouse
  • 60-second minimum billing per run
CREATE TASK my_task
  WAREHOUSE = harbr_wh  -- named warehouse
  SCHEDULE = '5 MINUTE'
AS INSERT INTO ...;

Serverless

  • Omit the WAREHOUSE clause
  • Billed per second consumed; no idle cost
CREATE TASK my_serverless_task
  -- no WAREHOUSE clause
  SCHEDULE = '5 MINUTE'
AS INSERT INTO ...;
Data Pipeline Automation in Snowflake

DAG-based Task Orchestration

DAG diagram — ingest_raw (root task, clock icon) → clean_events (runs after ingest_raw) → build_summary (runs after clean_events)

  • DAGs: Directed Acyclic Graph
  • Root task holds the CRON schedule
  • Child tasks declare their predecessor with AFTER
  • Entire chain is centralized
  • DAGs control when and in what order things run
Data Pipeline Automation in Snowflake

Managing Task States

State diagram — SUSPENDED → STARTED → SUCCEEDED or FAILED

-- Activate a task
ALTER TASK mytask RESUME; 
-- Pause a task
ALTER TASK mytask SUSPEND; 
-- Execute a task
EXECUTE TASK mytask SUSPEND;
Data Pipeline Automation in Snowflake

Tasks and Streams Combined

CDC pipeline diagram — delivery_events → stream captures changes → stream has data? YES → task fires → delivery_summary updated / NO → task skips

CREATE OR REPLACE TASK logistics.process_delivery_events
    WAREHOUSE = compute_wh
    SCHEDULE = '5 minute'
    WHEN SYSTEM$STREAM_HAS_DATA('logistics.delivery_events_stream')
AS
INSERT INTO logistics.processed_events
SELECT * FROM logistics.delivery_events_stream;
Data Pipeline Automation in Snowflake

Let's practice!

Data Pipeline Automation in Snowflake

Preparing Video For Download...