Window Functions

Data Pipeline Automation in Snowflake

Emily Melhuish

Technical Curriculum Developer, Snowflake

What is a Window Function?

Calculate across rows without collapsing them

  • Computes a result per row using a surrounding set — its window
  • Unlike GROUP BY: every row in, every row out
  • Defined by the OVER clause

3 main categories:

  • Ranking: ROW_NUMBER(), RANK(), DENSE_RANK()
  • Aggregation: SUM(), AVG(), COUNT() with OVER()
  • Navigation: LAG(), LEAD(), FIRST_VALUE(), LAST_VALUE()

More info in DataCamp's Window Functions in Snowflake course

Data Pipeline Automation in Snowflake

The OVER() Clause

SELECT shipment_id
, region
, delivery_days
, AVG(delivery_days) 
    OVER ( PARTITION BY region ORDER BY dispatched_at ) AS avg_days_in_region 
FROM shipments;
SHIPMENT_ID REGION DELIVERY_DAYS RUNNING_AVG_DAYS_IN_REGION
SHP-001 EMEA 3 3.00
SHP-002 EMEA 5 4.00
SHP-003 EMEA 4 4.00
SHP-004 APAC 6 6.00
Data Pipeline Automation in Snowflake

ROW_NUMBER vs RANK vs DENSE_RANK

SELECT
    shipment_id AS shipment, delivery_days AS days,
    ROW_NUMBER() OVER (ORDER BY delivery_days) AS row_number,
    RANK()       OVER (ORDER BY delivery_days) AS rank,
    DENSE_RANK() OVER (ORDER BY delivery_days) AS dense_rank
FROM shipments ORDER BY delivery_days;
SHIPMENT DELIVERY_DAYS ROW_NUMBER RANK DENSE_RANK
SHP-001 3 1 1 1
SHP-002 5 2 2 2
SHP-003 5 3 2 2
SHP-004 7 4 4 3
Data Pipeline Automation in Snowflake

LAG and LEAD

Input

SQL Lag and Lead Input

Output

Lag and Lead Output

Data Pipeline Automation in Snowflake

Running Totals and Rolling Averages

window_frames.png

Data Pipeline Automation in Snowflake

ROWS vs RANGE frames

  • ROWS BETWEEN: counts physical row positions - exact offsets regardless of value
-- ROWS: always picks exactly N preceding physical rows
SUM(credits) OVER (ORDER BY month ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
-- Includes current row and two preceding
  • RANGE BETWEEN: uses values, groups rows with ORDER BY values together
-- RANGE: includes all rows tied at the boundary value
SUM(credits) OVER (ORDER BY month RANGE BETWEEN 2 PRECEDING AND CURRENT ROW)

Reach only for Range when your logic is about value boundaries

Data Pipeline Automation in Snowflake

Let's practice!

Data Pipeline Automation in Snowflake

Preparing Video For Download...