Window Functions

Data Pipeline Automation in Snowflake

Emily Melhuish

Technical Curriculum Developer, Snowflake

What is a Window Function?

Calculate across rows without collapsing them

Computes a result per row using a surrounding set — its window
Unlike GROUP BY: every row in, every row out
Defined by the OVER clause

3 main categories:

Ranking: ROW_NUMBER(), RANK(), DENSE_RANK()
Aggregation: SUM(), AVG(), COUNT() with OVER()
Navigation: LAG(), LEAD(), FIRST_VALUE(), LAST_VALUE()

More info in DataCamp's Window Functions in Snowflake course

The OVER() Clause

SELECT shipment_id
, region
, delivery_days
, AVG(delivery_days) 
    OVER ( PARTITION BY region ORDER BY dispatched_at ) AS avg_days_in_region 
FROM shipments;

SHIPMENT_ID	REGION	DELIVERY_DAYS	RUNNING_AVG_DAYS_IN_REGION
SHP-001	EMEA	3	3.00
SHP-002	EMEA	5	4.00
SHP-003	EMEA	4	4.00
SHP-004	APAC	6	6.00

ROW_NUMBER vs RANK vs DENSE_RANK

SELECT
    shipment_id AS shipment, delivery_days AS days,
    ROW_NUMBER() OVER (ORDER BY delivery_days) AS row_number,
    RANK()       OVER (ORDER BY delivery_days) AS rank,
    DENSE_RANK() OVER (ORDER BY delivery_days) AS dense_rank
FROM shipments ORDER BY delivery_days;

SHIPMENT	DELIVERY_DAYS	ROW_NUMBER	RANK	DENSE_RANK
SHP-001	3	1	1	1
SHP-002	5	2	2	2
SHP-003	5	3	2	2
SHP-004	7	4	4	3

LAG and LEAD

Input

SQL Lag and Lead Input

Output

Lag and Lead Output

Running Totals and Rolling Averages

ROWS vs RANGE frames

ROWS BETWEEN: counts physical row positions - exact offsets regardless of value

-- ROWS: always picks exactly N preceding physical rows
SUM(credits) OVER (ORDER BY month ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
-- Includes current row and two preceding

RANGE BETWEEN: uses values, groups rows with ORDER BY values together

-- RANGE: includes all rows tied at the boundary value
SUM(credits) OVER (ORDER BY month RANGE BETWEEN 2 PRECEDING AND CURRENT ROW)

Reach only for Range when your logic is about value boundaries

Let's practice!

Data Pipeline Automation in Snowflake