Data Pipeline Automation in Snowflake
Emily Melhuish
Technical Curriculum Developer, Snowflake
Calculate across rows without collapsing them
GROUP BY: every row in, every row outOVER clause3 main categories:
ROW_NUMBER(), RANK(), DENSE_RANK()SUM(), AVG(), COUNT() with OVER()LAG(), LEAD(), FIRST_VALUE(), LAST_VALUE()More info in DataCamp's Window Functions in Snowflake course
SELECT shipment_id
, region
, delivery_days
, AVG(delivery_days)
OVER ( PARTITION BY region ORDER BY dispatched_at ) AS avg_days_in_region
FROM shipments;
| SHIPMENT_ID | REGION | DELIVERY_DAYS | RUNNING_AVG_DAYS_IN_REGION |
|---|---|---|---|
| SHP-001 | EMEA | 3 | 3.00 |
| SHP-002 | EMEA | 5 | 4.00 |
| SHP-003 | EMEA | 4 | 4.00 |
| SHP-004 | APAC | 6 | 6.00 |
SELECT
shipment_id AS shipment, delivery_days AS days,
ROW_NUMBER() OVER (ORDER BY delivery_days) AS row_number,
RANK() OVER (ORDER BY delivery_days) AS rank,
DENSE_RANK() OVER (ORDER BY delivery_days) AS dense_rank
FROM shipments ORDER BY delivery_days;
| SHIPMENT | DELIVERY_DAYS | ROW_NUMBER | RANK | DENSE_RANK |
|---|---|---|---|---|
| SHP-001 | 3 | 1 | 1 | 1 |
| SHP-002 | 5 | 2 | 2 | 2 |
| SHP-003 | 5 | 3 | 2 | 2 |
| SHP-004 | 7 | 4 | 4 | 3 |
Input

Output


ROWS BETWEEN: counts physical row positions - exact offsets regardless of value-- ROWS: always picks exactly N preceding physical rows
SUM(credits) OVER (ORDER BY month ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
-- Includes current row and two preceding
RANGE BETWEEN: uses values, groups rows with ORDER BY values together-- RANGE: includes all rows tied at the boundary value
SUM(credits) OVER (ORDER BY month RANGE BETWEEN 2 PRECEDING AND CURRENT ROW)
Reach only for Range when your logic is about value boundaries
Data Pipeline Automation in Snowflake