Intermediate Aggregate Functions

Data Pipeline Automation in Snowflake

Emily Melhuish

Technical Curriculum Developer, Snowflake

Order of Operations

Order of operations.png

SQL doesn't run in the order you write it

  1. Rows: build the dataset
  2. Groups: summarise it
  3. Result: shape the output
1 * Snowflake Learning Material
Data Pipeline Automation in Snowflake

Common Aggregate Functions

  • COUNT(*) / COUNT(DISTINCT col) — total rows; unique values
  • SUM — column total
  • AVG — mean value
  • MIN / MAX — range
SELECT COUNT(*) AS total_shipments,
       COUNT(DISTINCT carrier_id) AS unique_carriers,
       SUM(delivery_days) AS total_days,
       AVG(delivery_days) AS avg_days
FROM shipments;
Data Pipeline Automation in Snowflake

The Problem with Multiple Queries

Return usage by region and warehouse

SELECT 
region
, warehouse_name
, SUM(credits_used) 
FROM usage 
GROUP BY region, warehouse_name; 

Return usage by region sub-totals

SELECT region
, SUM(credits_used) 
FROM usage GROUP BY region; 

Return usage total

SELECT 
SUM(credits_used) 
FROM usage;
Data Pipeline Automation in Snowflake

GROUPING SETS

Input

SELECT region
, warehouse_name
, SUM(credits_used) AS total 
FROM usage 
GROUP BY GROUPING SETS ( 
  (region, warehouse_name)
  , (region)
  , () 
);

Output

REGION WAREHOUSE_NAME TOTAL
EMEA harbr_wh_prod 1240.50
EMEA harbr_wh_dev 380.00
EMEA NULL 1620.50
APAC harbr_wh_prod 890.25
APAC NULL 890.25
NULL NULL 2510.75
  • Note: Add meaning to NULLs through CASE statement
Data Pipeline Automation in Snowflake

ROLLUP

Input

SELECT 
region
, warehouse_name
, SUM(credits_used) AS total 
FROM usage 
GROUP BY ROLLUP (region, warehouse_name);

Output

REGION WAREHOUSE_NAME TOTAL
EMEA harbr_wh_prod 1240.50
EMEA harbr_wh_dev 380.00
EMEA NULL 1620.50
APAC harbr_wh_prod 890.25
APAC NULL 890.25
NULL NULL 2510.75
Data Pipeline Automation in Snowflake

CUBE

Input

SELECT 
region
, warehouse_name
, SUM(credits_used) AS total
FROM usage 
GROUP BY CUBE (region, warehouse_name);

Output

REGION WAREHOUSE_NAME TOTAL
EMEA harbr_wh_prod 1240.50
EMEA harbr_wh_dev 380.00
EMEA NULL 1620.50
APAC harbr_wh_prod 890.25
APAC NULL 890.25 }
NULL harbr_wh_prod 2130.75
NULL harbr_wh_dev 380.00
NULL NULL 2510.75
Data Pipeline Automation in Snowflake

GROUPING() and GROUPING_ID()

SELECT region, warehouse_name, SUM(credits_used) AS total_credits, GROUPING(region) AS is_region_agg, GROUPING(warehouse_name) AS is_wh_agg , 
GROUPING_ID(region, warehouse_name) AS grouping_level 
FROM usage 
GROUP BY ROLLUP (region, warehouse_name);

Screenshot of markdown table output

Data Pipeline Automation in Snowflake

Let's practice!

Data Pipeline Automation in Snowflake

Preparing Video For Download...