WINDOW Functions

Introduction to BigQuery

Matthew Forrest

Field CTO

What are WINDOW functions

An illustration of a WINDOW with a rolling SUM

Introduction to BigQuery

When to use WINDOW functions

WINDOW functions grouped by type

1 https://towardsdatascience.com/a-guide-to-advanced-sql-window-functions-f63f2642cbf9
Introduction to BigQuery

WINDOW structure, PARTITION and ORDER BY

SELECT
  customer_id,
  order_date,
  order_total,
  ROW_NUMBER() OVER(

PARTITION BY customer_id
ORDER BY order_date
) AS order_sequence FROM orders;
  • ROW_NUMBER(): The window function that returns the row number
  • OVER(): Defines the window frame
  • PARTITION BY customer_id: Groups the data by customer
  • ORDER BY order_date: Orders data within each partition
  • order_sequence: Result of the window function operation
Introduction to BigQuery

RANK and PERCENT_RANK

SELECT
  product_id,
  product_photos_qty,
  -- Ordinal rank for each row
  RANK() OVER(
    ORDER BY product_photos_qty DESC
  ) as rank,
  -- Percentile rank for each row
  PERCENT_RANK() OVER(
    ORDER BY product_photos_qty
  ) as percent
FROM ecommerce.ecomm_products 
ORDER BY product_photos_qty DESC;

Results of the RANK and PERCENT_RANK query

Introduction to BigQuery

LAG and LEAD

SELECT
  product_id,
  -- Returns value from previous row
  LAG(product_photos_qty) OVER(
    ORDER BY product_photos_qty
  ) as lag,
  product_photos_qty,
  -- Returns value from next row
  LEAD(product_photos_qty) OVER(
    ORDER BY product_photos_qty
  ) as lead
FROM ecommerce.ecomm_products 
ORDER BY product_photos_qty DESC;

Image showing LAG and LEAD in query results

Introduction to BigQuery

RANGE BETWEEN and CURRENT ROW

SELECT
  order_id,
  order_timestamp,
  SUM(cost) OVER(
    ORDER BY order_timestamp 
    ROWS BETWEEN 2 PRECEDING 
    AND CURRENT ROW) as rolling_avg
FROM sales_data
ORDER BY order_timestamp

Row-based bounding options:

  • UNBOUNDED PRECEDING: All rows before
  • UNBOUNDED FOLLOWING: All rows after
  • [INT] ROWS PRECEDING: Specific number of rows before
  • [INT] ROWS FOLLOWING: Specific number of rows after
Introduction to BigQuery

QUALIFY

SELECT
  product_id,
  product_photos_qty,
  RANK() OVER(
    ORDER BY product_photos_qty DESC
  ) as rank
FROM ecommerce.ecomm_products 
-- Filter using QUALIFY
QUALIFY rank < 4
ORDER BY product_photos_qty DESC;

Results of our query to use QUALIFY to find values with a rank fo 3 or higher

  • Can't use HAVING as we are not aggregating
Introduction to BigQuery

Let's practice!

Introduction to BigQuery

Preparing Video For Download...