Special aggregations in BigQuery

Introduction to BigQuery

Matthew Forrest

Field CTO

Introduction to special aggregates

Function Category Description
APPROX_COUNT_DISTINCT, APPROX_QUANTILES, APPROX_TOP_COUNT, APPROX_TOP_SUM Approximate Aggregation Provide estimates for certain calculations, reducing processing time and resource consumption.
ARRAY_CONCAT_AGG, STRING_AGG Array and String Manipulation Collect, concatenate, and manipulate arrays and strings.
LOGICAL_AND, LOGICAL_OR Logical Operations Evaluate logical AND and OR operations on a set of boolean expressions.
Introduction to BigQuery

ARRAY_CONCAT_AGG

  • ARRAY_CONCAT_AGG() addresses the limitations of ARRAY_AGG()
SELECT
  order_id,
  ARRAY_CONCAT_AGG(order_items) AS all_items
FROM sales_data
GROUP BY order_id;
| order_id | all_items                          |
|----------|------------------------------------|
| 1        | [shoes, electronics]               |
| 2        | [electronics, household, clothing] |
Introduction to BigQuery

STRING_AGG

  • STRING_AGG() concatenates strings into a single string
SELECT STRING_AGG(customer_id, ', ') AS all_customers
FROM sales_data
WHERE delivery_date 
BETWEEN CURRENT_TIMESTAMP() 
AND TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 3 DAY) ;
| all_customers   |
|-----------------|
| 123, 456, 789   |
Introduction to BigQuery

APPROX_COUNT_DISTINCT

  • APPROX_COUNT_DISTINCT() gives quick estimation of the count of distinct values
SELECT
  customer_id,
  APPROX_COUNT_DISTINCT(order_id) AS unique_orders
FROM sales_data
GROUP BY customer_id;
| customer_id | unique_orders |
|-------------|---------------|
| 1           | 5             |
| 2           | 2             |
Introduction to BigQuery

APPROX_QUANTILES

  • APPROX_QUANTILES() gives approximate quantile values
SELECT
  category,
  APPROX_QUANTILES(value, 4) AS quartiles
FROM sales_data
GROUP BY category;
| category    | quartiles         |
|-------------|-------------------|
| shoes       | [25, 40, 55, 100] |
| electronics | [10, 40, 95, 300] |
Introduction to BigQuery

APPROX_TOP_COUNT

  • APPROX_TOP_COUNT() identifies the top K elements based on their occurrence
SELECT
  category,
  APPROX_TOP_COUNT(customer_id, 3) AS customers
FROM sales_data
GROUP BY category;
| category    | customers   |
|-------------|-------------|
| shoes       | [1, 7, 19]  |
| electronics | [8, 19, 22] |
Introduction to BigQuery

APPROX_TOP_SUM

  • APPROX_TOP_SUM(el, weight, K) finds the top K elements el based on weight
SELECT
  seller_id,
  APPROX_TOP_SUM(item_id, cost, 3) AS top_items
FROM sales_data
GROUP BY seller_id;
| seller_id | top_items                                    |
|-----------|----------------------------------------------|
| 1         | [[1024, 5000], [1567, 3000], [3489, 2000]]   |
| 2         | [[5647, 6500], [9867, 3500], [1074, 2100]]   |
Introduction to BigQuery

LOGICAL_AND and LOGICAL_OR

SELECT
  customer_id,
  -- true if ALL are true
  LOGICAL_AND(order_status = 'shipped') AS all_shipped,
  -- true if at least one is true
  LOGICAL_OR(order_status = 'shipped') AS one_shipped
FROM sales_data
GROUP BY customer_id;
| customer_id | all_shipped | one_shipped |
|-------------|-------------|-------------|
| 1           | false       | true        |
| 2           | true        | true        |
Introduction to BigQuery

Let's practice!

Introduction to BigQuery

Preparing Video For Download...