Introduction to BigQuery
Matthew Forrest
Field CTO
Function | Category | Description |
---|---|---|
APPROX_COUNT_DISTINCT , APPROX_QUANTILES , APPROX_TOP_COUNT , APPROX_TOP_SUM |
Approximate Aggregation | Provide estimates for certain calculations, reducing processing time and resource consumption. |
ARRAY_CONCAT_AGG , STRING_AGG |
Array and String Manipulation | Collect, concatenate, and manipulate arrays and strings. |
LOGICAL_AND , LOGICAL_OR |
Logical Operations | Evaluate logical AND and OR operations on a set of boolean expressions. |
ARRAY_CONCAT_AGG()
addresses the limitations of ARRAY_AGG()
SELECT
order_id,
ARRAY_CONCAT_AGG(order_items) AS all_items
FROM sales_data
GROUP BY order_id;
| order_id | all_items |
|----------|------------------------------------|
| 1 | [shoes, electronics] |
| 2 | [electronics, household, clothing] |
STRING_AGG()
concatenates strings into a single stringSELECT STRING_AGG(customer_id, ', ') AS all_customers
FROM sales_data
WHERE delivery_date
BETWEEN CURRENT_TIMESTAMP()
AND TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 3 DAY) ;
| all_customers |
|-----------------|
| 123, 456, 789 |
APPROX_COUNT_DISTINCT()
gives quick estimation of the count of distinct valuesSELECT
customer_id,
APPROX_COUNT_DISTINCT(order_id) AS unique_orders
FROM sales_data
GROUP BY customer_id;
| customer_id | unique_orders |
|-------------|---------------|
| 1 | 5 |
| 2 | 2 |
APPROX_QUANTILES()
gives approximate quantile valuesSELECT
category,
APPROX_QUANTILES(value, 4) AS quartiles
FROM sales_data
GROUP BY category;
| category | quartiles |
|-------------|-------------------|
| shoes | [25, 40, 55, 100] |
| electronics | [10, 40, 95, 300] |
APPROX_TOP_COUNT()
identifies the top K elements based on their occurrenceSELECT
category,
APPROX_TOP_COUNT(customer_id, 3) AS customers
FROM sales_data
GROUP BY category;
| category | customers |
|-------------|-------------|
| shoes | [1, 7, 19] |
| electronics | [8, 19, 22] |
APPROX_TOP_SUM(el, weight, K)
finds the top K
elements el
based on weight
SELECT
seller_id,
APPROX_TOP_SUM(item_id, cost, 3) AS top_items
FROM sales_data
GROUP BY seller_id;
| seller_id | top_items |
|-----------|----------------------------------------------|
| 1 | [[1024, 5000], [1567, 3000], [3489, 2000]] |
| 2 | [[5647, 6500], [9867, 3500], [1074, 2100]] |
SELECT
customer_id,
-- true if ALL are true
LOGICAL_AND(order_status = 'shipped') AS all_shipped,
-- true if at least one is true
LOGICAL_OR(order_status = 'shipped') AS one_shipped
FROM sales_data
GROUP BY customer_id;
| customer_id | all_shipped | one_shipped |
|-------------|-------------|-------------|
| 1 | false | true |
| 2 | true | true |
Introduction to BigQuery