Speciale aggregaties in BigQuery

Introductie tot BigQuery

Matthew Forrest

Field CTO

Introductie tot speciale aggregaten

Function Category Description
APPROX_COUNT_DISTINCT, APPROX_QUANTILES, APPROX_TOP_COUNT, APPROX_TOP_SUM Benaderende aggregatie Leveren schattingen om rekentijd en resources te besparen.
ARRAY_CONCAT_AGG, STRING_AGG Array- en tekenreeksbewerking Verzamelen, samenvoegen en bewerken van arrays en strings.
LOGICAL_AND, LOGICAL_OR Logische bewerkingen Evalueren AND- en OR-bewerkingen op een set booleans.
Introductie tot BigQuery

ARRAY_CONCAT_AGG

  • ARRAY_CONCAT_AGG() verhelpt beperkingen van ARRAY_AGG()
SELECT
  order_id,
  ARRAY_CONCAT_AGG(order_items) AS all_items
FROM sales_data
GROUP BY order_id;
| order_id | all_items                          |
|----------|------------------------------------|
| 1        | [shoes, electronics]               |
| 2        | [electronics, household, clothing] |
Introductie tot BigQuery

STRING_AGG

  • STRING_AGG() voegt strings samen tot één string
SELECT STRING_AGG(customer_id, ', ') AS all_customers
FROM sales_data
WHERE delivery_date 
BETWEEN CURRENT_TIMESTAMP() 
AND TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 3 DAY) ;
| all_customers   |
|-----------------|
| 123, 456, 789   |
Introductie tot BigQuery

APPROX_COUNT_DISTINCT

  • APPROX_COUNT_DISTINCT() geeft snel een schatting van het aantal unieke waarden
SELECT
  customer_id,
  APPROX_COUNT_DISTINCT(order_id) AS unique_orders
FROM sales_data
GROUP BY customer_id;
| customer_id | unique_orders |
|-------------|---------------|
| 1           | 5             |
| 2           | 2             |
Introductie tot BigQuery

APPROX_QUANTILES

  • APPROX_QUANTILES() geeft benaderende kwantielen
SELECT
  category,
  APPROX_QUANTILES(value, 4) AS quartiles
FROM sales_data
GROUP BY category;
| category    | quartiles         |
|-------------|-------------------|
| shoes       | [25, 40, 55, 100] |
| electronics | [10, 40, 95, 300] |
Introductie tot BigQuery

APPROX_TOP_COUNT

  • APPROX_TOP_COUNT() vindt de top K elementen op basis van hun voorkomen
SELECT
  category,
  APPROX_TOP_COUNT(customer_id, 3) AS customers
FROM sales_data
GROUP BY category;
| category    | customers   |
|-------------|-------------|
| shoes       | [1, 7, 19]  |
| electronics | [8, 19, 22] |
Introductie tot BigQuery

APPROX_TOP_SUM

  • APPROX_TOP_SUM(el, weight, K) vindt de top K elementen el op basis van weight
SELECT
  seller_id,
  APPROX_TOP_SUM(item_id, cost, 3) AS top_items
FROM sales_data
GROUP BY seller_id;
| seller_id | top_items                                    |
|-----------|----------------------------------------------|
| 1         | [[1024, 5000], [1567, 3000], [3489, 2000]]   |
| 2         | [[5647, 6500], [9867, 3500], [1074, 2100]]   |
Introductie tot BigQuery

LOGICAL_AND en LOGICAL_OR

SELECT
  customer_id,
  -- true if ALL are true
  LOGICAL_AND(order_status = 'shipped') AS all_shipped,
  -- true if at least one is true
  LOGICAL_OR(order_status = 'shipped') AS one_shipped
FROM sales_data
GROUP BY customer_id;
| customer_id | all_shipped | one_shipped |
|-------------|-------------|-------------|
| 1           | false       | true        |
| 2           | true        | true        |
Introductie tot BigQuery

Laten we oefenen!

Introductie tot BigQuery

Preparing Video For Download...