Introductie tot BigQuery
Matthew Forrest
Field CTO
| Function | Category | Description |
|---|---|---|
APPROX_COUNT_DISTINCT, APPROX_QUANTILES, APPROX_TOP_COUNT, APPROX_TOP_SUM |
Benaderende aggregatie | Leveren schattingen om rekentijd en resources te besparen. |
ARRAY_CONCAT_AGG, STRING_AGG |
Array- en tekenreeksbewerking | Verzamelen, samenvoegen en bewerken van arrays en strings. |
LOGICAL_AND, LOGICAL_OR |
Logische bewerkingen | Evalueren AND- en OR-bewerkingen op een set booleans. |
ARRAY_CONCAT_AGG() verhelpt beperkingen van ARRAY_AGG()SELECT
order_id,
ARRAY_CONCAT_AGG(order_items) AS all_items
FROM sales_data
GROUP BY order_id;
| order_id | all_items |
|----------|------------------------------------|
| 1 | [shoes, electronics] |
| 2 | [electronics, household, clothing] |
STRING_AGG() voegt strings samen tot één stringSELECT STRING_AGG(customer_id, ', ') AS all_customers
FROM sales_data
WHERE delivery_date
BETWEEN CURRENT_TIMESTAMP()
AND TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 3 DAY) ;
| all_customers |
|-----------------|
| 123, 456, 789 |
APPROX_COUNT_DISTINCT() geeft snel een schatting van het aantal unieke waardenSELECT
customer_id,
APPROX_COUNT_DISTINCT(order_id) AS unique_orders
FROM sales_data
GROUP BY customer_id;
| customer_id | unique_orders |
|-------------|---------------|
| 1 | 5 |
| 2 | 2 |
APPROX_QUANTILES() geeft benaderende kwantielenSELECT
category,
APPROX_QUANTILES(value, 4) AS quartiles
FROM sales_data
GROUP BY category;
| category | quartiles |
|-------------|-------------------|
| shoes | [25, 40, 55, 100] |
| electronics | [10, 40, 95, 300] |
APPROX_TOP_COUNT() vindt de top K elementen op basis van hun voorkomenSELECT
category,
APPROX_TOP_COUNT(customer_id, 3) AS customers
FROM sales_data
GROUP BY category;
| category | customers |
|-------------|-------------|
| shoes | [1, 7, 19] |
| electronics | [8, 19, 22] |
APPROX_TOP_SUM(el, weight, K) vindt de top K elementen el op basis van weightSELECT
seller_id,
APPROX_TOP_SUM(item_id, cost, 3) AS top_items
FROM sales_data
GROUP BY seller_id;
| seller_id | top_items |
|-----------|----------------------------------------------|
| 1 | [[1024, 5000], [1567, 3000], [3489, 2000]] |
| 2 | [[5647, 6500], [9867, 3500], [1074, 2100]] |
SELECT
customer_id,
-- true if ALL are true
LOGICAL_AND(order_status = 'shipped') AS all_shipped,
-- true if at least one is true
LOGICAL_OR(order_status = 'shipped') AS one_shipped
FROM sales_data
GROUP BY customer_id;
| customer_id | all_shipped | one_shipped |
|-------------|-------------|-------------|
| 1 | false | true |
| 2 | true | true |
Introductie tot BigQuery