Data manipulation language (DML) statements

Introduction to BigQuery

Matt Forrest

Field CTO

Overview of data manipulation in BigQuery

  • INSERT: Add new rows of data.
  • UPDATE: Modify existing values in a row.
  • DELETE: Remove unwanted data from a tables.
  • MERGE: Statement that can combine INSERT, UPDATE, and DELETE statements into one statement.
  • CREATE TABLE AS: Creates a new table from a query result.
Introduction to BigQuery

Considerations and performance

  • Group DML statements together when possible rather than running them individually
  • You must use a WHERE condition when running an UPDATE statement
  • Consider using table partitions and clusters
1 https://cloud.google.com/bigquery/docs/reference/standard-sql/data-manipulation-language
Introduction to BigQuery

INSERT

  • Add records to tables
-- Define the columns in the parentheses 
INSERT INTO customers (customer_id, name, email)

-- Each value is a row to be inserted
VALUES (1, "John Doe", "[email protected]"),
(2, "Jane Doe", "[email protected]"),
(3, "Alice Smith", "[email protected]");
Introduction to BigQuery

UPDATE

  • Changing data based on a condition
UPDATE customers
-- Set one column for each SET statement
SET email = "[email protected]"
-- Make sure to include where otherwise all
-- rows will be updated
WHERE customer_id = 1;
  • UPDATE together with subqueries or joins
    UPDATE customers c
    SET c.email = e.email
    FROM emails e
    WHERE c.customer_id = 1;
    
Introduction to BigQuery

DELETE

  • DELETE permanently removes records and can't be reversed
DELETE FROM customers

-- Include WHERE to ensure only specific rows are deleted
WHERE customer_id = 3;
DELETE FROM customers c
JOIN emails e USING (customer_id)
WHERE email = '[email protected]'
Introduction to BigQuery

MERGE

  • Combines INSERT, UPDATE, and DELETE in a single operation
-- Sets the customers table as the target
MERGE customers AS target

-- The source is set to new_customers USING new_customers AS source
-- Matching condition ON target.customer_id = source.customer_id
-- If the emails do not match, update the email WHEN MATCHED AND target.email != source.email THEN UPDATE SET email = source.email
-- If the match is not met, insert the record WHEN NOT MATCHED THEN INSERT (customer_id, name, email) VALUES (source.customer_id, source.name, source.email);
Introduction to BigQuery

CREATE TABLE

  • Create new tables from queries
CREATE TABLE active_customers AS
SELECT customer_id, name, email 
FROM customers
WHERE last_active_date > DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY);
Introduction to BigQuery

Let's practice!

Introduction to BigQuery

Preparing Video For Download...