Query Optimization

Introduction to Data Modeling in Snowflake

Nuno Rocha

Director of Engineering

Query execution order

Query execution order, full query

Introduction to Data Modeling in Snowflake

Query execution order (1)

Query execution order; FROM

Introduction to Data Modeling in Snowflake

Query execution order (2)

Query execution order; JOIN

Introduction to Data Modeling in Snowflake

Query execution order (3)

Query execution order; WHERE

Introduction to Data Modeling in Snowflake

Query execution order (4)

Query execution order; AGGREGATIONS

Introduction to Data Modeling in Snowflake

Query execution order (5)

Query execution order; HAVING

Introduction to Data Modeling in Snowflake

Query execution order (6)

Query execution order; SORT

Introduction to Data Modeling in Snowflake

Query execution order (7)

Query execution order; LIMIT

Introduction to Data Modeling in Snowflake

Query execution order (8)

Final query execution order

  • BEST PRACTICES:
    • Avoid using SELECT *; specify only necessary columns
    • Implement LIMIT filtering to reduce data volume
    • Use the WHERE clause early for row filtering and memory conservation
    • Employ GROUP BY with aggregations on narrowed datasets to optimize processing
Introduction to Data Modeling in Snowflake

Subqueries

Subquery representation

Introduction to Data Modeling in Snowflake

Subqueries

Hotel chain loyalty data model

Introduction to Data Modeling in Snowflake

Subqueries

  • Query all guests that have more than 1000 loyalty points
SELECT *
FROM guests
WHERE id IN (SELECT guest_id 
                  FROM loyalty_program 
                  WHERE loyalty_points > 1000);
Introduction to Data Modeling in Snowflake

Common table expressions

CTE

Introduction to Data Modeling in Snowflake

Common table expressions

  • Query the latest booking details
WITH latest_booking AS (
    SELECT guest_id, 
           MAX(checkout_date) AS latest_checkout
    FROM booking_details
    GROUP BY guest_id
)
SELECT bd.*, 
       bd.checkout_date AS latest_booking_date
FROM booking_details bd
    JOIN latest_booking lb 
        ON bd.guest_id = lb.guest_id 
        AND bd.checkout_date = lb.latest_checkout;
Introduction to Data Modeling in Snowflake

CTEs and subqueries

CTEs

  • Pros

    • Enhances readability for complex queries
    • Enables reusability in the same query
    • Improves organization of SQL queries
  • Cons

    • Can introduce performance overhead
    • Limited to the scope of a single query

Subqueries

  • Pros

    • Simple and direct for single-use cases
    • Flexible in various parts of a SQL statement
  • Cons

    • It can reduce readability with complexity
    • Potential performance issues with nested instances
Introduction to Data Modeling in Snowflake

Visualizing query execution times

Query profile page

Introduction to Data Modeling in Snowflake

Terminology and functions overview

  • Query Optimization: Fine-tuning queries to maximize efficiency and performance
  • Subquery: A smaller query inside a main query that helps focus on specific data
  • Common Table Expressions (CTEs): Temporary virtual table during a query
  • WITH .. AS: SQL command to define a CTE
  • LIMIT: SQL clause constraining the number of rows in query results
  • HAVING: SQL clause used to filter data that aggregated with functions like SUM, MAX, etc
  • WHERE: SQL clause used to filter rows before grouping, enhancing query efficiency
Introduction to Data Modeling in Snowflake

Exemplary CTE and subquery template

  • Subquery
SELECT *
FROM table_name
WHERE column_name IN (SELECT column_name 
                  FROM table_name
                  WHERE column_name condition value);
  • CTE
WITH latest_booking_dates AS (
    SELECT column_name
    FROM table_name)
SELECT *
FROM table_name a
    JOIN other_table_name b 
    ON a.key_column = b.key_column;
Introduction to Data Modeling in Snowflake

Let's practice!

Introduction to Data Modeling in Snowflake

Preparing Video For Download...