Subqueries

Data Manipulation in Snowflake

Jake Roach

Field Data Engineer

What are subqueries?

Subqueries are a tool that allow for the result of one query to be used by another query.

$$

  • Combine multiple queries
  • Focus on readability
  • Allow for modularity
  • Make data manipulation easier!

$$

FROM ( ... ) or WHERE ... IN ( ... )

Logical process of manipulating data in a more readable and modular manner with subqueries

Data Manipulation in Snowflake

Subqueries and FROM

SELECT
    ...
-- Pull from query, not from a table
FROM (

    -- Create a result set that will 
    -- be used by the main query
    SELECT 
        <fields>
    FROM <table>
    WHERE ...

);

Retrieve data from the result of another query rather than pulling directly from a table.

$$

  • Decouple manipulation from analysis
  • Makes queries easier to read, understand
  • Allows for "portability" and easier changes
  • JOIN, WHERE, etc.
Data Manipulation in Snowflake

Before a subquery

SELECT
    month_num,
    -- windchill - temperature has to be used twice here. What if this changes?
    AVG(windchill - temperature) AS avg_differential
    MIN(windchill - temperature) AS most_differential
FROM weather
WHERE 
    -- Filtering is taking place in the same query as aggregation/analysis
    season = 'Winter' AND
    temperature < 32
GROUP BY month_num;
Data Manipulation in Snowflake

After a subquery

-- Start with the subquery, then aggregate

SELECT month_num, AVG(differential) AS avg_differential MIN(differential) AS most_differential FROM (
SELECT month_num, windchill - temperature AS differential FROM weather WHERE season = 'Winter' AND temperature < 32
) GROUP BY month_num;
          | month_num | differential |
          | --------- | ------------ |
          |     12    |      -12     |
          |     1     |      -3      |
          |     1     |       0      |
          |     2     |      -7      |
| month_num | avg_differential | most_differential |
| --------- | ---------------- | ----------------- |
|     12    |       -5.77      |        -14        |
|     1     |       -1.91      |        -8         |
|     2     |       -8.13      |        -22        |

It's easy to understand and change the analysis once the data is cleaned.

Data Manipulation in Snowflake

WHERE ... IN ( ... )

...

-- Filter by records with a value in 
-- the subquery result set
WHERE <field> IN (

    SELECT <other-field> FROM ... 

);

Create a small result set to be used when transforming, filtering, or manipulating data.

$$

  • Filter for records IN a non-constant set of results
  • Can be used in other places in a query
  • AVG, MAX, MIN, etc.
Data Manipulation in Snowflake

WHERE ... IN( ... )

SELECT
    todays_date,
    temperature,
    status
FROM weather
WHERE todays_date IN (  -- Filter by all days with home games that were won

    SELECT
        game_date
    FROM game_schedule
    WHERE stadium = 'Home' AND did_win = TRUE

);
Data Manipulation in Snowflake

WHERE ... IN ( ... )

Result set for a query using a subquery to find weather data for each home game

Data Manipulation in Snowflake

Let's practice!

Data Manipulation in Snowflake

Preparing Video For Download...