Data Manipulation in Snowflake
Jake Roach
Field Data Engineer
Subqueries are a tool that allow for the result of one query to be used by another query.
$$
$$
FROM ( ... )
or WHERE ... IN ( ... )
SELECT
...
-- Pull from query, not from a table
FROM (
-- Create a result set that will
-- be used by the main query
SELECT
<fields>
FROM <table>
WHERE ...
);
Retrieve data from the result of another query rather than pulling directly from a table.
$$
JOIN
, WHERE
, etc.SELECT
month_num,
-- windchill - temperature has to be used twice here. What if this changes?
AVG(windchill - temperature) AS avg_differential
MIN(windchill - temperature) AS most_differential
FROM weather
WHERE
-- Filtering is taking place in the same query as aggregation/analysis
season = 'Winter' AND
temperature < 32
GROUP BY month_num;
-- Start with the subquery, then aggregate
SELECT month_num, AVG(differential) AS avg_differential MIN(differential) AS most_differential FROM (
SELECT month_num, windchill - temperature AS differential FROM weather WHERE season = 'Winter' AND temperature < 32
) GROUP BY month_num;
| month_num | differential |
| --------- | ------------ |
| 12 | -12 |
| 1 | -3 |
| 1 | 0 |
| 2 | -7 |
| month_num | avg_differential | most_differential |
| --------- | ---------------- | ----------------- |
| 12 | -5.77 | -14 |
| 1 | -1.91 | -8 |
| 2 | -8.13 | -22 |
It's easy to understand and change the analysis once the data is cleaned.
...
-- Filter by records with a value in
-- the subquery result set
WHERE <field> IN (
SELECT <other-field> FROM ...
);
Create a small result set to be used when transforming, filtering, or manipulating data.
$$
IN
a non-constant set of resultsAVG
, MAX
, MIN
, etc.SELECT
todays_date,
temperature,
status
FROM weather
WHERE todays_date IN ( -- Filter by all days with home games that were won
SELECT
game_date
FROM game_schedule
WHERE stadium = 'Home' AND did_win = TRUE
);
Data Manipulation in Snowflake