Data Pipeline Automation in Snowflake
Emily Melhuish
Technical Curriculum Developer, Snowflake

Snowflake caches results at three levels — each with different scope and duration

| Condition | Cache Miss When… |
|---|---|
| Query text matches exactly | Casing, whitespace, or aliases differ |
| No non-reusable functions | functions that change between runs: RANDOM |
| Underlying tables unchanged | DML, reclustering, or consolidation since last run |
| Persisted result still available | 24-hour TTL elapsed without reuse, or 31 days |
| Role has the required privileges | SELECT: role lacks privileges SHOW: different role |

Warehouse cache
Metadata cache
Prevents pruning:
SELECT *
FROM logistics.shipments
WHERE MONTH(dispatch_date) = 3;
Allows pruning:
SELECT *
FROM logistics.shipments
WHERE dispatch_date >= '2024-03-01'
AND dispatch_date < '2024-04-01';
Avoid SELECT *
-- Reads every column
SELECT *
FROM logistics.shipments
WHERE region = 'EMEA';
-- Reads only what's needed
SELECT shipment_id
, delivery_days
FROM logistics.shipments
WHERE region = 'EMEA';
Avoid implicit cross-joins
-- Missing join condition = Cartesian
SELECT s.shipment_id
, c.name
FROM shipments s, carriers c;
-- Explicit JOIN with ON condition
SELECT s.shipment_id
, c.name
FROM shipments s
JOIN carriers c ON s.carrier_id = c.id;
Data Pipeline Automation in Snowflake