How Caching Works in Snowflake

Data Pipeline Automation in Snowflake

Emily Melhuish

Technical Curriculum Developer, Snowflake

Three Caching Layers

Caching layers.png

Snowflake caches results at three levels — each with different scope and duration

Result cache — stores complete query results and returns them without runnig a warehouse
Warehouse or data cache — recently scanned micro-partitions in memory and SSD
Metadata cache — always on; table stats used during query planning

When the Result Cache is used (and when it isn't)

Snowflake Query Result Cache flow diagram showing a query returning an instant cached result to the user

Condition	Cache Miss When…
Query text matches exactly	Casing, whitespace, or aliases differ
No non-reusable functions	functions that change between runs: RANDOM
Underlying tables unchanged	DML, reclustering, or consolidation since last run
Persisted result still available	24-hour TTL elapsed without reuse, or 31 days
Role has the required privileges	SELECT: role lacks privileges SHOW: different role

Warehouse cache and Metadata cache

Snowflake data cache diagram showing warehouse cache hit versus cache miss scenarios

Warehouse cache

Stores recently scanned micro-partitions on warehouse SSD + memory
Cleared when the warehouse suspends
Keep auto-suspend long enough to preserve the cache

Metadata cache

Always on, lives in Cloud Services
Stores table statistics
Some queries answered from metadata alone

SQL patterns that Prevent Partition Pruning

Prevents pruning:

SELECT *
FROM logistics.shipments
WHERE MONTH(dispatch_date) = 3;

Allows pruning:

SELECT *
FROM logistics.shipments
WHERE dispatch_date >= '2024-03-01'
  AND dispatch_date <  '2024-04-01';

Two more SQL patterns to avoid

Avoid SELECT *

-- Reads every column
SELECT * 
FROM logistics.shipments 
WHERE region = 'EMEA';

-- Reads only what's needed
SELECT shipment_id
, delivery_days 
FROM logistics.shipments 
WHERE region = 'EMEA';

Avoid implicit cross-joins

-- Missing join condition = Cartesian
SELECT s.shipment_id
, c.name 
FROM shipments s, carriers c;

-- Explicit JOIN with ON condition
SELECT s.shipment_id
, c.name 
FROM shipments s 
JOIN carriers c ON s.carrier_id = c.id;

Let's practice!

Data Pipeline Automation in Snowflake