How Caching Works in Snowflake

Data Pipeline Automation in Snowflake

Emily Melhuish

Technical Curriculum Developer, Snowflake

Three Caching Layers

Caching layers.png

Snowflake caches results at three levels — each with different scope and duration

  • Result cache — stores complete query results and returns them without runnig a warehouse
  • Warehouse or data cache — recently scanned micro-partitions in memory and SSD
  • Metadata cache — always on; table stats used during query planning
Data Pipeline Automation in Snowflake

When the Result Cache is used (and when it isn't)

Snowflake Query Result Cache flow diagram showing a query returning an instant cached result to the user

Condition Cache Miss When…
Query text matches exactly Casing, whitespace, or aliases differ
No non-reusable functions functions that change between runs: RANDOM
Underlying tables unchanged DML, reclustering, or consolidation since last run
Persisted result still available 24-hour TTL elapsed without reuse, or 31 days
Role has the required privileges SELECT: role lacks privileges SHOW: different role
Data Pipeline Automation in Snowflake

Warehouse cache and Metadata cache

Snowflake data cache diagram showing warehouse cache hit versus cache miss scenarios

Warehouse cache

  • Stores recently scanned micro-partitions on warehouse SSD + memory
  • Cleared when the warehouse suspends
  • Keep auto-suspend long enough to preserve the cache

Metadata cache

  • Always on, lives in Cloud Services
  • Stores table statistics
  • Some queries answered from metadata alone
Data Pipeline Automation in Snowflake

SQL patterns that Prevent Partition Pruning

Prevents pruning:

SELECT *
FROM logistics.shipments
WHERE MONTH(dispatch_date) = 3;

Allows pruning:

SELECT *
FROM logistics.shipments
WHERE dispatch_date >= '2024-03-01'
  AND dispatch_date <  '2024-04-01';
Data Pipeline Automation in Snowflake

Two more SQL patterns to avoid

Avoid SELECT *

-- Reads every column
SELECT * 
FROM logistics.shipments 
WHERE region = 'EMEA';

-- Reads only what's needed
SELECT shipment_id
, delivery_days 
FROM logistics.shipments 
WHERE region = 'EMEA';

Avoid implicit cross-joins

-- Missing join condition = Cartesian
SELECT s.shipment_id
, c.name 
FROM shipments s, carriers c;

-- Explicit JOIN with ON condition
SELECT s.shipment_id
, c.name 
FROM shipments s 
JOIN carriers c ON s.carrier_id = c.id;
Data Pipeline Automation in Snowflake

Let's practice!

Data Pipeline Automation in Snowflake

Preparing Video For Download...