Optimization Methods

Data Pipeline Automation in Snowflake

Emily Melhuish

Technical Curriculum Developer, Snowflake

More Compute isn't Always the Fix

bigger_house

Four Different Approaches

nanobanana: half: Four distinct geometric tool icons arranged in a two by two grid, abstract flat design, DataCamp navy blue and green colors, white background, no text, minimalist clean

Search Optimization - equality lookups on high-cardinality IDs
Query Acceleration Service - unpredictable heavy scans
Automatic Clustering - repeated range filters on the same columns
Materialized Views - expensive aggregations run often

Search Optimization

Helps finding queries that look for a small number of rows in a large table
Builds a persistent access path; some micro-partitions can be skipped
Storage + compute cost for maintaining the access path

ALTER TABLE logistics.shipments
  ADD SEARCH OPTIMIZATION;

¹ docs.snowflake.com/en/user-guide/search-optimization-service

Query Acceleration Service

Handles large unpredictable scans
Supports SELECT, INSERT, CREATE TABLE AS SELECT, COPY INTO
QUERY_ACCELERATION_MAX_SCALE_FACTOR: default 8; 0 means unlimited, not disabled

ALTER WAREHOUSE harbr_wh SET
  ENABLE_QUERY_ACCELERATION = TRUE
  QUERY_ACCELERATION_MAX_SCALE_FACTOR 
  = 8;

¹ docs.snowflake.com/en/user-guide/query-acceleration-service

Automatic Clustering

For tables where the same filter columns appear in most queries
Snowflake reorders micro-partitions so range filters can skip more data
Background maintenance - no warehouse needed
Trade-off: ongoing compute cost as new data arrives

ALTER TABLE logistics.delivery_events
  CLUSTER BY (region, dispatch_date);

¹ docs.snowflake.com/en/user-guide/tables-auto-reclustering

Materialized Views

Needed when an expensive query runs over and over again
Pre-computes and stores a query's result; reads hit the stored result
Snowflake maintains it automatically in the background
SQL restrictions apply: no HAVING, limited joins, no window functions

CREATE MATERIALIZED VIEW 
logistics.emea_summary_mv AS
  SELECT region, carrier_id,
         COUNT(*) AS shipments,
         AVG(delivery_days) AS avg_days
  FROM logistics.shipments
  WHERE region = 'EMEA'
  GROUP BY region, carrier_id;

¹ docs.snowflake.com/en/user-guide/views-materialized

Choosing the Right Method

Signal in the Query Profile	Method to Consider
TableScan reads almost all partitions; equality lookup on a high-cardinality column	Search Optimization
Heavy aggregation, large scans, arrives unpredictably	Query Acceleration Service
Same filter columns appear in most queries; pruning is weak	Automatic Clustering
Same expensive aggregation runs frequently on slow-changing data	Materialized View

Methods complement each other

Let's practice!

Data Pipeline Automation in Snowflake