Optimization Methods

Data Pipeline Automation in Snowflake

Emily Melhuish

Technical Curriculum Developer, Snowflake

More Compute isn't Always the Fix

bigger_house

Data Pipeline Automation in Snowflake

Four Different Approaches

nanobanana: half: Four distinct geometric tool icons arranged in a two by two grid, abstract flat design, DataCamp navy blue and green colors, white background, no text, minimalist clean

  • Search Optimization - equality lookups on high-cardinality IDs
  • Query Acceleration Service - unpredictable heavy scans
  • Automatic Clustering - repeated range filters on the same columns
  • Materialized Views - expensive aggregations run often
Data Pipeline Automation in Snowflake

Search Optimization

  • Helps finding queries that look for a small number of rows in a large table
  • Builds a persistent access path; some micro-partitions can be skipped
  • Storage + compute cost for maintaining the access path
ALTER TABLE logistics.shipments
  ADD SEARCH OPTIMIZATION;
1 docs.snowflake.com/en/user-guide/search-optimization-service
Data Pipeline Automation in Snowflake

Query Acceleration Service

  • Handles large unpredictable scans
  • Supports SELECT, INSERT, CREATE TABLE AS SELECT, COPY INTO
  • QUERY_ACCELERATION_MAX_SCALE_FACTOR: default 8; 0 means unlimited, not disabled
ALTER WAREHOUSE harbr_wh SET
  ENABLE_QUERY_ACCELERATION = TRUE
  QUERY_ACCELERATION_MAX_SCALE_FACTOR 
  = 8;
1 docs.snowflake.com/en/user-guide/query-acceleration-service
Data Pipeline Automation in Snowflake

Automatic Clustering

  • For tables where the same filter columns appear in most queries
  • Snowflake reorders micro-partitions so range filters can skip more data
  • Background maintenance - no warehouse needed
  • Trade-off: ongoing compute cost as new data arrives
ALTER TABLE logistics.delivery_events
  CLUSTER BY (region, dispatch_date);
1 docs.snowflake.com/en/user-guide/tables-auto-reclustering
Data Pipeline Automation in Snowflake

Materialized Views

  • Needed when an expensive query runs over and over again
  • Pre-computes and stores a query's result; reads hit the stored result
  • Snowflake maintains it automatically in the background
  • SQL restrictions apply: no HAVING, limited joins, no window functions
CREATE MATERIALIZED VIEW 
logistics.emea_summary_mv AS
  SELECT region, carrier_id,
         COUNT(*) AS shipments,
         AVG(delivery_days) AS avg_days
  FROM logistics.shipments
  WHERE region = 'EMEA'
  GROUP BY region, carrier_id;
1 docs.snowflake.com/en/user-guide/views-materialized
Data Pipeline Automation in Snowflake

Choosing the Right Method

Signal in the Query Profile Method to Consider
TableScan reads almost all partitions; equality lookup on a high-cardinality column Search Optimization
Heavy aggregation, large scans, arrives unpredictably Query Acceleration Service
Same filter columns appear in most queries; pruning is weak Automatic Clustering
Same expensive aggregation runs frequently on slow-changing data Materialized View
  • Methods complement each other
Data Pipeline Automation in Snowflake

Let's practice!

Data Pipeline Automation in Snowflake

Preparing Video For Download...