External and Iceberg Tables

Data Pipeline Automation in Snowflake

Emily Melhuish

Technical Curriculum Developer, Snowflake

What is an External Table?

Use case:

  • Harbr receives files from dozens of partners daily
  • Loading them into Snowflake doesn't make sense

external table.png

External Tables

  • Data stays in S3, GCS, or Azure Blob — never moves into Snowflake
  • Snowflake stores metadata: file paths, schemas
  • Reads files at query time
Data Pipeline Automation in Snowflake

Creating an External Table

CREATE EXTERNAL TABLE logistics.supplier_inventory
  WITH LOCATION = @harbr_stage/inventory/
  FILE_FORMAT = (TYPE = 'PARQUET')
  AUTO_REFRESH = TRUE;
Data Pipeline Automation in Snowflake

When to Use External Tables

  • Data is owned by a partner — query without taking on data management
  • Files are large, queried infrequently — avoid loading data accessed once a month
  • Data cannot move for compliance — files must stay in their original location
  • Exploring before committing — read raw files before deciding what to load
-- Query partner inventory 
SELECT supplier_id,
       COUNT(*)       AS file_count,
       SUM(quantity)  AS total_units
FROM logistics.supplier_inventory
WHERE delivery_date >= '2024-01-01'
GROUP BY supplier_id;
Data Pipeline Automation in Snowflake

External Tables vs Loading Data

External tables

  • Zero Snowflake storage cost
  • Always reflects current cloud storage state
  • Best for infrequently queried, archived data
-- Query directly, no load needed
SELECT * FROM 
logistics.supplier_inventory
WHERE supplier_id = 'SUP-042';

Loading with COPY INTO

  • Full Snowflake query performance
  • Best for frequently queried, live data
-- Load once, query fast
COPY INTO logistics.delivery_events
FROM @harbr_stage/events/;
Data Pipeline Automation in Snowflake

Apache Iceberg Tables

  • How do we ensure we read data correctly from S3, Azure Blob or GCS?

    • Apache
  • Apache = Open table format: data not locked to any engine

    • Parquet files + metadata layer (time travel, schema history, partitions)
    • database-grade reliability

Screenshot 2026-05-11 at 12.14.31 pm.png

1 * Snowflake learning material
Data Pipeline Automation in Snowflake

Snowflake-managed vs Externally-managed

Snowflake-managed

  • Snowflake owns and writes the Iceberg metadata
  • Full read-write access from SQL
CREATE ICEBERG TABLE 
logistics.shipments
  CATALOG = 'SNOWFLAKE'
  EXTERNAL_VOLUME = 'harbr_s3_vol'
  BASE_LOCATION = 'iceberg/shipments/';

Externally-managed

  • Outside catalog owns metadata - AWS Glue, Apache Polaris
CREATE ICEBERG TABLE 
logistics.partner_data
  CATALOG = 'glue_catalog'
  EXTERNAL_VOLUME = 'harbr_s3_vol'
  CATALOG_TABLE_NAME = 
  'partner.deliveries';
Data Pipeline Automation in Snowflake

Let's Practice!

Data Pipeline Automation in Snowflake

Preparing Video For Download...