External and Iceberg Tables

Data Pipeline Automation in Snowflake

Emily Melhuish

Technical Curriculum Developer, Snowflake

What is an External Table?

Use case:

Harbr receives files from dozens of partners daily
Loading them into Snowflake doesn't make sense

external table.png

External Tables

Data stays in S3, GCS, or Azure Blob — never moves into Snowflake
Snowflake stores metadata: file paths, schemas
Reads files at query time

Creating an External Table

CREATE EXTERNAL TABLE logistics.supplier_inventory
  WITH LOCATION = @harbr_stage/inventory/
  FILE_FORMAT = (TYPE = 'PARQUET')
  AUTO_REFRESH = TRUE;

When to Use External Tables

Data is owned by a partner — query without taking on data management
Files are large, queried infrequently — avoid loading data accessed once a month
Data cannot move for compliance — files must stay in their original location
Exploring before committing — read raw files before deciding what to load

-- Query partner inventory 
SELECT supplier_id,
       COUNT(*)       AS file_count,
       SUM(quantity)  AS total_units
FROM logistics.supplier_inventory
WHERE delivery_date >= '2024-01-01'
GROUP BY supplier_id;

External Tables vs Loading Data

External tables

Zero Snowflake storage cost
Always reflects current cloud storage state
Best for infrequently queried, archived data

-- Query directly, no load needed
SELECT * FROM 
logistics.supplier_inventory
WHERE supplier_id = 'SUP-042';

Loading with COPY INTO

Full Snowflake query performance
Best for frequently queried, live data

-- Load once, query fast
COPY INTO logistics.delivery_events
FROM @harbr_stage/events/;

Apache Iceberg Tables

How do we ensure we read data correctly from S3, Azure Blob or GCS?
- Apache
Apache = Open table format: data not locked to any engine
- Parquet files + metadata layer (time travel, schema history, partitions)
- database-grade reliability

Screenshot 2026-05-11 at 12.14.31 pm.png

¹ * Snowflake learning material

Snowflake-managed vs Externally-managed

Snowflake-managed

Snowflake owns and writes the Iceberg metadata
Full read-write access from SQL

CREATE ICEBERG TABLE 
logistics.shipments
  CATALOG = 'SNOWFLAKE'
  EXTERNAL_VOLUME = 'harbr_s3_vol'
  BASE_LOCATION = 'iceberg/shipments/';

Externally-managed

Outside catalog owns metadata - AWS Glue, Apache Polaris

CREATE ICEBERG TABLE 
logistics.partner_data
  CATALOG = 'glue_catalog'
  EXTERNAL_VOLUME = 'harbr_s3_vol'
  CATALOG_TABLE_NAME = 
  'partner.deliveries';

Let's Practice!

Data Pipeline Automation in Snowflake