Stages, File Formats, and COPY INTO

Data Pipeline Automation in Snowflake

Emily Melhuish

Technical Curriculum Developer, Snowflake

Snowflake powers many workloads

Screenshot 2026-05-11 at 11.52.34 am.png

¹ * Snowflake Learning Material

Screenshot 2026-05-11 at 10.48.51 am.png

¹ * Snowflake Learning Material

Fake logo for Harbr

About Harbr

Global logistics and supply chain platform
Uses Snowflake to track shipments, warehouse inventory, and delivery performance
Ingests data from supplier systems across multiple regions

Data loading diagram

Stage = temporary storage location

Two stage types

Internal Stages

External Stage

Directory Table

CREATE STAGE harbr_stage
  DIRECTORY = (ENABLE = TRUE)

Internal Stage Encryption

ENCRYPTION = (TYPE = SNOWFLAKE_FULL)
ENCRYPTION = (TYPE = SNOWFLAKE_SSE)

External Stage Encryption

ENCRYPTION = 
([ TYPE = 'AWS_CSE' ] MASTER_KEY = '') 

ENCRYPTION = 
([ TYPE = 'AWS_SSE_S3' ]) 

...

Structured formats: CSV with specific delimiters, header rows, quoting rules
Semi-structured formats: JSON, Parquet, Avro, ORC and XML.
- Load directly into a VARIANT column
- Define parsing rules once, reuse across all load operations
Update the format object once

SQL

CREATE FILE FORMAT harbr_csv_format
  TYPE = 'CSV'
  FIELD_DELIMITER = ','
  SKIP_HEADER = 1;

Load from a named stage using a named file format

COPY INTO logistics.shipments
FROM @harbr_internal_stage/shipments/
FILE_FORMAT = (FORMAT_NAME = 'harbr_csv_format')

ON_ERROR Options	Behavior
`ABORT_STATEMENT`	Fail entire load on first error (default)
`CONTINUE`	Skip bad rows, load everything else
`SKIP_FILE`	Skip entire file if it contains any errors
`SKIP_FILE_<num>`	Skip file only if errors exceed a specified count
`SKIP_FILE_<num>%`	Skip file only if errors exceed a specified % threshold

Validate without loading = dry run

COPY INTO logistics.shipments
FROM @harbr_internal_stage/shipments/
FILE_FORMAT = (FORMAT_NAME = 'harbr_csv_format')
VALIDATION_MODE = RETURN_ERRORS;

Inspect load history

SELECT * FROM TABLE(INFORMATION_SCHEMA.COPY_HISTORY(
  TABLE_NAME => 'shipments',
  START_TIME => DATEADD('hour', -24,
    CURRENT_TIMESTAMP())));

Data Pipeline Automation in Snowflake