Data Unloading and Connectivity

Data Pipeline Automation in Snowflake

Emily Melhuish

Technical Curriculum Developer, Snowflake

The Unloading Use Case

Exporting data to cloud storage

Not all partners have Snowflake access
COPY INTO writes results directly to a stage

No separate export pipeline needed

COPY INTO @harbr_partner_export/daily_summary/
FROM (
  SELECT *
  FROM logistics.shipment_summary
  WHERE export_date = CURRENT_DATE() - 1
);

[Diagram: Snowflake unloading to partner storage]

COPY INTO

COPY INTO @harbr_partner_export/shipment_summary/
FROM (
  SELECT shipment_id,
         origin,
         destination,
         delivery_status,
         delivery_time_hours
  FROM logistics.shipments
  WHERE delivery_date = CURRENT_DATE()
)
FILE_FORMAT = (TYPE = 'CSV')
HEADER = TRUE      -- copy option, set outside FILE_FORMAT
OVERWRITE = TRUE;
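
A minimal sketch of running this unload from a scheduled Python job. It assumes `conn` is an already-open DB-API style connection, for example one returned by the Snowflake Python Connector's `snowflake.connector.connect(...)`; the names `UNLOAD_SQL` and `run_unload` are hypothetical and only serve the illustration. HEADER is written as a copy option, outside FILE_FORMAT.

```python
# Hypothetical helper: executes the COPY INTO unload on an open connection.
UNLOAD_SQL = """
COPY INTO @harbr_partner_export/shipment_summary/
FROM (
  SELECT shipment_id, origin, destination, delivery_status, delivery_time_hours
  FROM logistics.shipments
  WHERE delivery_date = CURRENT_DATE()
)
FILE_FORMAT = (TYPE = 'CSV')
HEADER = TRUE
OVERWRITE = TRUE
"""


def run_unload(conn, sql=UNLOAD_SQL):
    """Run the unload statement and return whatever rows the cursor reports."""
    cur = conn.cursor()
    try:
        cur.execute(sql)
        return cur.fetchall()
    finally:
        # Always release the cursor, even if the statement fails.
        cur.close()
```

Keeping the statement in one constant means the same SQL can be reviewed in a worksheet and then reused unchanged by the job.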

Export File Format Options

Format	Best For
CSV	Universal - almost every system can read it; ideal for partner exports
JSON	Preserves nested structures; useful for consumers that expect semi-structured output
Parquet	Large analytical datasets; columnar compression = smaller files, faster reads
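
To make the CSV versus JSON trade-off concrete, here is a small standard-library sketch. The record and the helper names (`to_csv`, `to_json_lines`) are made up for illustration and are not course data. It shows that a nested field survives as structure in JSON, while in CSV it has to be flattened into a single string cell.

```python
import csv
import io
import json

# Hypothetical shipment record with one nested field.
record = {
    "shipment_id": 101,
    "origin": "Leeds",
    "stops": [{"city": "York", "hours": 2}, {"city": "Hull", "hours": 3}],
}


def to_csv(rows):
    """Serialize rows to CSV text; nested values are flattened to JSON strings."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    for row in rows:
        flat = {
            key: json.dumps(value) if isinstance(value, (list, dict)) else value
            for key, value in row.items()
        }
        writer.writerow(flat)
    return buf.getvalue()


def to_json_lines(rows):
    """Serialize rows as newline-delimited JSON, keeping nested structure."""
    return "\n".join(json.dumps(row) for row in rows) + "\n"


csv_text = to_csv([record])
json_text = to_json_lines([record])
```

A CSV consumer gets a flat table and must parse the `stops` cell itself; a JSON consumer can read `stops` directly as a list.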

Key unloading options

HEADER = TRUE              -- include column names in the first row
OVERWRITE = TRUE           -- replace any existing files at that path
MAX_FILE_SIZE = 104857600  -- split large exports into files of up to 100 MB (value in bytes)
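
The MAX_FILE_SIZE value is easier to read as arithmetic. The sketch below derives 104857600 from 100 x 1024 x 1024 and estimates the minimum number of output files for a given export size. The function name `min_files` is hypothetical, and the result is only a lower bound: Snowflake unloads in parallel, so the actual file count can be higher.

```python
import math

MIB = 1024 * 1024          # bytes in one mebibyte
MAX_FILE_SIZE = 100 * MIB  # the value used on the slide, in bytes


def min_files(total_bytes, max_file_size=MAX_FILE_SIZE):
    """Lower bound on the number of files needed for an export of total_bytes."""
    return math.ceil(total_bytes / max_file_size)


one_gib = 1024 * MIB
files_for_one_gib = min_files(one_gib)  # 1 GiB split into 100 MiB pieces
```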

The Connectivity Landscape

[Diagram: the connectivity landscape of connectors]

Kafka and Spark connectors

Kafka Connector

Topics stream directly into Snowflake tables
No files, no stages — uses Snowpipe Streaming
Latency measured in seconds

# connector.properties (Kafka settings)
snowflake.topic2table.map=events:delivery_events
snowflake.ingestion.method=SNOWPIPE_STREAMING
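
The `snowflake.topic2table.map` value is a comma-separated list of `topic:table` pairs. The sketch below is an illustrative parser for that format, useful for checking a config before deploying it. It is not the connector's own code, and `parse_topic2table` is a hypothetical name.

```python
def parse_topic2table(value):
    """Parse 'topic1:table1,topic2:table2' into a {topic: table} dict."""
    mapping = {}
    for pair in value.split(","):
        pair = pair.strip()
        if not pair:
            continue  # tolerate trailing commas and empty segments
        topic, sep, table = pair.partition(":")
        topic, table = topic.strip(), table.strip()
        if not sep or not topic or not table:
            raise ValueError(f"expected 'topic:table', got {pair!r}")
        mapping[topic] = table
    return mapping


# The mapping from the slide: topic "events" lands in table "delivery_events".
slide_mapping = parse_topic2table("events:delivery_events")
```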

Spark Connector

Integrates with Spark's DataFrame API
Spark jobs read from and write to Snowflake
Handles large-scale transformation workloads

// Read from Snowflake into a Spark DataFrame
// sfOptions: Map[String, String] of connection settings, defined elsewhere
val df = spark.read.format("snowflake")
  .options(sfOptions)
  .option("dbtable", "shipments")
  .load()

JDBC/ODBC and Partner Integrations

Universal connectivity

Integration	Type	Use at Harbr
JDBC / ODBC	Universal drivers	BI tools (Tableau, Power BI, Looker) - query directly
Python Connector	Native Python driver	Data pipelines, scheduled ETL, data science workflows
dbt	SQL transformation	Runs models directly in Snowflake compute
Fivetran / Airbyte	Managed ingestion	SaaS source systems → Snowflake, no custom code

Let's practice!
