Data Unloading and Connectivity

Data Pipeline Automation in Snowflake

Emily Melhuish

Technical Curriculum Developer, Snowflake

The Unloading Use Case

Exporting data to cloud storage

  • Not all partners have Snowflake access
  • COPY INTO writes results directly to a stage
  • No separate export pipeline needed
    COPY INTO @harbr_partner_export/daily_summary/
    FROM (
      SELECT *
      FROM logistics.shipment_summary
      WHERE export_date = CURRENT_DATE() - 1
    );


Diagram: Snowflake unloading to partner storage

COPY INTO

COPY INTO @harbr_partner_export/shipment_summary/
FROM (
  SELECT shipment_id,
         origin,
         destination,
         delivery_status,
         delivery_time_hours
  FROM logistics.shipments
  WHERE delivery_date = CURRENT_DATE()
)
FILE_FORMAT = (TYPE = 'CSV')
HEADER = TRUE
OVERWRITE = TRUE;
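
When the same export runs for several tables or partners, the statement above is often generated rather than typed by hand. The following is a minimal sketch in Python of one way to assemble it as text. The function name `build_unload_sql` and its parameters are hypothetical, nothing is executed against Snowflake, and the helper does not validate the query it is given.

```python
# Hypothetical helper: builds the text of a COPY INTO <location> statement.
# It only formats a string; running it requires a Snowflake session elsewhere.

ALLOWED_TYPES = {"CSV", "JSON", "PARQUET"}  # formats named on the slides


def build_unload_sql(stage_path: str, query: str, *, file_type: str = "CSV",
                     header: bool = True, overwrite: bool = True) -> str:
    """Return an unload statement for the given stage path and SELECT query."""
    file_type = file_type.upper()
    if file_type not in ALLOWED_TYPES:
        raise ValueError(f"unsupported file type: {file_type}")
    # Assumption: HEADER and OVERWRITE are written as top-level options,
    # after the FILE_FORMAT clause.
    return (
        f"COPY INTO {stage_path}\n"
        f"FROM (\n{query}\n)\n"
        f"FILE_FORMAT = (TYPE = '{file_type}')\n"
        f"HEADER = {str(header).upper()}\n"
        f"OVERWRITE = {str(overwrite).upper()};"
    )


if __name__ == "__main__":
    print(build_unload_sql(
        "@harbr_partner_export/shipment_summary/",
        "  SELECT shipment_id, origin, destination\n"
        "  FROM logistics.shipments\n"
        "  WHERE delivery_date = CURRENT_DATE()",
    ))
```

Keeping the builder separate from execution makes the generated SQL easy to review or log before it runs.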

Export File Format Options

 

Format    Best For
CSV       Universal: almost every system can read it; ideal for partner exports
JSON      Preserves nested structures; useful when consumers expect semi-structured output
Parquet   Large analytical datasets; columnar compression gives smaller files and faster reads

Key unloading options

HEADER = TRUE              -- include column names in the first row
OVERWRITE = TRUE           -- overwrite existing files with matching names at that path
-- Split large exports across multiple files (value is in bytes)
MAX_FILE_SIZE = 104857600  -- upper limit of about 100 MB per output file
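
The MAX_FILE_SIZE value is given in bytes, so 100 MB is 100 × 1024 × 1024 = 104857600. A small sketch of that arithmetic in Python, with a rough estimate of how many files an export might produce. The function names are hypothetical, and the file count is only a lower bound under the assumption that files fill up to the limit; the actual number depends on compression and on how Snowflake parallelizes the unload.

```python
import math

BYTES_PER_MB = 1024 * 1024


def max_file_size_bytes(megabytes: int) -> int:
    """Convert a size in MB to the byte value MAX_FILE_SIZE expects."""
    return megabytes * BYTES_PER_MB


def estimated_file_count(total_bytes: int, max_file_size: int) -> int:
    """Rough lower bound on output files, assuming each file fills to the limit."""
    return max(1, math.ceil(total_bytes / max_file_size))


if __name__ == "__main__":
    limit = max_file_size_bytes(100)
    print(limit)                                       # 104857600
    print(estimated_file_count(1_000_000_000, limit))  # 10
```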

The Connectivity Landscape

Figure: the connectivity landscape of connectors

Kafka and Spark connectors

Kafka Connector

  • Topics stream directly into Snowflake tables
  • No files, no stages — uses Snowpipe Streaming
  • Latency measured in seconds
# connector.properties (Kafka settings)
snowflake.topic2table.map=events:delivery_events
snowflake.ingestion.method=SNOWPIPE_STREAMING

Spark Connector

  • Integrates with Spark's DataFrame API
  • Spark jobs read from and write to Snowflake
  • Handles large-scale transformation workloads
// Read from Snowflake into a Spark DataFrame
// sfOptions is a Map of connection settings (URL, user, database, ...) defined elsewhere
val df = spark.read.format("snowflake")
  .options(sfOptions)
  .option("dbtable", "shipments")
  .load()

JDBC/ODBC and Partner Integrations

Universal connectivity

Integration          Type                   Use at Harbr
JDBC / ODBC          Universal drivers      BI tools (Tableau, Power BI, Looker) - query directly
Python Connector     Native Python driver   Data pipelines, scheduled ETL, data science workflows
dbt                  SQL transformation     Runs models directly in Snowflake compute
Fivetran / Airbyte   Managed ingestion      SaaS source systems → Snowflake, no custom code
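
As an illustration of the Python Connector row, here is a hedged sketch of how a scheduled job might run the daily unload. The helper `run_unload` is hypothetical and accepts any DB-API style connection, so it can be exercised without a Snowflake account. In practice the connection would come from `snowflake.connector.connect(...)`, with credentials supplied by your own configuration.

```python
# Hypothetical sketch: run an unload statement through a DB-API style connection.
# With the Snowflake Python Connector the connection would be created roughly as:
#   import snowflake.connector
#   conn = snowflake.connector.connect(account="<account>", user="<user>",
#                                      password="<password>")
# The placeholder values above are assumptions, not real settings.

UNLOAD_SQL = """
COPY INTO @harbr_partner_export/daily_summary/
FROM (
  SELECT *
  FROM logistics.shipment_summary
  WHERE export_date = CURRENT_DATE() - 1
);
"""


def run_unload(conn, sql: str):
    """Execute one statement and return whatever result rows the driver reports."""
    cur = conn.cursor()
    try:
        cur.execute(sql)
        return cur.fetchall()
    finally:
        # Close the cursor even if execution fails.
        cur.close()
```

A scheduler would then call `run_unload(conn, UNLOAD_SQL)` once per day and log the returned rows, which for an unload are expected to summarize what was written.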

Let's practice!
