Data Pipeline Automation in Snowflake
Emily Melhuish
Technical Curriculum Developer, Snowflake
Exporting data to cloud storage
COPY INTO writes results directly to a stage:
COPY INTO @harbr_partner_export/daily_summary/
FROM (SELECT * FROM logistics.shipment_summary
      WHERE export_date = CURRENT_DATE() - 1);

COPY INTO @harbr_partner_export/shipment_summary/
FROM (
    SELECT shipment_id,
           origin,
           destination,
           delivery_status,
           delivery_time_hours
    FROM logistics.shipments
    WHERE delivery_date = CURRENT_DATE()
)
FILE_FORMAT = (TYPE = 'CSV')
HEADER = TRUE
OVERWRITE = TRUE;
| Format | Best For |
|---|---|
| CSV | Universal - almost every system can read it; ideal for partner exports |
| JSON | Preserves nested structures; useful for semi-structured output consumers |
| Parquet | Large analytical datasets; columnar compression = smaller files, faster reads |
Key unloading options
HEADER = TRUE             -- include column names in the first row
OVERWRITE = TRUE          -- replace any existing files at that path
MAX_FILE_SIZE = 104857600 -- upper size limit per output file, in bytes (100 MB); larger exports are split across multiple files
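
As a rough illustration of how these options combine in an automated pipeline, the sketch below assembles a COPY INTO statement as a string. The helper name `build_unload_sql` and its parameters are hypothetical, not part of any Snowflake library. It places HEADER, OVERWRITE, and MAX_FILE_SIZE after the FILE_FORMAT clause and converts megabytes to the byte value that MAX_FILE_SIZE expects. Format-specific constraints are not checked (for example, JSON unloads generally expect the query to return a single VARIANT column).

```python
def build_unload_sql(stage_path, query, file_type="CSV",
                     header=True, overwrite=True, max_file_mb=100):
    """Render a COPY INTO <stage> statement as a string (hypothetical helper)."""
    file_type = file_type.upper()
    if file_type not in {"CSV", "JSON", "PARQUET"}:
        raise ValueError(f"unsupported unload format: {file_type}")

    # MAX_FILE_SIZE is expressed in bytes, so convert from megabytes.
    max_file_size = max_file_mb * 1024 * 1024

    lines = [
        f"COPY INTO {stage_path}",
        f"FROM ({query})",
        f"FILE_FORMAT = (TYPE = '{file_type}')",
        f"HEADER = {str(header).upper()}",
        f"OVERWRITE = {str(overwrite).upper()}",
        f"MAX_FILE_SIZE = {max_file_size}",
    ]
    return "\n".join(lines) + ";"


sql = build_unload_sql(
    "@harbr_partner_export/shipment_summary/",
    "SELECT shipment_id, delivery_status FROM logistics.shipments "
    "WHERE delivery_date = CURRENT_DATE()",
)
print(sql)
```

With the default `max_file_mb=100`, the rendered statement carries the same 104857600-byte limit shown above.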

Kafka Connector
# connector.properties (Kafka settings)
snowflake.topic2table.map=events:delivery_events
snowflake.ingestion.method=SNOWPIPE_STREAMING
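
The `snowflake.topic2table.map` value is a comma-separated list of `topic:table` pairs. The sketch below shows how such a value reads, using a hypothetical helper `parse_topic2table_map`. It only illustrates the format and is not the connector's own parsing logic.

```python
def parse_topic2table_map(value):
    """Parse 'topic1:table1,topic2:table2' into a dict (illustrative only)."""
    mapping = {}
    for pair in value.split(","):
        pair = pair.strip()
        if not pair:
            continue  # tolerate trailing commas and blank entries
        topic, sep, table = pair.partition(":")
        topic, table = topic.strip(), table.strip()
        if not sep or not topic or not table:
            raise ValueError(f"malformed topic:table pair: {pair!r}")
        mapping[topic] = table
    return mapping


# The setting from connector.properties routes the 'events' topic
# to the 'delivery_events' table.
mapping = parse_topic2table_map("events:delivery_events")
print(mapping)
```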
Spark Connector
// Read from Snowflake into a Spark DataFrame
val df = spark.read.format("snowflake")
  .options(sfOptions)
  .option("dbtable", "shipments")
  .load()
Universal connectivity
| Integration | Type | Use at Harbr |
|---|---|---|
| JDBC / ODBC | Universal drivers | BI tools (Tableau, Power BI, Looker) - query directly |
| Python Connector | Native Python driver | Data pipelines, scheduled ETL, data science workflows |
| dbt | SQL transformation | Runs models directly in Snowflake compute |
| Fivetran / Airbyte | Managed ingestion | SaaS source systems → Snowflake, no custom code |
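
For the Python Connector row, a scheduled export might look roughly like the sketch below. The function `run_export` is a hypothetical name. It relies only on the standard DB-API cursor methods, so it accepts any compatible connection object. The connection parameters in the trailing comment are placeholders.

```python
EXPORT_SQL = """
COPY INTO @harbr_partner_export/daily_summary/
FROM (SELECT * FROM logistics.shipment_summary
      WHERE export_date = CURRENT_DATE() - 1)
"""


def run_export(conn, sql=EXPORT_SQL):
    """Run the unload on an open DB-API connection and return the result rows."""
    cur = conn.cursor()
    try:
        cur.execute(sql)
        return cur.fetchall()
    finally:
        cur.close()  # always release the cursor, even if the statement fails


# With the Snowflake Python Connector installed, a connection could be opened as:
#   import snowflake.connector
#   conn = snowflake.connector.connect(account="...", user="...", password="...")
#   rows = run_export(conn)
```

Keeping the SQL in one constant and passing the connection in makes the function easy to call from a scheduler and to exercise with a stub connection.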