Working with Unstructured Data

Snowflake Architecture

Emily Melhuish

Technical Curriculum Developer, Snowflake

Beyond Tables

Cloud including files such as pdfs, images and audio

Snowflake Architecture

Stages: The Landing Zone for Files

Representing internal + external stage

  • Internal: Managed by Snowflake
  • External: Files in the Cloud

  • unstructured data lives in a stage

Snowflake Architecture

Directory Tables

directory_table_top.png

Snowflake Architecture

Directory Tables Syntax

ENABLE DIRECTORY

CREATE STAGE snowy_peak_files_stage
  DIRECTORY = (ENABLE = TRUE);

Refresh directory metadata (ALTER STAGE)

ALTER STAGE snowy_peak_files_stage REFRESH;
Snowflake Architecture

Directory Tables

directory_table_final.png

Snowflake Architecture

Querying a Directory Table

SQL

SELECT 
file_name
, size
, last_modified
FROM DIRECTORY(@snowy_peak_files_stage);
  • Audit files in the stage
  • Identify stale files
  • Feed downstream processes
Snowflake Architecture

Pre-signed URLs

  • External partners and applications cannot access a stage
  • Snowflake can expose URL or credentials to grant partners access: Stage URLs, scoped file URLs, pre-signed URLs and related helpers

 

SQL

SELECT GET_PRESIGNED_URL(@snowy_peak_files_stage, 'avalanche_forecast.pdf', 3600);
  • You pass stage, file path and lifetime in seconds

  • Useful for partners who do not have Snowflake logins

Snowflake Architecture

Let's practice!

Snowflake Architecture

Preparing Video For Download...