Handling semi-structured data

Introduction to Snowflake SQL

George Boorman

Senior Curriculum Manager, DataCamp

Structured versus semi-structured

Example of structured data

| cust_id | cust_name | cust_age | cust_email            |
|---------|-----------|----------|-----------------------|
| 1       | cust1     | 40       | cust1***@gmail.com    |
| 2       | cust2     | 35       | cust2***@gmail.com    |
| 3       | cust3     | 42       | cust3***@gmail.com    |

Example of semi-structured data

Customer semi-structured data in the form of JSON with customer 2 having two emails

Introduction to Snowflake SQL

Introducing JSON

  • JavaScript Object Notation
  • Common use cases: Web APIs and Config files
  • JSON data structure:
    • Key-Value Pairs, e.g., cust_id: 1

Customer JSON records example having customer id, name, age and email

Introduction to Snowflake SQL

JSON in Snowflake

  • Native JSON support
  • Flexible for evolving schemas

 

Comparisons:

  • Postgres: Uses JSONB
  • Snowflake: Uses VARIANT
Introduction to Snowflake SQL

How Snowflake stores JSON data

  • VARIANT supports OBJECT and ARRAY data types
    • OBJECT: { "key": "value"}
    • ARRAY: ["list", "of", "values"]
  • Creating a Snowflake Table to handle JSON data
    CREATE TABLE cust_info_json_data (
      customer_id INT, 
      customer_info VARIANT -- VARIANT data type
    );
    
Introduction to Snowflake SQL

Semi-structured data functions

  • PARSE_JSON

    • expr: JSON data in string format
    • Returns: VARIANT type, valid JSON object
Introduction to Snowflake SQL

PARSE_JSON

Example:

SELECT PARSE_JSON(
  -- Enclosed in strings      
  '{
  "cust_id": 1,
  "cust_name": "cust1",
  "cust_age": 40,
  "cust_email":"cust1***@gmail.com"
  }
  '-- Enclosed in strings
) AS customer_info_json

Result of PARSE_JSON

Introduction to Snowflake SQL

OBJECT_CONSTRUCT

  • OBJECT_CONSTRUCT

    • Syntax: OBJECT_CONSTRUCT( [<key1>, <value1> [, <keyN>, <valueN> ...]] )
    • Returns: JSON object
    SELECT OBJECT_CONSTRUCT(
      -- Comma separated values rather than : notation
          'cust_id', 1,
          'cust_name', 'cust1',
          'cust_age', 40,
          'cust_email', 'cust1***@gmail.com'
    )

Table with `customer_info` column with data stored in JSON format

Introduction to Snowflake SQL

Querying JSON data in Snowflake

Simple JSON

  • :
    SELECT
      customer_info:cust_age, -- Use colon to access cust_age from column
      customer_info:cust_name,
      customer_info:cust_email,
    FROM 
      cust_info_json_data;
    

Query result showing customer age, customer name and email as separate columns

Introduction to Snowflake SQL

Querying nested JSON Data in Snowflake

Example of nested JSON

A nested JSON data where address is an object having street, city and state key-value pairs

  • Colon: :
  • Dot: .
Introduction to Snowflake SQL

Querying nested JSON using colon/dot notations

Accessing values using colon notation

<column>:<level1_element>:<level2_element>:<level3_element>

SELECT 
    customer_info:address:street AS street_name 
FROM 
    cust_info_json_data

Query result showing street_name

Accessing values using dot notation

<column>:<level1_element>.<level2_element>.<level3_element>

SELECT
    customer_info:address.street AS street_name
FROM
    cust_info_json_data

Query result showing street_name

Introduction to Snowflake SQL

Let's practice!

Introduction to Snowflake SQL

Preparing Video For Download...