Flattening Semi-Structured Data

Data Types and Functions in Snowflake

Jake Roach

Field Data Engineer

Structured data

     school_id  |    school_name    |  street_number  |  street_name  |  suffix   |      city      |  zip_code
    ----------- | ----------------- | --------------- | ------------- | --------- | -------------- | ----------
      s_19219   |  West Aurora HS   |       879       |    Main       |    St.    |  West Aurora   |   25041
      s_77465   |  Springtown HS    |      1645       |    Cherry     |    Rd.    |  Springtown    |   14556
     school_id  |                address_info        
    ----------- | ------------------------------------------
      s_19219   |    {
                |        "school_name": "West Aurora HS",
                |        "address": {
                |            "street_number": 879,
                |            "street_name": "Main",
                |            "suffix":  "St."
                |            "city": "West Aurora",
                |            "zip_code": 25041
                |        }
                |    }
Data Types and Functions in Snowflake

Semi-structured data

Data stored with braces in key-values pairs takes the data type VARIANT

{
    "school_name": "West Aurora HS",
    "address": {  -- Nested object
        "street_number": 879,
        "street_name": "Main",
        "suffix": "St.",
        "city": "West Aurora",
        "zip_code": 25041
    }
}

$$

$$

$$

  • Like a Python dictionary or JSON object
  • Allows data to be stored in "raw" format
  • Nest objects, like address
  • Retrieve data in two different ways
Data Types and Functions in Snowflake

Dot-notation

                  my_column  
 -------------------------------------------
        {
            "my_first_key": 2025,
            "my_second_key": {
                "a": "alpha",
                "b": "bravo"
            }
        }
SELECT

my_column:my_first_key -- Top-level
my_column:my_second_key.a -- Nested my_column:my_second_key.b -- Nested
...

Makes it easy to retrieve top-level and nested values from VARIANT data

$$

  • Colon separates <column-name>:<top-level-key>
  • Add a . followed by the nested field, <column-name>:<top-level-key>.<nested-key>
  • Retrieve deeply-nested values
Data Types and Functions in Snowflake

Dot-notation

SELECT
    address_info:school_name,                              -- Top-level, dot-notation

    address_info:address.street_number AS street_number,   -- Nested, dot-notation
    address_info:address.street_name AS street_name,
    address_info:address.suffix AS suffix

FROM SCHOOLS.school_info;
              school_name    |  street_number  |  street_name  |  suffix  
           ----------------- | --------------- | ------------- | ---------
            West Aurora HS   |       879       |    Main       |    St.   
            Springtown HS    |      1645       |    Cherry     |    Rd.
Data Types and Functions in Snowflake

Bracket-notation

Provides an additional technique for retrieving top-level and nested values

$$

  • <column-name>['<top-level-key']['...']
  • Many nested layers
  • Like retrieving data from a Python dictionary
  • Make sure to use single quotes (')!
                    my_column  
 ------------------------------------------------
          {
              "my_first_key": 2025,
               my_second_key": {
                  "a": "alpha",
                  "b": "bravo"
              }
          }
SELECT

my_column['my_first_key'], -- Top-level
my_column['my_second_key']['a'] -- Nested my_column['my_second_key']['b'] -- Nested
...
Data Types and Functions in Snowflake

Bracket-notation

SELECT

address_info['school_name'], -- Top-level, bracket-notation
address_info['address']['city'] AS city, -- Nested, bracket-notation address_info['address']['zip_code'] AS zip_code
FROM SCHOOLS.school_info;
                     school_name    |      city      |  zip_code
                  ----------------- | -------------- | ----------
                   West Aurora HS   |  West Aurora   |   25041
                   Springtown HS    |  Springtown    |   14556
Data Types and Functions in Snowflake

Transforming semi-structured data

SELECT
    school_id,

    address_info:school_name AS school_name,               -- Top-level, dot-notation

    address_info:address.street_number AS street_number,   -- Nested, dot-notation
    address_info:address.street_name AS street_name,
    address_info:address.suffix AS suffix,

    address_info['address']['city'] AS city,               -- Nested, bracket-notation
    address_info['address']['zip_code'] AS zip_code

FROM SCHOOLS.school_info;
Data Types and Functions in Snowflake

Transforming semi-structured data

     school_id  |                address_info        
    ----------- | ------------------------------------------
      s_19219   |    {
                |        "school_name": "West Aurora HS",
                |        "address": {
                |            "street_number": 879,
                |            "street_name": "Main",
                |            "suffix":  "St."
                |            "city": "West Aurora",
                |            "zip_code": 25041
                |        }
                |    }
     school_id  |    school_name    |  street_number  |  street_name  |  suffix   |      city      |  zip_code
    ----------- | ----------------- | --------------- | ------------- | --------- | -------------- | ----------
      s_19219   |  West Aurora HS   |       879       |    Main       |    St.    |  West Aurora   |   25041
      s_77465   |  Springtown HS    |      1645       |    Cherry     |    Rd.    |  Springtown    |   14556
Data Types and Functions in Snowflake

Let's practice!

Data Types and Functions in Snowflake

Preparing Video For Download...