Semi-gestructureerde data flattenen

Gegevenstypen en functies in Snowflake

Jake Roach

Field Data Engineer

Gestructureerde data

     school_id  |    school_name    |  street_number  |  street_name  |  suffix   |      city      |  zip_code
    ----------- | ----------------- | --------------- | ------------- | --------- | -------------- | ----------
      s_19219   |  West Aurora HS   |       879       |    Main       |    St.    |  West Aurora   |   25041
      s_77465   |  Springtown HS    |      1645       |    Cherry     |    Rd.    |  Springtown    |   14556
     school_id  |                address_info        
    ----------- | ------------------------------------------
      s_19219   |    {
                |        "school_name": "West Aurora HS",
                |        "address": {
                |            "street_number": 879,
                |            "street_name": "Main",
                |            "suffix":  "St."
                |            "city": "West Aurora",
                |            "zip_code": 25041
                |        }
                |    }
Gegevenstypen en functies in Snowflake

Semi-gestructureerde data

Data met accolades in key-valueparen heeft het datatype VARIANT

{
    "school_name": "West Aurora HS",
    "address": {  -- Genest object
        "street_number": 879,
        "street_name": "Main",
        "suffix": "St.",
        "city": "West Aurora",
        "zip_code": 25041
    }
}

$$

$$

$$

  • Net als een Python-dictionary of JSON-object
  • Laat data in “ruwe” vorm opslaan
  • Nest objecten, zoals address
  • Haal data op via twee manieren
Gegevenstypen en functies in Snowflake

Dot-notatie

                  my_column  
 -------------------------------------------
        {
            "my_first_key": 2025,
            "my_second_key": {
                "a": "alpha",
                "b": "bravo"
            }
        }
SELECT

my_column:my_first_key -- Top-level
my_column:my_second_key.a -- Genest my_column:my_second_key.b -- Genest
...

Maakt het makkelijk om top-level en geneste waarden uit VARIANT te halen

$$

  • Dubbelepunt scheidt <column-name>:<top-level-key>
  • Voeg een . toe gevolgd door het geneste veld, <column-name>:<top-level-key>.<nested-key>
  • Haal diep geneste waarden op
Gegevenstypen en functies in Snowflake

Dot-notatie

SELECT
    address_info:school_name,                              -- Top-level, dot-notatie

    address_info:address.street_number AS street_number,   -- Genest, dot-notatie
    address_info:address.street_name AS street_name,
    address_info:address.suffix AS suffix

FROM SCHOOLS.school_info;
              school_name    |  street_number  |  street_name  |  suffix  
           ----------------- | --------------- | ------------- | ---------
            West Aurora HS   |       879       |    Main       |    St.   
            Springtown HS    |      1645       |    Cherry     |    Rd.
Gegevenstypen en functies in Snowflake

Bracket-notatie

Biedt een extra manier om top-level en geneste waarden op te halen

$$

  • <column-name>['<top-level-key']['...']
  • Veel geneste lagen
  • Lijkt op ophalen uit een Python-dict
  • Gebruik steeds enkele aanhalingstekens (')!
                    my_column  
 ------------------------------------------------
          {
              "my_first_key": 2025,
               my_second_key": {
                  "a": "alpha",
                  "b": "bravo"
              }
          }
SELECT

my_column['my_first_key'], -- Top-level
my_column['my_second_key']['a'] -- Genest my_column['my_second_key']['b'] -- Genest
...
Gegevenstypen en functies in Snowflake

Bracket-notatie

SELECT

address_info['school_name'], -- Top-level, bracket-notatie
address_info['address']['city'] AS city, -- Genest, bracket-notatie address_info['address']['zip_code'] AS zip_code
FROM SCHOOLS.school_info;
                     school_name    |      city      |  zip_code
                  ----------------- | -------------- | ----------
                   West Aurora HS   |  West Aurora   |   25041
                   Springtown HS    |  Springtown    |   14556
Gegevenstypen en functies in Snowflake

Semi-gestructureerde data transformeren

SELECT
    school_id,

    address_info:school_name AS school_name,               -- Top-level, dot-notatie

    address_info:address.street_number AS street_number,   -- Genest, dot-notatie
    address_info:address.street_name AS street_name,
    address_info:address.suffix AS suffix,

    address_info['address']['city'] AS city,               -- Genest, bracket-notatie
    address_info['address']['zip_code'] AS zip_code

FROM SCHOOLS.school_info;
Gegevenstypen en functies in Snowflake

Semi-gestructureerde data transformeren

     school_id  |                address_info        
    ----------- | ------------------------------------------
      s_19219   |    {
                |        "school_name": "West Aurora HS",
                |        "address": {
                |            "street_number": 879,
                |            "street_name": "Main",
                |            "suffix":  "St."
                |            "city": "West Aurora",
                |            "zip_code": 25041
                |        }
                |    }
     school_id  |    school_name    |  street_number  |  street_name  |  suffix   |      city      |  zip_code
    ----------- | ----------------- | --------------- | ------------- | --------- | -------------- | ----------
      s_19219   |  West Aurora HS   |       879       |    Main       |    St.    |  West Aurora   |   25041
      s_77465   |  Springtown HS    |      1645       |    Cherry     |    Rd.    |  Springtown    |   14556
Gegevenstypen en functies in Snowflake

Laten we oefenen!

Gegevenstypen en functies in Snowflake

Preparing Video For Download...