Semistructured data

Introduction to Redshift

Jason Myers

Principal Architect

Semistructured data

  • Stored in Redshift's SUPER type
  • Has dedicated SQL functions for different formats
  • JSON is a type of semistructured data
Introduction to Redshift

JSON Validation Functions

  • IS_VALID_JSON - makes sure an entire JSON object is valid
SELECT IS_VALID_JSON('{"one":1, "two":2}');
IS_VALID_JSON
=============
true
SELECT IS_VALID_JSON('{"one":1, "two":2');
IS_VALID_JSON
=============
false
  • IS_VALID_JSON_ARRAY - makes sure an JSON array is valid.
SELECT IS_VALID_JSON_ARRAY('{"one":1}')
IS_VALID_JSON_ARRAY
===================
false
SELECT IS_VALID_JSON_ARRAY('[1,2,3]')
IS_VALID_JSON_ARRAY
===================
true
Introduction to Redshift

Extracting from a JSON object

  • JSON_EXTRACT_PATH_TEXT returns the value at a json path
  • Takes a JSON string or field then a one or more keys to the path
SELECT JSON_EXTRACT_PATH_TEXT('{"one":1, "two":2}', 'one');
JSON_EXTRACT_PATH_TEXT
======================
1
Introduction to Redshift

Attempting to parse invalid JSON

-- Trying to extract a path from poorly formed JSON
SELECT JSON_EXTRACT_PATH_TEXT('{"one":1, "two":2', 'one');
JSON parsing error
DETAIL:  
  =========================================
  error:  JSON parsing error
  code:      8001
  context:   invalid json object 
  query:     [child_sequence:'one']
  location:  funcs_json.hpp:202
  process:   padbmaster [pid=1073807529]
  =========================================
Introduction to Redshift

Extracting from nested JSON paths

  • Supply multiple key arguments
SELECT JSON_EXTRACT_PATH_TEXT('
         {
           "one_object":{
              "nested_three": 3, 
              "nested_four":4
           }, 
           "two":2
         }', 
         'one_object', 'nested_three');
JSON_EXTRACT_PATH_TEXT
======================
3
Introduction to Redshift

Extracting from nonexistent nested JSON paths

  • Returns NULL
    SELECT JSON_EXTRACT_PATH_TEXT('
           {
             "one_object":{
                "nested_three": 3, 
                "nested_four":4
             }, 
             "two":2
           }', 
           'two', 'nested_five');
    
JSON_EXTRACT_PATH_TEXT
======================
NULL
Introduction to Redshift

Extracting from a JSON array

  • JSON_EXTRACT_ARRAY_ELEMENT_TEXT returns the value at array index
  • Takes a JSON string or field then an integer of the index starting from zero.
SELECT JSON_EXTRACT_ARRAY_ELEMENT_TEXT('[1.1,400,13]', 2);
JSON_EXTRACT_ARRAY_ELEMENT_TEXT
===============================
13
SELECT JSON_EXTRACT_ARRAY_ELEMENT_TEXT('[1.1,400,13]', 3);
JSON_EXTRACT_ARRAY_ELEMENT_TEXT
===============================
NULL
Introduction to Redshift

Extracting from a nested JSON array

SELECT JSON_EXTRACT_ARRAY_ELEMENT_TEXT(
           JSON_EXTRACT_PATH_TEXT('
               {
                 "one":1, 
                 "nested_two":[3,4,5]
               }', 
               -- Extracting the value at nested_two with JSON_EXTRACT_PATH_TEXT
               'nested_two'
            ),
            -- Extracting item at position one of the array with JSON_EXTRACT_ARRAY_ELEMENT_TEXT
            1
       );
JSON_EXTRACT_ARRAY_ELEMENT_TEXT
===============================
4
Introduction to Redshift

Extracting from a nested JSON array shortcut

  • Array index has to be a string
-- Passing two keys ('nested_two', '1') to select
-- the second element of the nested_two keys array value
SELECT JSON_EXTRACT_PATH_TEXT('
          {
            "one":1, 
            "nested_two":[3,4,5]
          }', 
          'nested_two', '1'
       );
JSON_EXTRACT_PATH_TEXT
======================
4
Introduction to Redshift

Casting in CTEs

WITH location_details AS (
    SELECT '{
        "location": "Salmon Challis National Forest",
      }'::SUPER::VARCHAR AS data
) 
  • Accessible as location_details.data
Introduction to Redshift

Let's practice!

Introduction to Redshift

Preparing Video For Download...