Unstructured data

Introduction to BigQuery

Matt Forrest

Field CTO

Why unstructured data is important

Example with unstructured data:

/*--------------+-----------------------------------------*
 | customer_id  | emails                                  |
 +--------------+-----------------------------------------+
 | 12345        | ['[email protected]', '[email protected]']   |
 *--------------+-----------------------------------------*/
Introduction to BigQuery

ARRAYs

SELECT ARRAY
  (SELECT 'bigquery' UNION ALL
   SELECT 'analytics' UNION ALL
   SELECT 'sql') AS new_array;


/*------------------------------------* | new_array | +------------------------------------+ | ['bigquery', 'analytics', 'sql'] | *-----------------------------------*/
SELECT ARRAY
  (SELECT 'bigquery' UNION ALL
   SELECT 'analytics' UNION ALL
   SELECT 'sql')[1] as result;


/*---------------* | result | +---------------+ | 'analytics' | *---------------*/
Introduction to BigQuery

STRUCTs

SELECT
  STRUCT

<skill string, learning bool> ('big query', true) as skills;
/*--------------+--------------* | skills.skill | learning | +--------------+------------- + | 'big query' | true | *--------------+--------------*/
SELECT
  STRUCT
  <skill string, learning bool>

('big query', true).skill as key;
/*--------------+ | key. | +--------------+ | 'big query' | *--------------*/
Introduction to BigQuery

ARRAY_LENGTH and ARRAY_CONCAT

SELECT
  ARRAY_LENGTH(
    [
      '[email protected]', 
      '[email protected]'
    ]) as len;

/*------*
 | len  |
 +------+
 | 2    |
 *------*/     
SELECT
  ARRAY_CONCAT(
    ['one'], ['two']) as new_array;

/*-------------------*
 | new_array         |
 +-------------------+
 | ['one', 'two']    |
 *-------------------*/     
Introduction to BigQuery

UNNEST

Unnest-ing our email data:

SELECT 
    * 
FROM
    UNNEST(['[email protected]', '[email protected]']) as emails;

/*-------------------+
 | emails            |
 +-------------------+
 | '[email protected]' |
 | '[email protected]'  |
 *-------------------*/
Introduction to BigQuery

UNNEST with STRUCTs

Example data - STRUCT inside an ARRAY:

[
    {'big query': true},
    {'sql': true}
]
SELECT 
    my_skills.skill, 
    my_skills.learned
FROM 
    UNNEST(skills) as my_skills

/*--------------+--------------* | skill | learned | +--------------+------------- + | 'big query' | true | +--------------+------------- + | 'sql' | true | *--------------+--------------*/
Introduction to BigQuery

SEARCH

Using search with email data:

SELECT 
    SEARCH(['[email protected]', '[email protected]'], 'gmail.com') as results;

/*----------*
 | results  |
 +----------+
 | true     |
 *----------*/
Introduction to BigQuery

Unstructured data cheat sheet

ARRAYs

  • Similar to lists with ordered values
  • Values can accessed via a base 0 index (e.g., my_array[0])

STRUCTs

  • Similar to JSON or a dictionary
  • Can have any structure with multiple data types
  • Structure must be the same for all rows in that column

ARRAY_CONCAT/ARRAY_LENGTH

  • Concatenate two or more arrays and measure an array length

UNNEST

  • Allows you to flatten ARRAY data

SEARCH

  • Search across ARRAYs or STRUCTs to find matching values
Introduction to BigQuery

Let's practice!

Introduction to BigQuery

Preparing Video For Download...