Ongestructureerde data

Introductie tot BigQuery

Matt Forrest

Field CTO

Waarom ongestructureerde data belangrijk is

Voorbeeld met ongestructureerde data:

/*--------------+-----------------------------------------*
 | customer_id  | emails                                  |
 +--------------+-----------------------------------------+
 | 12345        | ['[email protected]', '[email protected]']   |
 *--------------+-----------------------------------------*/
Introductie tot BigQuery

ARRAY's

SELECT ARRAY
  (SELECT 'bigquery' UNION ALL
   SELECT 'analytics' UNION ALL
   SELECT 'sql') AS new_array;


/*------------------------------------* | new_array | +------------------------------------+ | ['bigquery', 'analytics', 'sql'] | *-----------------------------------*/
SELECT ARRAY
  (SELECT 'bigquery' UNION ALL
   SELECT 'analytics' UNION ALL
   SELECT 'sql')[1] as result;


/*---------------* | result | +---------------+ | 'analytics' | *---------------*/
Introductie tot BigQuery

STRUCTs

SELECT
  STRUCT

<skill string, learning bool> ('big query', true) as skills;
/*--------------+--------------* | skills.skill | learning | +--------------+------------- + | 'big query' | true | *--------------+--------------*/
SELECT
  STRUCT
  <skill string, learning bool>

('big query', true).skill as key;
/*--------------+ | key. | +--------------+ | 'big query' | *--------------*/
Introductie tot BigQuery

ARRAY_LENGTH en ARRAY_CONCAT

SELECT
  ARRAY_LENGTH(
    [
      '[email protected]', 
      '[email protected]'
    ]) as len;

/*------*
 | len  |
 +------+
 | 2    |
 *------*/     
SELECT
  ARRAY_CONCAT(
    ['one'], ['two']) as new_array;

/*-------------------*
 | new_array         |
 +-------------------+
 | ['one', 'two']    |
 *-------------------*/     
Introductie tot BigQuery

UNNEST

Onze e-maildata unnest-en:

SELECT 
    * 
FROM
    UNNEST(['[email protected]', '[email protected]']) as emails;

/*-------------------+
 | emails            |
 +-------------------+
 | '[email protected]' |
 | '[email protected]'  |
 *-------------------*/
Introductie tot BigQuery

UNNEST met STRUCTs

Voorbeelddata - STRUCT in een ARRAY:

[
    {'big query': true},
    {'sql': true}
]
SELECT 
    my_skills.skill, 
    my_skills.learned
FROM 
    UNNEST(skills) as my_skills

/*--------------+--------------* | skill | learned | +--------------+------------- + | 'big query' | true | +--------------+------------- + | 'sql' | true | *--------------+--------------*/
Introductie tot BigQuery

SEARCH

Zoekopdracht met e-maildata:

SELECT 
    SEARCH(['[email protected]', '[email protected]'], 'gmail.com') as results;

/*----------*
 | results  |
 +----------+
 | true     |
 *----------*/
Introductie tot BigQuery

Cheat sheet ongestructureerde data

ARRAY's

  • Net als lijsten met geordende waarden
  • Waarden via 0-based index (bijv. my_array[0])

STRUCTs

  • Lijkt op JSON of een dictionary
  • Kan meerdere datatypen bevatten
  • Structuur is per kolomrij hetzelfde

ARRAY_CONCAT/ARRAY_LENGTH

  • Concateneer 2+ arrays en meet de lengte

UNNEST

  • Maakt flatten van ARRAY-data mogelijk

SEARCH

  • Zoek in ARRAY's of STRUCTs naar matches
Introductie tot BigQuery

Laten we oefenen!

Introductie tot BigQuery

Preparing Video For Download...