External schemas, file formats, and table formats

Introduction to Redshift

Jason Myers

Principal Architect

External schemas

Database components

When the metadata catalog and the storage live outside the cluster, the schema is considered external

Redshift Spectrum

External schemas with Redshift Spectrum

Redshift is the query engine
By default, it uses the AWS Glue Data Catalog for metadata and AWS S3 for storage
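
As a minimal sketch of how such a schema might be registered, the statement below maps an external schema to a Glue Data Catalog database. The schema name spectrumdb is taken from the later slides; the catalog database name and the IAM role ARN are placeholders assumed for illustration, not values from this course.

```sql
-- Register an external schema backed by the AWS Glue Data Catalog.
-- 'spectrum_catalog_db' and the role ARN are hypothetical placeholders.
CREATE EXTERNAL SCHEMA IF NOT EXISTS spectrumdb
FROM DATA CATALOG
DATABASE 'spectrum_catalog_db'
IAM_ROLE 'arn:aws:iam::123456789012:role/ExampleSpectrumRole'
-- Create the catalog database if it is not there yet
CREATE EXTERNAL DATABASE IF NOT EXISTS;
```

The IAM role is what lets the cluster read both the catalog and the S3 objects, so it needs permissions on each.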

S3 file formats

File format	Columnar	Supports parallel reads
Parquet	Yes	Yes
ORC	Yes	Yes
TextFile	No	No
OpenCSV	No	Yes
JSON	No	No
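
For comparison with the text formats, here is a sketch of an external table over a columnar format. The table name and the S3 prefix are assumptions made for illustration; only the schema name and the pk_siteid column come from the slides.

```sql
-- Hypothetical Parquet version of the site data.
-- The table name and S3 prefix are assumed, not taken from the course.
CREATE EXTERNAL TABLE spectrumdb.idaho_site_id_parquet
(
    pk_siteid INTEGER
)
-- Parquet is self-describing, so no ROW FORMAT clause is needed
STORED AS PARQUET
LOCATION 's3://spectrum-id/idaho_sites_parquet/';
```

Because Parquet is columnar, a query that selects only pk_siteid can skip the other columns in the files rather than reading whole rows.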

Creating a CSV external table

CREATE EXTERNAL TABLE spectrumdb.idaho_site_id
(
    -- External tables take only column names and types, no constraints
    pk_siteid INTEGER
    -- Remaining columns omitted for space
)

-- Rows are stored as delimited text
ROW FORMAT DELIMITED

-- CSV fields are terminated by a comma
FIELDS TERMINATED BY ','

-- CSVs are a type of text file
STORED AS TEXTFILE

-- This is where the data is in AWS S3
LOCATION 's3://spectrum-id/idaho_sites/'

-- This file has a header row that we want to skip
TABLE PROPERTIES ('skip.header.line.count'='1');

Querying Spectrum tables

Works the same as querying internal tables
The EXPLAIN output will look different
No DISTKEY or SORTKEY to manage
Pseudocolumns
- $path - shows the path of the file that holds the row
- $size - shows the size of the file that holds the row
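
To see how the plan differs, one could run EXPLAIN against the external table. This is only a sketch using the table defined earlier. No output is shown because it depends on the cluster, but plans for Spectrum tables typically include S3 scan steps (for example, S3 Seq Scan) in place of the usual local scans.

```sql
-- Inspect the query plan for an external table.
-- Look for S3 scan steps in the output instead of local sequential scans.
EXPLAIN
SELECT pk_siteid
  FROM spectrumdb.idaho_site_id
 WHERE pk_siteid < 100;
```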

Using pseudocolumns

SELECT "$path", 
       "$size",
       pk_siteid
  FROM spectrumdb.idaho_site_id;

$path                           | $size | pk_siteid
================================|=======|==========
's3://spectrum-id/idaho_sites/' | 1616  | 1 
's3://spectrum-id/idaho_sites/' | 1616  | 2 
's3://spectrum-id/idaho_sites/' | 1616  | 3
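
One possible use of the pseudocolumns, sketched here against the same table, is to count how many rows each underlying file contributes. The alias row_count is just an illustrative name.

```sql
-- Count rows per source file; pseudocolumn names must be double-quoted
SELECT "$path",
       "$size",
       COUNT(*) AS row_count
  FROM spectrumdb.idaho_site_id
 GROUP BY "$path", "$size"
 ORDER BY "$path";
```

A summary like this can help spot empty or unexpectedly large files under the table's S3 location.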

Table formats

Common formats:
- Hive
- Iceberg
- Hudi
- Delta Lake
Read-only
Some, such as Hive, require an external catalog besides AWS Glue
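
As a hedged illustration of pointing Redshift at a catalog other than Glue, the sketch below registers an external schema backed by a Hive metastore. The schema name, database name, metastore address, and role ARN are all hypothetical; 9083 is the usual Hive metastore port.

```sql
-- Register an external schema whose metadata lives in a Hive metastore.
-- Every name, address, and ARN here is a placeholder for illustration.
CREATE EXTERNAL SCHEMA IF NOT EXISTS hive_example
FROM HIVE METASTORE
DATABASE 'hive_example_db'
URI '172.10.10.10' PORT 9083
IAM_ROLE 'arn:aws:iam::123456789012:role/ExampleSpectrumRole';
```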

Viewing external schemas

SVV_ALL_SCHEMAS - schema_type is internal or external

SELECT schema_name, 
       schema_type
  FROM SVV_ALL_SCHEMAS
 ORDER BY schema_name;

schema_name           | schema_type
======================|=============
public_intro_redshift | internal
spectrumdb            | external
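
To list only the external schemas, the same view can be filtered on schema_type. This is a small sketch that reuses the columns from the query above.

```sql
-- Show only schemas whose metadata and storage live outside the cluster
SELECT schema_name,
       schema_type
  FROM SVV_ALL_SCHEMAS
 WHERE schema_type = 'external'
 ORDER BY schema_name;
```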

Viewing external tables

SVV_ALL_TABLES - table_type is TABLE or EXTERNAL TABLE

SELECT table_name, 
       table_type
  FROM SVV_ALL_TABLES
 WHERE schema_name = 'public_intro_redshift';

table_name                | table_type
==========================|================
coffee_county_weather     | TABLE
idaho_monitoring_location | TABLE
idaho_samples             | TABLE
ecommerce_sales           | EXTERNAL TABLE
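
For more detail on external tables specifically, Redshift also provides the SVV_EXTERNAL_TABLES system view. The sketch below assumes the spectrumdb schema from the earlier slides and lists each table with its S3 location.

```sql
-- List external tables in a schema together with their S3 locations
SELECT schemaname,
       tablename,
       location
  FROM SVV_EXTERNAL_TABLES
 WHERE schemaname = 'spectrumdb'
 ORDER BY tablename;
```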

Let's practice!
