External Schemas, File, and Table Formats

Introduction to Redshift

Jason Myers

Principal Architect

External schemas

Database Components

When metadata catalog and storage are not part of the cluster it is considered external

Redshift Spectrum

External Schemas with Redshift Spectrum

Redshift is the engine
Uses AWS Glue Data Catalog and AWS S3 storage by default

S3 data file formats

File format	Columnar	Supports parallel reads
Parquet	Yes	Yes
ORC	Yes	Yes
TextFile	No	No
OpenCSV	No	Yes
JSON	No	No

Create CSV External Table

CREATE TABLE spectrumdb.IDAHO_SITE_ID
(
    'pk_siteid' INTEGER PRIMARY KEY,
    -- Cutting the rest of columns for space
)

-- CSV rows are comma delimited
ROW FORMAT DELIMITED

-- CSV fields are terminated by a comma
FIELDS TERMINATED BY ','

-- CSVs are a type of text file
STORED AS TEXTFILE

-- This is where the data is in AWS S3
LOCATION 's3://spectrum-id/idaho_sites/'

-- This file has headers that we want to skip
TABLE PROPERTIES ('skip.header.line.count'='1');

Querying spectrum tables

Works just like querying internal tables
EXPLAIN will look different
No concern for DISTKEY or SORTKEYs
Pseudocolumns
- $path - shows the file storage path for the row
- $size - shows the file size for the row

Using pseudocolumns

SELECT "$path", 
       "$size",
       pk_siteid
  FROM spectrumdb.idaho_site_id;

$path                           | $size | pk_siteid
================================|=======|==========
's3://spectrum-id/idaho_sites/' | 1616  | 1 
's3://spectrum-id/idaho_sites/' | 1616  | 2 
's3://spectrum-id/idaho_sites/' | 1616  | 3

Table formats

Common Formats:
- Hive
- Iceberg
- Hudi
- Deltalake
Read only
Some like Hive need an external catalog other than AWS Glue

Viewing external schemas

SVV_ALL_SCHEMAS - internal or external

SELECT schema_name, 
       schema_type
  FROM SVV_ALL_SCHEMAS
 ORDER BY SCHEMA_NAME;

schema_name           | schema_type
======================|=============
public_intro_redshift | internal
spectrumdb            | external

Viewing external tables

SVV_ALL_TABLES - TABLE or EXTERNAL TABLE

SELECT table_name, 
       table_type
  FROM SVV_ALL_TABLES
 WHERE schema_name = 'public_intro_redshift';

table_name                | table_type
==========================|================
coffee_county_weather     | TABLE
idaho_monitoring_location | TABLE
idaho_samples             | TABLE
ecommerce_sales           | EXTERNAL_TABLE

Let's practice!

Introduction to Redshift