Introduction to Redshift
Jason Myers
Principal Architect
Database Components

Redshift Spectrum

| File format | Columnar | Supports parallel reads |
|---|---|---|
| Parquet | Yes | Yes |
| ORC | Yes | Yes |
| TextFile | No | No |
| OpenCSV | No | Yes |
| JSON | No | No |
CREATE TABLE spectrumdb.IDAHO_SITE_ID ( 'pk_siteid' INTEGER PRIMARY KEY, -- Cutting the rest of columns for space )-- CSV rows are comma delimited ROW FORMAT DELIMITED-- CSV fields are terminated by a comma FIELDS TERMINATED BY ','-- CSVs are a type of text file STORED AS TEXTFILE-- This is where the data is in AWS S3 LOCATION 's3://spectrum-id/idaho_sites/'-- This file has headers that we want to skip TABLE PROPERTIES ('skip.header.line.count'='1');
EXPLAIN will look differentDISTKEY or SORTKEYs$path - shows the file storage path for the row$size - shows the file size for the rowSELECT "$path",
"$size",
pk_siteid
FROM spectrumdb.idaho_site_id;
$path | $size | pk_siteid
================================|=======|==========
's3://spectrum-id/idaho_sites/' | 1616 | 1
's3://spectrum-id/idaho_sites/' | 1616 | 2
's3://spectrum-id/idaho_sites/' | 1616 | 3
Common Formats:
Read only
internal or externalSELECT schema_name,
schema_type
FROM SVV_ALL_SCHEMAS
ORDER BY SCHEMA_NAME;
schema_name | schema_type
======================|=============
public_intro_redshift | internal
spectrumdb | external
TABLE or EXTERNAL TABLESELECT table_name,
table_type
FROM SVV_ALL_TABLES
WHERE schema_name = 'public_intro_redshift';
table_name | table_type
==========================|================
coffee_county_weather | TABLE
idaho_monitoring_location | TABLE
idaho_samples | TABLE
ecommerce_sales | EXTERNAL_TABLE
Introduction to Redshift