Introduction to Redshift
Jason Myers
Principal Architect
Database Components
Redshift Spectrum
File format | Columnar | Supports parallel reads |
---|---|---|
Parquet | Yes | Yes |
ORC | Yes | Yes |
TextFile | No | No |
OpenCSV | No | Yes |
JSON | No | No |
CREATE TABLE spectrumdb.IDAHO_SITE_ID ( 'pk_siteid' INTEGER PRIMARY KEY, -- Cutting the rest of columns for space )
-- CSV rows are comma delimited ROW FORMAT DELIMITED
-- CSV fields are terminated by a comma FIELDS TERMINATED BY ','
-- CSVs are a type of text file STORED AS TEXTFILE
-- This is where the data is in AWS S3 LOCATION 's3://spectrum-id/idaho_sites/'
-- This file has headers that we want to skip TABLE PROPERTIES ('skip.header.line.count'='1');
EXPLAIN
will look differentDISTKEY
or SORTKEY
s$path
- shows the file storage path for the row$size
- shows the file size for the rowSELECT "$path",
"$size",
pk_siteid
FROM spectrumdb.idaho_site_id;
$path | $size | pk_siteid
================================|=======|==========
's3://spectrum-id/idaho_sites/' | 1616 | 1
's3://spectrum-id/idaho_sites/' | 1616 | 2
's3://spectrum-id/idaho_sites/' | 1616 | 3
Common Formats:
Read only
internal
or external
SELECT schema_name,
schema_type
FROM SVV_ALL_SCHEMAS
ORDER BY SCHEMA_NAME;
schema_name | schema_type
======================|=============
public_intro_redshift | internal
spectrumdb | external
TABLE
or EXTERNAL TABLE
SELECT table_name,
table_type
FROM SVV_ALL_TABLES
WHERE schema_name = 'public_intro_redshift';
table_name | table_type
==========================|================
coffee_county_weather | TABLE
idaho_monitoring_location | TABLE
idaho_samples | TABLE
ecommerce_sales | EXTERNAL_TABLE
Introduction to Redshift