Inside the Redshift warehouse

Introduction to Redshift

Jason Myers

Principal Architect

Agenda

  • Redshift cluster internals
    • Node types
    • Leader specific functions
    • Redshift cluster storage
  • Predicates
  • Redshift spectrum
    • Database components
    • External tables
Introduction to Redshift

Redshift cluster architecture

Leader Node

  • Provides connections
  • Builds and distributes query execution plans
  • Can execute entire queries
  • Has exclusive functions

Compute Node

  • Provides data storage
  • Executes code from the leader node on locally stored data

Redshift Cluster

Introduction to Redshift

Leader specific functions

  • Only run on the leader
-- Selecting the substr, starting at 
-- position 11 of 'chocolate chip'
SELECT SUBSTR('chocolate chip', 11);
chip
  • SUBSTR errors on table columns
-- Selecting the substr from position 1
-- of the column named field on table
SELECT SUBSTR(field, 1) FROM table;
ERROR: SUBSTR() function is not 
supported (Hint: use SUBSTRING 
instead)
Introduction to Redshift

Looking at data across nodes

SELECT host, 
       -- Calculate the percentage of used space
       -- using the used minus tossed or ready to be reclaimed
       -- divided by the capacity
       (used - tossed) / capacity * 100 as percent_used 
  FROM STV_PARTITIONS;
 host |  percent_used
======+==============
  0   |  24.9
  1   |  24.8
Introduction to Redshift

Predicates

SELECT table_A.columnX,
       table_B.columnY,
  FROM table_A
       INNER JOIN table_B 
          -- predicate
       ON table_B.foreign_key = table_A.primary_key 
       -- predicate
 WHERE table_B.columnZ = 'value';
  • Typically boolean expressions and found in WHERE, HAVING or ON SQL clauses
Introduction to Redshift

Predicate push-down

PushDown

Introduction to Redshift

Typical internal database components

Database Components

MetaData Catalog

  • Holds schema info (columns, keys, etc)
  • References a storage location

Query Engine

  • Plans and executes queries
  • Provides connections

Storage

  • Hold table data
  • Supports multiple file and table formats
Introduction to Redshift

Redshift spectrum architecture

AWS Glue Data Catalog

  • Stores information about "external" tables

AWS S3 Bucket

  • Stores the files that represent the table
  • Supports CSVs, JSON, Text, Parquet, and many more file types

Redshift Spectrum Architecture

Introduction to Redshift

Let's practice!

Introduction to Redshift

Preparing Video For Download...