Using column-oriented storage

Improving Query Performance in PostgreSQL

Amy McCarty

Instructor

Column-oriented

Column-oriented storage

  • Relation between rows retained
id name species age habitat received
01 Bob panda 2 Asia 2018
02 Sunny zebra 3 Africa 2018
03 Beco zebra 10 Africa 2017
04 Coco koala 5 Australia 2016

Stored as

Two tables. The first table is a list of ids and species. 01 - panda. 02 - zebra. 03 - zebra. 04 - koala. The second table is a list of ids and ages. 01 - 2. 02 - 3. 03 - 10. 04 - 5.

Improving Query Performance in PostgreSQL

Analytics focus - a good fit

Column-oriented storage properties

  • One column stored in same location
  • Quick to return all rows
  • Fast to perform column calculations

Analytics focus

  • Counts, averages, calculations
  • Reporting
  • Column aggregations

Stored as

Three tables. The first table is a list of ids and names. 01 - Bob. 02 - Sunny. 03- Beco. 04 - Coco. The second table is a list of ids and species. 01 - panda. 02 - zebra. 03 - zebra. 04 - koala. The third table is a list of ids and ages. 01 - 2. 02 - 3. 03 - 10. 04 - 5.

Improving Query Performance in PostgreSQL

Transactional focus - a poor fit

Row relationships retained

  • Slow to return all columns
  • Slow to load data

Transactional focus

  • Fast insert and delete of records

Stored as

Three tables. The first table is a list of ids and names. 01 - Bob. 02 - Sunny. 03- Beco. 04 - Coco. The second table is a list of ids and species. 01 - panda. 02 - zebra. 03 - zebra. 04 - koala. The third table is a list of ids and ages. 01 - 2. 02 - 3. 03 - 10. 04 - 5.

Improving Query Performance in PostgreSQL

Database examples

 

Postgres Citus Data, Greenplum, Amazon Redshift
MySQL MariaDB
Oracle Oracle In-Memory Cloud Store
Clickhouse, Apache Druid, CrateDB
Improving Query Performance in PostgreSQL

Information schema

Reducing the columns

  • Use SELECT * sparingly
SELECT column_name, data_type
FROM information_schema.columns
WHERE table_catalog = 'schema_name'
AND table_name = 'zoo_animals'
column_name data_type
id integer
name text
species text
Improving Query Performance in PostgreSQL

Information schema

Reducing the columns

  • Use SELECT * sparingly
  • Use the information schema
SELECT column_name, data_type
FROM information_schema.columns
WHERE table_catalog = 'schema_name'
AND table_name = 'zoo_animals'
column_name data_type
id integer
name text
species text
Improving Query Performance in PostgreSQL

Writing your queries

 

  • Examine each column in own query
id name species age habitat received
01 Bob panda 2 Asia 2018
02 Sunny zebra 3 Africa 2018
03 Beco zebra 10 Africa 2017
04 Coco koala 5 Australia 2016
-- Structure for column oriented
SELECT MIN(age), MAX(age)
FROM zoo_animals
WHERE species = 'zebra'
-- Structure for row-oriented
SELECT *
FROM zoo_animals
WHERE species = 'zebra'
ORDER BY age
Improving Query Performance in PostgreSQL

Let's practice!

Improving Query Performance in PostgreSQL

Preparing Video For Download...