Row vs. column data store

Data Warehousing Concepts

Aaren Stubberfield

Data Scientist

Why is it important?

  • Optimizing queries for speed
  • Column store format for data warehouse tables is best for analytic workloads
Data Warehousing Concepts

Basics of computer storage

  • Computers store data in blocks.
  • Reads the required blocks when retrieving data.
  • Reading fewer blocks increases the overall speed of the process.

Symbolic image of a hard disk with multiple blocks for storage

Data Warehousing Concepts

Example of health table

  • CDC (Centers for Disease Control and Prevention)
  • Flu infection data by age groups over multiple seasons
SEASON AGE GROUP HOSPITALIZATION PERCENTAGE
2019 0-17 yr 13.9%
2019 18-49 yr 22.5%
2019 50+ yr 63.7%
2020 0-17 yr 3.9%
2020 18-49 yr 18.1%
2020 50+ yr 78%
2021 0-17 yr 15.6%
2021 18-49 yr 23.3%
2021 50+ yr 61.1%
Data Warehousing Concepts

Row store example

SEASON AGE GROUP HOSPITALIZATION PERCENTAGE
2019 0-17 yr 13.9%
2019 18-49 yr 22.5%
2019 50+ yr 63.7%
2020 0-17 yr 3.9%
2020 18-49 yr 18.1%
2020 50+ yr 78%
2021 0-17 yr 15.6%
2021 18-49 yr 23.3%
2021 50+ yr 61.1%

Symbolic image of a hard disk with multiple blocks for storage with a row stored in each block

Data Warehousing Concepts

Column store example

SEASON AGE GROUP HOSPITALIZATION PERCENTAGE
2019 0-17 yr 13.9%
2019 18-49 yr 22.5%
2019 50+ yr 63.7%
2020 0-17 yr 3.9%
2020 18-49 yr 18.1%
2020 50+ yr 78%
2021 0-17 yr 15.6%
2021 18-49 yr 23.3%
2021 50+ yr 61.1%

Symbolic image of a hard disk with multiple blocks for storage with a column stored in each block

Data Warehousing Concepts

Summary

Row Store

  • Row data is stored together in blocks
  • Ideal for transactional workloads

Column Store

  • Column data is stored together in blocks
  • Ideal for analytical workloads
  • Better data compression
Data Warehousing Concepts

It's practice time!

Data Warehousing Concepts

Preparing Video For Download...