Introduction

PostgreSQL Summary Stats and Window Functions

Michel Semaan

Data Scientist

Motivation

USA total and running total of Summer Olympics gold medals since 2004

| Year | Medals | Medals_RT |
|------|--------|-----------|
| 2004 | 116    | 116       |
| 2008 | 125    | 241       |
| 2012 | 147    | 388       |

Discus throw reigning champion status

| Year | Champion | Last_Champion | Reigning_Champion |
|------|----------|---------------|-------------------|
| 1996 | GER      | null          | false             |
| 2000 | LTU      | GER           | false             |
| 2004 | LTU      | LTU           | true              |
| 2008 | EST      | LTU           | false             |
| 2012 | GER      | EST           | false             |
PostgreSQL Summary Stats and Window Functions

Course outline

  1. Introduction to window functions
  2. Fetching, ranking, and paging
  3. Aggregate window functions and frames
  4. Beyond window functions
PostgreSQL Summary Stats and Window Functions

Summer olympics dataset

  • Each row represents a medal awarded in the Summer Olympics games

Columns

  • Year, City
  • Sport, Discipline, Event
  • Athlete, Country, Gender
  • Medal
PostgreSQL Summary Stats and Window Functions

Window functions

  • Perform an operation across a set of rows that are somehow related to the current row
  • Similar to GROUP BY aggregate functions, but all rows remain in the output

Uses

  • Fetching values from preceding or following rows (e.g. fetching the previous row's value)
    • Determining reigning champion status
    • Calculating growth over time
  • Assigning ordinal ranks (1st, 2nd, etc.) to rows based on their values' positions in a sorted list
  • Running totals, moving averages
PostgreSQL Summary Stats and Window Functions

Row numbers

Query

SELECT
  Year, Event, Country
FROM Summer_Medals
WHERE
  Medal = 'Gold';

Result

| Year | Event                      | Country |
|------|----------------------------|---------|
| 1896 | 100M Freestyle             | HUN     |
| 1896 | 100M Freestyle For Sailors | GRE     |
| 1896 | 1200M Freestyle            | HUN     |
| ...  | ...                        | ...     |
PostgreSQL Summary Stats and Window Functions

Enter ROW_NUMBER

Query

SELECT
  Year, Event, Country,
  ROW_NUMBER() OVER () AS Row_N
FROM Summer_Medals
WHERE
  Medal = 'Gold';

Result

| Year | Event                      | Country | Row_N |
|------|----------------------------|---------|-------|
| 1896 | 100M Freestyle             | HUN     | 1     |
| 1896 | 100M Freestyle For Sailors | GRE     | 2     |
| 1896 | 1200M Freestyle            | HUN     | 3     |
| ...  | ...                        | ...     | ...   |
PostgreSQL Summary Stats and Window Functions

Anatomy of a window function

Query

SELECT
  Year, Event, Country,
  ROW_NUMBER() OVER () AS Row_N
FROM Summer_Medals
WHERE
  Medal = 'Gold';
  • FUNCTION_NAME() OVER (...)
    • ORDER BY
    • PARTITION BY
    • ROWS/RANGE PRECEDING/FOLLOWING/UNBOUNDED
PostgreSQL Summary Stats and Window Functions

Let's practice!

PostgreSQL Summary Stats and Window Functions

Preparing Video For Download...