A survey of useful functions

PostgreSQL Summary Stats and Window Functions

Michel Semaan

Data Scientist

Nulls ahoy

Query

SELECT
  Country, Medal, COUNT(*) AS Awards
FROM summer_medals
WHERE
  Year = 2008 AND Country IN ('CHN', 'RUS')
GROUP BY ROLLUP(Country, Medal)
ORDER BY Country ASC, Medal ASC;
  • nulls signify group totals

Result

| Country | Medal  | Awards |
|---------|--------|--------|
| CHN     | Bronze | 57     |
| CHN     | Gold   | 74     |
| CHN     | Silver | 53     |
| CHN     | null   | 184    |
| RUS     | Bronze | 56     |
| RUS     | Gold   | 43     |
| RUS     | Silver | 44     |
| RUS     | null   | 143    |
| null    | null   | 327    |
PostgreSQL Summary Stats and Window Functions

Enter COALESCE

  • COALESCE() takes a list of values and returns the first non-null value, going from left to right
  • COALESCE(null, null, 1, null, 2) ? 1
  • Useful when using SQL operations that return nulls
    • ROLLUP and CUBE
    • Pivoting
    • LAG and LEAD
PostgreSQL Summary Stats and Window Functions

Annihilating nulls

Query

SELECT
  COALESCE(Country, 'Both countries') AS Country,
  COALESCE(Medal, 'All medals') AS Medal,
  COUNT(*) AS Awards
FROM summer_medals
WHERE
  Year = 2008 AND Country IN ('CHN', 'RUS')
GROUP BY ROLLUP(Country, Medal)
ORDER BY Country ASC, Medal ASC;

Result

| Country        | Medal      | Awards |
|----------------|------------|--------|
| Both countries | All medals | 327    |
| CHN            | All medals | 184    |
| CHN            | Bronze     | 57     |
| CHN            | Gold       | 74     |
| CHN            | Silver     | 53     |
| RUS            | All medals | 143    |
| RUS            | Bronze     | 56     |
| RUS            | Gold       | 43     |
| RUS            | Silver     | 44     |
PostgreSQL Summary Stats and Window Functions

Compressing data

Before

| Country | Rank |
|---------|------|
| CHN     | 1    |
| RUS     | 2    |
| USA     | 3    |
  • Rank is redundant because the ranking is implied

After

CHN, RUS, USA
  • Succinct and provides all information needed because the ranking is implied
PostgreSQL Summary Stats and Window Functions

Enter STRING_AGG

  • STRING_AGG(column, separator) takes all the values of a column and concatenates them, with separator in between each value

STRING_AGG(Letter, ', ') transforms this...

| Letter |
|--------|
| A      |
| B      |
| C      |

...into this

A, B, C
PostgreSQL Summary Stats and Window Functions

Query and result

Before

WITH Country_Medals AS (
  SELECT
    Country, COUNT(*) AS Medals
  FROM Summer_Medals
  WHERE Year = 2012
    AND Country IN ('CHN', 'RUS', 'USA')
    AND Medal = 'Gold'
    AND Sport = 'Gymnastics'
  GROUP BY Country),

  SELECT
    Country,
    RANK() OVER (ORDER BY Medals DESC) AS Rank
  FROM Country_Medals
  ORDER BY Rank ASC;

After

WITH Country_Medals AS (...),

  Country_Ranks AS (...)

  SELECT STRING_AGG(Country, ', ')
  FROM Country_Medals;

Result

CHN, RUS, USA
PostgreSQL Summary Stats and Window Functions

Let's practice!

PostgreSQL Summary Stats and Window Functions

Preparing Video For Download...