Paging

PostgreSQL Summary Stats and Window Functions

Michel Semaan

Data Scientist

What is paging?

  • Paging: Splitting data into (approximately) equal chunks
  • Uses
    • Many APIs return data in "pages" to reduce data being sent
    • Separating data into quartiles or thirds (top middle 33%, and bottom thirds) to judge performance

Enter NTILE

  • NTILE(n) splits the data into n approximately equal pages
PostgreSQL Summary Stats and Window Functions

Paging - Source table

Query

SELECT
  DISTINCT Discipline
FROM Summer_Medals;
  • Split the data into 15 approx. equally sized pages
  • $67 / 15 \simeq 4$, so each each page will contain four or five rows

Result

| Discipline          |
|---------------------|
| Wrestling Freestyle |
| Archery             |
| Baseball            |
| Lacrosse            |
| Judo                |
| Athletics           |
| ...                 |

(67 rows)
PostgreSQL Summary Stats and Window Functions

Paging

Query

WITH Disciplines AS (
  SELECT
    DISTINCT Discipline
  FROM Summer_Medals)

SELECT
  Discipline, NTILE(15) OVER () AS Page
From Disciplines
ORDER BY Page ASC;

Result

| Discipline          | Page |
|---------------------|------|
| Wrestling Freestyle | 1    |
| Archery             | 1    |
| Baseball            | 1    |
| Lacrosse            | 1    |
| Judo                | 1    |
| Athletics           | 2    |
| ...                 | ...  |
PostgreSQL Summary Stats and Window Functions

Top, middle, and bottom thirds

Query

WITH Country_Medals AS (
  SELECT
    Country, COUNT(*) AS Medals
  FROM Summer_Medals
  GROUP BY Country),

SELECT
  Country, Medals,
  NTILE(3) OVER (ORDER BY Medals DESC) AS Third
FROM Country_Medals;

Result

| Country | Medals | Third |
|---------|--------|-------|
| USA     | 4585   | 1     |
| URS     | 2049   | 1     |
| GBR     | 1720   | 1     |
| ...     | ...    | ...   |
| CZE     | 56     | 2     |
| LTU     | 55     | 2     |
| ...     | ...    | ...   |
| DOM     | 6      | 3     |
| BWI     | 5      | 3     |
| ...     | ...    | ...   |
PostgreSQL Summary Stats and Window Functions

Thirds averages

Query

WITH Country_Medals AS (...),

  Thirds AS (
  SELECT
    Country, Medals,
    NTILE(3) OVER (ORDER BY Medals DESC) AS Third
  FROM Country_Medals)

SELECT
  Third,
  ROUND(AVG(Medals), 2) AS Avg_Medals
FROM Thirds
GROUP BY Third
ORDER BY Third ASC;

Result

| Third | Avg_Medals |
|-------|------------|
| 1     | 598.74     |
| 2     | 22.98      |
| 3     | 2.08       |
PostgreSQL Summary Stats and Window Functions

Let's practice!

PostgreSQL Summary Stats and Window Functions

Preparing Video For Download...