Ranking functions

Time Series Analysis in PostgreSQL

Jasmin Ludolf

Content Developer, DataCamp

Row number gives sequential numbering

SELECT
    station_id,
    year_month,
    t_monthly_min,
    ROW_NUMBER() OVER (
      PARTITION BY station_id
      ORDER BY t_monthly_min) AS rank
FROM temperatures_monthly AS tm
WHERE year_month BETWEEN '2018-01-01'
    AND '2018-12-31'
ORDER BY station_id, t_monthly_min;
|station_id|year_month|t_monthly_min|rank|
|----------|----------|-------------|----|
|         1|2018-12-01|          5.4|   1|
|         1|2018-02-01|          6.2|   2|
|         1|2018-01-01|          7.4|   3|
|         1|2018-11-01|          9.0|   4|
|         1|2018-03-01|         10.6|   5| 
|         1|2018-04-01|         14.7|   6|
|         1|2018-10-01|         16.6|   7|
|         1|2018-05-01|         17.2|   8|
|         1|2018-06-01|         21.9|   9|
|         1|2018-09-01|         24.4|  10|
|         1|2018-07-01|         28.1|  11|
...
Time Series Analysis in PostgreSQL

Rank

  • RANK() : assigns a rank to each row within each partition according to ORDER BY, starting with 1, repeats rank for similar values
...
RANK() OVER (
        PARTITION BY station_id 
        ORDER BY t_monthly_min DESC) AS rank
...
...
ROW_NUMBER() OVER (
        PARTITION BY station_id 
        ORDER BY t_monthly_min DESC) AS row
...
Time Series Analysis in PostgreSQL

Rank vs. row

RANK()

|year_month|t_monthly_min|rank|
|----------|-------------|----|
|2018-02-01|          1.6|   1|
|2018-01-01|          1.6|   1|
|2018-03-01|            2|   3|
|2018-11-01|          2.5|   4|
|2018-12-01|          2.7|   5|
|2018-04-01|          3.4|   6|
|2018-10-01|          4.1|   7|
|2018-09-01|          4.7|   8|
...

ROW_NUMBER()

|year_month|t_monthly_min|rank|
|----------|-------------|----|
|2018-02-01|          1.6|   1|
|2018-01-01|          1.6|   2|
|2018-03-01|            2|   3|
|2018-11-01|          2.5|   4|
|2018-12-01|          2.7|   5|
|2018-04-01|          3.4|   6|
|2018-10-01|          4.1|   7|
|2018-09-01|          4.7|   8|
...
Time Series Analysis in PostgreSQL

Dense rank

  • DENSE_RANK() : assigns a rank to each row within each partition according to ORDER BY, starting with 1, repeats rank for similar values
    • Doesn't skip a rank
...
DENSE_RANK() OVER (
    PARTITION BY station_id 
    ORDER BY t_monthly_min DESC) AS rank
...
Time Series Analysis in PostgreSQL

Rank vs. dense rank

RANK()

|year_month|t_monthly_min|rank|
|----------|-------------|----|
|2018-02-01|          1.6|   1|
|2018-01-01|          1.6|   1|
|2018-03-01|            2|   3|
|2018-11-01|          2.5|   4|
|2018-12-01|          2.7|   5|
|2018-04-01|          3.4|   6|
|2018-10-01|          4.1|   7|
|2018-09-01|          4.7|   8|
...

DENSE_RANK()

|year_month|t_monthly_min|rank|
|----------|-------------|----|
|2018-02-01|          1.6|   1|
|2018-01-01|          1.6|   1|
|2018-03-01|            2|   2|
|2018-11-01|          2.5|   3|
|2018-12-01|          2.7|   4|
|2018-04-01|          3.4|   5|
|2018-10-01|          4.1|   6|
|2018-09-01|          4.7|   7|
...
Time Series Analysis in PostgreSQL

Percent rank

  • PERCENT_RANK : assigns a rank to each row within each partition according to ORDER BY, as a percentage
    • (rank - 1) / (total partition rows - 1)
    • Float values from 0 to 1
...
PERCENT_RANK() OVER (
    PARTITION BY station_id 
    ORDER BY t_monthly_min DESC) AS percent_rank
...
Time Series Analysis in PostgreSQL

Percent rank output

|year_month|t_monthly_min|percent_rank|
|----------|-------------|------------|
|2018-02-01|          1.6|           0|
|2018-01-01|          1.6|           0|
|2018-03-01|            2|        0.18|
|2018-11-01|          2.5|        0.27|
|2018-12-01|          2.7|        0.36|
|2018-04-01|          3.4|        0.45|
|2018-10-01|          4.1|        0.54|
|2018-09-01|          4.7|        0.63|
...
Time Series Analysis in PostgreSQL

Let's practice!

Time Series Analysis in PostgreSQL

Preparing Video For Download...