Time Series Analysis in PostgreSQL
Jasmin Ludolf
Content Developer, DataCamp
temperatures_monthly tablestation_id - weather station idyear_month - year and montht_monthly_min - monthly minimum temperature in Celsiust_monthly_max - monthly maximum temperature in Celsiust_monthly_avg - monthly average temperature in CelsiusSELECT station_id, year_month, t_monthly_min FROM temperatures_monthly AS tm JOIN( SELECT station_id, min(t_monthly_min) AS t_monthly_min FROM temperatures_monthly WHERE year_month BETWEEN '2018-01-01' AND '2018-12-31' GROUP BY station_id ) as p USING(station_id, t_monthly_min) WHERE year_month BETWEEN '2018-01-01' AND '2018-12-31' ORDER BY station_id, t_monthly_min;
|station_id|year_month|t_monthly_min|
|----------|----------|-------------|
|         1|2018-12-01|          5.4|
|         2|2018-12-01|          6.6|
|         3|2018-01-01|          1.6|
|         3|2018-02-01|          1.6|
|         4|2018-02-01|        -19.0|
|         6|2018-01-01|          2.6|
|         8|2018-02-01|        -16.3|
Problems:
5 is missing, it had no data for 20183, two rows had the same minimum temp 3ROW_NUMBER() : numbers the rows per partition, in ascending order
SELECT station_id, year_month, t_monthly_min,
    ROW_NUMBER() OVER (PARTITION BY station_id ORDER BY t_monthly_min) AS rank
FROM temperatures_monthly AS tm
WHERE year_month BETWEEN '2018-01-01' AND '2018-12-31'
ORDER BY station_id, t_monthly_min;
|station_id|year_month|t_monthly_min|rank|
|----------|----------|-------------|----|
|         1|2018-12-01|          5.4|   1|
|         1|2018-02-01|          6.2|   2|
|         1|2018-01-01|          7.4|   3|
|         1|2018-11-01|          9.0|   4|
|         1|2018-03-01|         10.6|   5| 
|         1|2018-04-01|         14.7|   6|
|         1|2018-10-01|         16.6|   7|
|         1|2018-05-01|         17.2|   8|
|         1|2018-06-01|         21.9|   9|
|         1|2018-09-01|         24.4|  10|
|         1|2018-07-01|         28.1|  11|
|         1|2018-08-01|         28.1|  12|
|         2|2018-12-01|          6.6|   1|
|         2|2018-01-01|          7.6|   2|
|         2|2018-02-01|          8.1|   3|
2station_idSELECT * FROM
(
    SELECT station_id, year_month, t_monthly_min,
    ROW_NUMBER() OVER 
          (PARTITION BY station_id ORDER BY t_monthly_min) AS rank
    FROM temperatures_monthly
    WHERE year_month BETWEEN '2018-01-01' AND '2018-12-31'
) AS q
WHERE rank < 3
ORDER BY station_id, rank;
|station_id|year_month|t_monthly_min|rank|
|----------|----------|-------------|----|
|         1|2018-12-01|          5.4|   1|
|         1|2018-02-01|          6.2|   2|
|         2|2018-12-01|          6.6|   1|
|         2|2018-01-01|          7.6|   2|
|         3|2018-01-01|          1.6|   1|
|         3|2018-02-01|          1.6|   2|
DESC:ROW_NUMBER() OVER (PARTITION BY station_id ORDER BY t_monthly_min DESC)
Time Series Analysis in PostgreSQL