Time Series Analysis in PostgreSQL
Jasmin Ludolf
Content Developer, DataCamp
temperatures_monthly tablestation_id - weather station idyear_month - year and montht_monthly_min - monthly minimum temperature in Celsiust_monthly_max - monthly maximum temperature in Celsiust_monthly_avg - monthly average temperature in CelsiusSELECT station_id, year_month, t_monthly_min FROM temperatures_monthly AS tm JOIN( SELECT station_id, min(t_monthly_min) AS t_monthly_min FROM temperatures_monthly WHERE year_month BETWEEN '2018-01-01' AND '2018-12-31' GROUP BY station_id ) as p USING(station_id, t_monthly_min) WHERE year_month BETWEEN '2018-01-01' AND '2018-12-31' ORDER BY station_id, t_monthly_min;
|station_id|year_month|t_monthly_min|
|----------|----------|-------------|
| 1|2018-12-01| 5.4|
| 2|2018-12-01| 6.6|
| 3|2018-01-01| 1.6|
| 3|2018-02-01| 1.6|
| 4|2018-02-01| -19.0|
| 6|2018-01-01| 2.6|
| 8|2018-02-01| -16.3|
Problems:
5 is missing, it had no data for 20183, two rows had the same minimum temp 3ROW_NUMBER() : numbers the rows per partition, in ascending order
SELECT station_id, year_month, t_monthly_min,
ROW_NUMBER() OVER (PARTITION BY station_id ORDER BY t_monthly_min) AS rank
FROM temperatures_monthly AS tm
WHERE year_month BETWEEN '2018-01-01' AND '2018-12-31'
ORDER BY station_id, t_monthly_min;
|station_id|year_month|t_monthly_min|rank|
|----------|----------|-------------|----|
| 1|2018-12-01| 5.4| 1|
| 1|2018-02-01| 6.2| 2|
| 1|2018-01-01| 7.4| 3|
| 1|2018-11-01| 9.0| 4|
| 1|2018-03-01| 10.6| 5|
| 1|2018-04-01| 14.7| 6|
| 1|2018-10-01| 16.6| 7|
| 1|2018-05-01| 17.2| 8|
| 1|2018-06-01| 21.9| 9|
| 1|2018-09-01| 24.4| 10|
| 1|2018-07-01| 28.1| 11|
| 1|2018-08-01| 28.1| 12|
| 2|2018-12-01| 6.6| 1|
| 2|2018-01-01| 7.6| 2|
| 2|2018-02-01| 8.1| 3|
2station_idSELECT * FROM
(
SELECT station_id, year_month, t_monthly_min,
ROW_NUMBER() OVER
(PARTITION BY station_id ORDER BY t_monthly_min) AS rank
FROM temperatures_monthly
WHERE year_month BETWEEN '2018-01-01' AND '2018-12-31'
) AS q
WHERE rank < 3
ORDER BY station_id, rank;
|station_id|year_month|t_monthly_min|rank|
|----------|----------|-------------|----|
| 1|2018-12-01| 5.4| 1|
| 1|2018-02-01| 6.2| 2|
| 2|2018-12-01| 6.6| 1|
| 2|2018-01-01| 7.6| 2|
| 3|2018-01-01| 1.6| 1|
| 3|2018-02-01| 1.6| 2|
DESC:ROW_NUMBER() OVER (PARTITION BY station_id ORDER BY t_monthly_min DESC)
Time Series Analysis in PostgreSQL