Time Series Analysis in PostgreSQL
Jasmin Ludolf
Content Developer, DataCamp
temperatures_monthly
tablestation_id
- weather station idyear_month
- year and montht_monthly_min
- monthly minimum temperature in Celsiust_monthly_max
- monthly maximum temperature in Celsiust_monthly_avg
- monthly average temperature in CelsiusSELECT station_id, year_month, t_monthly_min FROM temperatures_monthly AS tm JOIN
( SELECT station_id, min(t_monthly_min) AS t_monthly_min FROM temperatures_monthly WHERE year_month BETWEEN '2018-01-01' AND '2018-12-31' GROUP BY station_id ) as p USING(station_id, t_monthly_min) WHERE year_month BETWEEN '2018-01-01' AND '2018-12-31' ORDER BY station_id, t_monthly_min;
|station_id|year_month|t_monthly_min|
|----------|----------|-------------|
| 1|2018-12-01| 5.4|
| 2|2018-12-01| 6.6|
| 3|2018-01-01| 1.6|
| 3|2018-02-01| 1.6|
| 4|2018-02-01| -19.0|
| 6|2018-01-01| 2.6|
| 8|2018-02-01| -16.3|
Problems:
5
is missing, it had no data for 20183
, two rows had the same minimum temp 3
ROW_NUMBER()
: numbers the rows per partition, in ascending order
SELECT station_id, year_month, t_monthly_min,
ROW_NUMBER() OVER (PARTITION BY station_id ORDER BY t_monthly_min) AS rank
FROM temperatures_monthly AS tm
WHERE year_month BETWEEN '2018-01-01' AND '2018-12-31'
ORDER BY station_id, t_monthly_min;
|station_id|year_month|t_monthly_min|rank|
|----------|----------|-------------|----|
| 1|2018-12-01| 5.4| 1|
| 1|2018-02-01| 6.2| 2|
| 1|2018-01-01| 7.4| 3|
| 1|2018-11-01| 9.0| 4|
| 1|2018-03-01| 10.6| 5|
| 1|2018-04-01| 14.7| 6|
| 1|2018-10-01| 16.6| 7|
| 1|2018-05-01| 17.2| 8|
| 1|2018-06-01| 21.9| 9|
| 1|2018-09-01| 24.4| 10|
| 1|2018-07-01| 28.1| 11|
| 1|2018-08-01| 28.1| 12|
| 2|2018-12-01| 6.6| 1|
| 2|2018-01-01| 7.6| 2|
| 2|2018-02-01| 8.1| 3|
2
station_id
SELECT * FROM
(
SELECT station_id, year_month, t_monthly_min,
ROW_NUMBER() OVER
(PARTITION BY station_id ORDER BY t_monthly_min) AS rank
FROM temperatures_monthly
WHERE year_month BETWEEN '2018-01-01' AND '2018-12-31'
) AS q
WHERE rank < 3
ORDER BY station_id, rank;
|station_id|year_month|t_monthly_min|rank|
|----------|----------|-------------|----|
| 1|2018-12-01| 5.4| 1|
| 1|2018-02-01| 6.2| 2|
| 2|2018-12-01| 6.6| 1|
| 2|2018-01-01| 7.6| 2|
| 3|2018-01-01| 1.6| 1|
| 3|2018-02-01| 1.6| 2|
DESC
:ROW_NUMBER() OVER (PARTITION BY station_id ORDER BY t_monthly_min DESC)
Time Series Analysis in PostgreSQL