Time Series Analysis in PostgreSQL
Jasmin Ludolf
Content Developer, DataCamp
SUM() OVER
|values|running_total| |------|-------------|
| 1| 1|
| 2| 3|
| 3| 6|
SELECT
id,
timeseries,
views,
SUM(views) OVER(
PARTITION BY id
ORDER BY timeseries)
FROM views;
| id| timeseries|views|sum|
|---|-------------------|-----|---|
|121|2015-12-27 17:34:16| 23| 23|
|121|2015-12-27 17:54:16| 10| 33|
|121|2015-12-27 18:14:16| 8| 41|
|121|2015-12-27 18:34:16| 4| 45|
|211|2015-12-28 03:37:18| 4| 4|
|211|2015-12-28 03:57:18| 2| 6|
|211|2015-12-28 04:17:18| 2| 8|
|211|2015-12-28 04:37:18| 1| 9|
|223|2016-01-01 05:41:22| 7| 7|
|223|2016-01-01 06:01:22| 5| 12|
...
SELECT * FROM train_schedule;
|train_id| station|arrival_time|
|--------|-------------|------------|
| 324|San Francisco| 07:59:00|
| 324|22nd Street | 08:03:00|
| 324|Millbrae | 08:16:00|
| 324|Hillsdale | 08:24:00|
| 324|Redwood City | 08:31:00|
| 324|Palo Alto | 08:37:00|
| 324|San Jose | 09:05:00|
| 217|Gilroy | 06:06:00|
| 217|San Martin | 06:15:00|
...
LEAD()
: look at values aheadLEAD(value, offset)
SELECT train_id, station, arrival_time, LEAD(arrival_time, 1) OVER (
PARTITION BY train_id ORDER BY arrival_time) AS next_arrival_time FROM train_schedule ORDER BY train_id, arrival_time;
|train_id| station|arrival_time|next_arrival_time|
|-------|-------------|------------|-----------------|
| 217| Gilroy| 06:06:00| 06:15:00|
| 217| San Martin| 06:15:00| 06:21:00|
| 217| Morgan Hill| 06:21:00| 06:36:00|
| 217| Blossom Hill| 06:36:00| 06:42:00|
| 217| Capitol| 06:42:00| 06:50:00|
| 217| Tamien| 06:50:00| 06:59:00|
| 217| San Jose| 06:59:00| |
| 324|San Francisco| 07:59:00| 08:03:00|
| 324| 22nd Street| 08:03:00| 08:16:00|
| 324| Millbrae| 08:16:00| 08:24:00|
...
SELECT train_id, station, arrival_time, LEAD(arrival_time, 1) OVER ( PARTITION BY train_id ORDER BY arrival_time) AS next_arrival_time,
LEAD(arrival_time, 1) OVER ( PARTITION BY train_id ORDER BY arrival_time) - arrival_time AS duration
FROM train_schedule ORDER BY train_id, arrival_time;
|train_id| station|arrival_time|next_arrival_time|duration|
|--------|-------------|------------|-----------------|--------|
| 217| Gilroy| 06:06:00| 06:15:00|00:09:00|
| 217| San Martin| 06:15:00| 06:21:00|00:06:00|
| 217| Morgan Hill| 06:21:00| 06:36:00|00:15:00|
| 217| Blossom Hill| 06:36:00| 06:42:00|00:06:00|
| 217| Capitol| 06:42:00| 06:50:00|00:08:00|
| 217| Tamien| 06:50:00| 06:59:00|00:09:00|
| 217| San Jose| 06:59:00| | |
| 324|San Francisco| 07:59:00|
...
SUM() OVER
on duration
to calculate the running totalLAG()
: look at values behindLAG(value, offset)
SELECT
id,
year_month,
m_avg,
LAG(t_monthly_avg, 1) OVER (
PARTITION BY station_id
ORDER BY year_month)
AS previous_m_avg
FROM temperatures_monthly
ORDER BY station_id, year_month;
|id|year_month|m_avg|previous_m_avg|
|--|----------|-----|--------------|
| 1|2010-01-01| 13.6| |
| 1|2010-02-01| 14.8| 13.6|
| 1|2010-03-01| 17.0| 14.8|
| 1|2010-04-01| 20.0| 17.0|
| 1|2010-05-01| 24.8| 20.0|
| 1|2010-06-01| 31.3| 24.8|
...
Time Series Analysis in PostgreSQL