Time Series Analysis in PostgreSQL
Jasmin Ludolf
Content Developer, DataCamp
SUM() OVER|values|running_total| |------|-------------|| 1| 1|| 2| 3|| 3| 6|
SELECT
    id,
    timeseries,
    views,
    SUM(views) OVER(
      PARTITION BY id 
      ORDER BY timeseries)
FROM views;
| id|         timeseries|views|sum|
|---|-------------------|-----|---|
|121|2015-12-27 17:34:16|   23| 23|
|121|2015-12-27 17:54:16|   10| 33|
|121|2015-12-27 18:14:16|    8| 41|
|121|2015-12-27 18:34:16|    4| 45|
|211|2015-12-28 03:37:18|    4|  4|
|211|2015-12-28 03:57:18|    2|  6|
|211|2015-12-28 04:17:18|    2|  8|
|211|2015-12-28 04:37:18|    1|  9|
|223|2016-01-01 05:41:22|    7|  7|
|223|2016-01-01 06:01:22|    5| 12|
...
SELECT * FROM train_schedule;
|train_id|      station|arrival_time|
|--------|-------------|------------|
|     324|San Francisco|    07:59:00|
|     324|22nd Street  |    08:03:00|
|     324|Millbrae     |    08:16:00|
|     324|Hillsdale    |    08:24:00|
|     324|Redwood City |    08:31:00|
|     324|Palo Alto    |    08:37:00|
|     324|San Jose     |    09:05:00|
|     217|Gilroy       |    06:06:00|
|     217|San Martin   |    06:15:00|
...
LEAD(): look at values aheadLEAD(value, offset)SELECT train_id, station, arrival_time, LEAD(arrival_time, 1) OVER (PARTITION BY train_id ORDER BY arrival_time) AS next_arrival_time FROM train_schedule ORDER BY train_id, arrival_time;
|train_id|     station|arrival_time|next_arrival_time|
|-------|-------------|------------|-----------------|
|    217|       Gilroy|    06:06:00|         06:15:00|
|    217|   San Martin|    06:15:00|         06:21:00|
|    217|  Morgan Hill|    06:21:00|         06:36:00|
|    217| Blossom Hill|    06:36:00|         06:42:00|
|    217|      Capitol|    06:42:00|         06:50:00|
|    217|       Tamien|    06:50:00|         06:59:00|
|    217|     San Jose|    06:59:00|                 |
|    324|San Francisco|    07:59:00|         08:03:00|
|    324|  22nd Street|    08:03:00|         08:16:00|
|    324|     Millbrae|    08:16:00|         08:24:00|
...
SELECT train_id, station, arrival_time, LEAD(arrival_time, 1) OVER ( PARTITION BY train_id ORDER BY arrival_time) AS next_arrival_time,LEAD(arrival_time, 1) OVER ( PARTITION BY train_id ORDER BY arrival_time) - arrival_time AS durationFROM train_schedule ORDER BY train_id, arrival_time;
|train_id|      station|arrival_time|next_arrival_time|duration|
|--------|-------------|------------|-----------------|--------|
|     217|       Gilroy|    06:06:00|         06:15:00|00:09:00|
|     217|   San Martin|    06:15:00|         06:21:00|00:06:00|
|     217|  Morgan Hill|    06:21:00|         06:36:00|00:15:00|
|     217| Blossom Hill|    06:36:00|         06:42:00|00:06:00|
|     217|      Capitol|    06:42:00|         06:50:00|00:08:00|
|     217|       Tamien|    06:50:00|         06:59:00|00:09:00|
|     217|     San Jose|    06:59:00|                 |        |
|     324|San Francisco|    07:59:00|         
...
SUM() OVER on duration to calculate the running totalLAG(): look at values behindLAG(value, offset)SELECT
    id,
    year_month,
    m_avg,
    LAG(t_monthly_avg, 1) OVER (
        PARTITION BY station_id
        ORDER BY year_month)
        AS previous_m_avg
FROM temperatures_monthly
ORDER BY station_id, year_month;
|id|year_month|m_avg|previous_m_avg|
|--|----------|-----|--------------|
| 1|2010-01-01| 13.6|              |
| 1|2010-02-01| 14.8|          13.6|
| 1|2010-03-01| 17.0|          14.8|
| 1|2010-04-01| 20.0|          17.0|
| 1|2010-05-01| 24.8|          20.0|
| 1|2010-06-01| 31.3|          24.8|
...
Time Series Analysis in PostgreSQL