Running total

Time Series Analysis in PostgreSQL

Jasmin Ludolf

Content Developer, DataCamp

What is a running total?

  • Running total:
    • Total of a sequence of values
    • Cumulative sum
    • Add a new value to the total of all previous values
  • Calculate the running total with SUM() OVER
|values|running_total|
|------|-------------|

| 1| 1|
| 2| 3|
| 3| 6|
Time Series Analysis in PostgreSQL

Calculate the running total

SELECT
    id,
    timeseries,
    views,
    SUM(views) OVER(
      PARTITION BY id 
      ORDER BY timeseries)
FROM views;
| id|         timeseries|views|sum|
|---|-------------------|-----|---|
|121|2015-12-27 17:34:16|   23| 23|
|121|2015-12-27 17:54:16|   10| 33|
|121|2015-12-27 18:14:16|    8| 41|
|121|2015-12-27 18:34:16|    4| 45|
|211|2015-12-28 03:37:18|    4|  4|
|211|2015-12-28 03:57:18|    2|  6|
|211|2015-12-28 04:17:18|    2|  8|
|211|2015-12-28 04:37:18|    1|  9|
|223|2016-01-01 05:41:22|    7|  7|
|223|2016-01-01 06:01:22|    5| 12|
...
Time Series Analysis in PostgreSQL

The dataset: a train schedule

SELECT * FROM train_schedule;
|train_id|      station|arrival_time|
|--------|-------------|------------|
|     324|San Francisco|    07:59:00|
|     324|22nd Street  |    08:03:00|
|     324|Millbrae     |    08:16:00|
|     324|Hillsdale    |    08:24:00|
|     324|Redwood City |    08:31:00|
|     324|Palo Alto    |    08:37:00|
|     324|San Jose     |    09:05:00|
|     217|Gilroy       |    06:06:00|
|     217|San Martin   |    06:15:00|
...
1 https://www.caltrain.com/schedules/weekdaytimetable.html
Time Series Analysis in PostgreSQL

Lead function

  • LEAD(): look at values ahead
  • LEAD(value, offset)
Time Series Analysis in PostgreSQL

Using the lead function

SELECT
    train_id,
    station,
    arrival_time,
    LEAD(arrival_time, 1) OVER (

PARTITION BY train_id ORDER BY arrival_time) AS next_arrival_time FROM train_schedule ORDER BY train_id, arrival_time;
Time Series Analysis in PostgreSQL

Using the lead function

|train_id|     station|arrival_time|next_arrival_time|
|-------|-------------|------------|-----------------|
|    217|       Gilroy|    06:06:00|         06:15:00|
|    217|   San Martin|    06:15:00|         06:21:00|
|    217|  Morgan Hill|    06:21:00|         06:36:00|
|    217| Blossom Hill|    06:36:00|         06:42:00|
|    217|      Capitol|    06:42:00|         06:50:00|
|    217|       Tamien|    06:50:00|         06:59:00|
|    217|     San Jose|    06:59:00|                 |
|    324|San Francisco|    07:59:00|         08:03:00|
|    324|  22nd Street|    08:03:00|         08:16:00|
|    324|     Millbrae|    08:16:00|         08:24:00|
...
Time Series Analysis in PostgreSQL

Calculating duration of each stop

SELECT
    train_id,
    station,
    arrival_time,
    LEAD(arrival_time, 1) OVER (
        PARTITION BY train_id
        ORDER BY arrival_time) AS next_arrival_time,

LEAD(arrival_time, 1) OVER ( PARTITION BY train_id ORDER BY arrival_time) - arrival_time AS duration
FROM train_schedule ORDER BY train_id, arrival_time;
Time Series Analysis in PostgreSQL

Calculating duration of each stop

|train_id|      station|arrival_time|next_arrival_time|duration|
|--------|-------------|------------|-----------------|--------|
|     217|       Gilroy|    06:06:00|         06:15:00|00:09:00|
|     217|   San Martin|    06:15:00|         06:21:00|00:06:00|
|     217|  Morgan Hill|    06:21:00|         06:36:00|00:15:00|
|     217| Blossom Hill|    06:36:00|         06:42:00|00:06:00|
|     217|      Capitol|    06:42:00|         06:50:00|00:08:00|
|     217|       Tamien|    06:50:00|         06:59:00|00:09:00|
|     217|     San Jose|    06:59:00|                 |        |
|     324|San Francisco|    07:59:00|         
...
  • Use SUM() OVER on duration to calculate the running total
Time Series Analysis in PostgreSQL

Lag function

  • LAG(): look at values behind
  • LAG(value, offset)
SELECT
    id,
    year_month,
    m_avg,
    LAG(t_monthly_avg, 1) OVER (
        PARTITION BY station_id
        ORDER BY year_month)
        AS previous_m_avg
FROM temperatures_monthly
ORDER BY station_id, year_month;
|id|year_month|m_avg|previous_m_avg|
|--|----------|-----|--------------|
| 1|2010-01-01| 13.6|              |
| 1|2010-02-01| 14.8|          13.6|
| 1|2010-03-01| 17.0|          14.8|
| 1|2010-04-01| 20.0|          17.0|
| 1|2010-05-01| 24.8|          20.0|
| 1|2010-06-01| 31.3|          24.8|
...
Time Series Analysis in PostgreSQL

Let's practice!

Time Series Analysis in PostgreSQL

Preparing Video For Download...