Manipulating the granularity of time series data

Time Series Analysis in PostgreSQL

Jasmin Ludolf

Content Developer, DataCamp

Time series data

  • Type of data and time data
  • Ordered
  • Collected over a period of time
  • Common examples: stock prices, temperature

Line chart example showing time on the x-axis.

Time Series Analysis in PostgreSQL

Time series data: train times

SELECT station, arrival_time
FROM train_schedule
WHERE train_id = 324
|station      |arrival_time|
|-------------|------------|
|San Francisco|    07:59:00|
|22nd Street  |    08:03:00|
|Millbrae     |    08:16:00|
|Hillsdale    |    08:24:00|
|Redwood City |    08:31:00|
|Palo Alto    |    08:37:00|
...
Time Series Analysis in PostgreSQL

Multivariate time series

SELECT year_month, t_monthly_min, t_monthly_max, t_monthly_avg
FROM temperature_stations AS ts
JOIN temperatures_monthly AS tm USING(station_id)
WHERE station_id = 1
|year_month|t_monthly_min|t_monthly_max|t_monthly_avg|
|----------|-------------|-------------|-------------|
|2010-01-01|          6.7|         20.3|         13.6|
|2010-02-01|          8.3|         20.8|         14.8|
|2010-03-01|          9.1|         23.9|         17.0|
|2010-04-01|         11.3|         27.1|         20.0|
|2010-05-01|         15.0|         32.1|         24.8|
...
Time Series Analysis in PostgreSQL

Time granularity

  • More granularity = more precise measurements
  • Common granularities: seconds, minutes, hours, days, weeks, months, quarters, and years
SELECT ts, views
FROM dc_news_fact
WHERE id = '12561'
ORDER BY ts;
|ts                 |views|
|-------------------|-----|
|2015-12-29 11:40:23| 4161|
|2015-12-29 12:00:23|  407|
|2015-12-29 12:20:23|    0|
|2015-12-29 12:40:23|  778|
|2015-12-29 13:00:23|  396|
...
Time Series Analysis in PostgreSQL

Changing time granularity

  • DATE_TRUNC(): alters granularity of time series data
    • Such as: century, decade, quarter, microseconds and more

 

  • DATE_TRUNC(field, source, time zone)
    • field = granular value (eg. "hour")
    • source = the data
    • time zone = defaults to current setting
Time Series Analysis in PostgreSQL

Changing time granularity

SELECT
    DATE_TRUNC('hour', ts) AS hour,

SUM(VIEWS) AS views
FROM dc_news_fact WHERE id = '12561' GROUP BY hour ORDER BY hour;
|hour               |views|
|-------------------|-----|
|2015-12-29 09:00:00|    0|
|2015-12-29 10:00:00|    0|
|2015-12-29 11:00:00| 4161|
|2015-12-29 12:00:00| 1185|
|2015-12-29 13:00:00| 1146|
|2015-12-29 14:00:00|  697|
|2015-12-29 15:00:00| 1013|
|2015-12-29 16:00:00|  956|
|2015-12-29 17:00:00| 1307|
|2015-12-29 18:00:00|  700|
...
Time Series Analysis in PostgreSQL

Extracting time granularity

  • DATE_PART(): extracts specific data

 

  • DATE_PART(field, source)
    • field = granular value (eg. "hour")
    • source = the data: a timestamp or interval

 

  • EXTRACT() also extracts specific data
    • Recommended instead of DATE_PART()
    • DATE_PART() returns a result in double precision which is imprecise
Time Series Analysis in PostgreSQL

Hour granularity

SELECT
    DATE_PART('hour', ts)
    AS hour_of_day,
    SUM(VIEWS) AS views

FROM dc_news_fact WHERE id = '12561' GROUP BY hour_of_day ORDER BY hour_of_day;
|hour_of_day|views|
|-----------|-----|
|          0|  500|
|          1|  500|
|          2|  400|
|          3|  400|
|          4| 1200|
|          5|  100|
|          6|  100|
|          7| 2000|
|          8|  200|
|          9| 2500|
...
Time Series Analysis in PostgreSQL

Day of week granularity

SELECT
    EXTRACT(dow FROM ts)
    AS day_of_week,
    SUM(VIEWS) AS views
FROM dc_news_fact
WHERE id = '12561'
GROUP BY day_of_week
ORDER BY day_of_week;
  • Day of week: Sunday (0) to Saturday (6)
|day_of_week|views|
|-----------|-----|
|          2|15265|
|          3|13100|
|          4| 1200|
Time Series Analysis in PostgreSQL

Let's practice!

Time Series Analysis in PostgreSQL

Preparing Video For Download...