Time Series Analysis in PostgreSQL
Jasmin Ludolf
Content Developer, DataCamp
SELECT station, arrival_time
FROM train_schedule
WHERE train_id = 324
|station |arrival_time|
|-------------|------------|
|San Francisco| 07:59:00|
|22nd Street | 08:03:00|
|Millbrae | 08:16:00|
|Hillsdale | 08:24:00|
|Redwood City | 08:31:00|
|Palo Alto | 08:37:00|
...
SELECT year_month, t_monthly_min, t_monthly_max, t_monthly_avg
FROM temperature_stations AS ts
JOIN temperatures_monthly AS tm USING(station_id)
WHERE station_id = 1
|year_month|t_monthly_min|t_monthly_max|t_monthly_avg|
|----------|-------------|-------------|-------------|
|2010-01-01| 6.7| 20.3| 13.6|
|2010-02-01| 8.3| 20.8| 14.8|
|2010-03-01| 9.1| 23.9| 17.0|
|2010-04-01| 11.3| 27.1| 20.0|
|2010-05-01| 15.0| 32.1| 24.8|
...
SELECT ts, views
FROM dc_news_fact
WHERE id = '12561'
ORDER BY ts;
|ts |views|
|-------------------|-----|
|2015-12-29 11:40:23| 4161|
|2015-12-29 12:00:23| 407|
|2015-12-29 12:20:23| 0|
|2015-12-29 12:40:23| 778|
|2015-12-29 13:00:23| 396|
...
DATE_TRUNC()
: alters granularity of time series data
DATE_TRUNC(field, source, time zone)
SELECT DATE_TRUNC('hour', ts) AS hour,
SUM(VIEWS) AS views
FROM dc_news_fact WHERE id = '12561' GROUP BY hour ORDER BY hour;
|hour |views|
|-------------------|-----|
|2015-12-29 09:00:00| 0|
|2015-12-29 10:00:00| 0|
|2015-12-29 11:00:00| 4161|
|2015-12-29 12:00:00| 1185|
|2015-12-29 13:00:00| 1146|
|2015-12-29 14:00:00| 697|
|2015-12-29 15:00:00| 1013|
|2015-12-29 16:00:00| 956|
|2015-12-29 17:00:00| 1307|
|2015-12-29 18:00:00| 700|
...
DATE_PART()
: extracts specific data
DATE_PART(field, source)
EXTRACT()
also extracts specific dataDATE_PART()
DATE_PART()
returns a result in double precision which is impreciseSELECT DATE_PART('hour', ts) AS hour_of_day, SUM(VIEWS) AS views
FROM dc_news_fact WHERE id = '12561' GROUP BY hour_of_day ORDER BY hour_of_day;
|hour_of_day|views|
|-----------|-----|
| 0| 500|
| 1| 500|
| 2| 400|
| 3| 400|
| 4| 1200|
| 5| 100|
| 6| 100|
| 7| 2000|
| 8| 200|
| 9| 2500|
...
SELECT
EXTRACT(dow FROM ts)
AS day_of_week,
SUM(VIEWS) AS views
FROM dc_news_fact
WHERE id = '12561'
GROUP BY day_of_week
ORDER BY day_of_week;
|day_of_week|views|
|-----------|-----|
| 2|15265|
| 3|13100|
| 4| 1200|
Time Series Analysis in PostgreSQL