Partitioning and window functions

Time Series Analysis in PostgreSQL

Jasmin Ludolf

Content Developer, DataCamp

Partitioning

  • Split a large table into smaller subsets, or partitions
  • Useful when dealing with large datasets
  • Time series data:
    • new data everyday
    • partition by range of dates

Illustration of a larger table being split into two subsets

Time Series Analysis in PostgreSQL

A range

  • Range boundaries:

    • inclusive at lower end
    • exclusive at upper end
  • Alternative to range: partition by event or category

Range 1 : 2000-2010

2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009

Range 2 : 2010-2020

2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020

Time Series Analysis in PostgreSQL

Partition by

  • PARTITION BY RANGE
CREATE TABLE timetable (
    date_info DATE,
    train_id INTEGER,
    departure_time TIMESTAMP,
    arrival_time TIMESTAMP,
    delay INTEGER
) PARTITION BY RANGE (date_info);
Time Series Analysis in PostgreSQL

Partition of

  • PARITION OF FOR VALUES FROM
  • timetable is a partitioned table, partitioned on date_info using the RANGE method

 

CREATE TABLE timetable_y2020 PARTITION OF timetable
    FOR VALUES FROM ('2020-01-01') to ('2020-12-31');
Time Series Analysis in PostgreSQL

Window functions

  • Simplify regular SQL queries
  • Returns a value for each row in a table (can depend on other rows)

 

  • Window : set of rows on which the function operates
  • Window function : the function used on those rows, or window
  • Aggregate functions are comparative, but the result is not grouped into one value
Time Series Analysis in PostgreSQL

Over clause

  • The OVER clause indicates a window function
  • Example: regional_timetable date, times, id, delays, and region
  • If no PARTITION BY : the window function treats the entire table as one partition
SELECT region, train_id, delay, AVG(delay) OVER (PARTITION BY region)
FROM regional_timetable;
|region|train_id|   delay|     avg|
|------|--------|--------|--------|
| north|       6|00:00:05|00:00:05|
| north|       2|00:00:03|00:00:05|
| north|       5|00:00:07|00:00:05|
|  east|       1|00:00:10|00:00:21|
|  east|       3|00:00:32|00:00:21|
| south|       8|00:01:00|00:01:00|
Time Series Analysis in PostgreSQL

Let's practice!

Time Series Analysis in PostgreSQL

Preparing Video For Download...