Aggregating with different data granularities

Improving Query Performance in PostgreSQL

Amy McCarty

Instructor

Data granularity - level of detail

 

  • Makes a row unique
  • 1+ columns

 

Video_Games

id game first_yr
012 Grand Theft Auto 1997
234 Legend of Zelda 1986

 

 

 

Game_Platforms

game_id platform year
234 FCDS 1986
234 GameCube 2003
234 Wii 2006
Improving Query Performance in PostgreSQL

Joining different granularities

The Legend of Zelda record from the Video_Games table links to all 3 records in the Games_Platforms table.

SELECT g.id, g.game, g.first_yr
  , COUNT(p.platform) AS no_platforms
  , MAX(p.year) AS last_platform_yr
  , p.platform
FROM video_games g
INNER JOIN game_platforms p
  ON g.id = p.game_id
GROUP BY g.id, g.game, g.first_yr
, p.platform
Improving Query Performance in PostgreSQL

Joining different granularities

 

Output

id game first_yr no_platforms last_platform_yr platform
234 Legend of Zelda 1986 3 2006 FCDS
234 Legend of Zelda 1986 3 2006 GameCube
234 Legend of Zelda 1986 3 2006 Wii
Improving Query Performance in PostgreSQL

Setting up a granularity change

 

Game_Platforms

game_id platform year
234 FCDS 1986
234 GameCube 2003
234 Wii 2006

 

Improving Query Performance in PostgreSQL

Setting up a granularity change

SELECT game_id, platform, year
  , COUNT(platform) AS no_platforms
  , MAX(year) AS last_platform_yr
FROM game_platforms
GROUP BY game_id, platform, year

Output

game_id platform year no_platforms last_platform_yr
234 FCDS 1986 3 2006
234 GameCube 2003 3 2006
234 Wii 2006 3 2006
Improving Query Performance in PostgreSQL

CTEs revisited

  • Standalone query with temporary results set
  • WITH statement
WITH cte AS
  ( SELECT * FROM tableA )

SELECT * FROM cte
  • Allows aggregation before joining
Improving Query Performance in PostgreSQL

CTEs to the rescue

WITH platforms_cte AS 
(  SELECT game_id, platform, year
      , COUNT(platform) AS no_platforms
      , MAX(year) AS last_platform_yr
   FROM game_platforms
   GROUP BY game_id, platform, year )
SELECT g.id, g.game, cte.no_platforms
  , cte.platform AS last_platform
FROM video_games g
INNER JOIN platforms_cte cte 
  ON g.id = cte.game_id
  AND cte.last_platform_yr = cte.year
Improving Query Performance in PostgreSQL

CTEs to the rescue

WITH platforms_cte AS 
(  SELECT game_id, platform, year
      , COUNT(platform) AS no_platforms
      , MAX(year) AS last_platform_yr
   FROM game_platforms
   GROUP BY game_id, platform, year )
SELECT g.id, g.game, cte.no_platforms
  , cte.platform AS last_platform
FROM video_games g
INNER JOIN platforms_cte cte 
  ON g.id = cte.game_id
  AND cte.last_platform_yr = cte.year

Output

id game no_platforms last_platform
234 Legend of Zelda 3 Wii
Improving Query Performance in PostgreSQL

Matching data granularity when joining

 

  • No repeats or duplicates
  • Minimum needed results
  • No double counting
Improving Query Performance in PostgreSQL

Let's practice!

Improving Query Performance in PostgreSQL

Preparing Video For Download...