Query lifecycle and the planner

Improving Query Performance in PostgreSQL

Amy McCarty

Instructor

Basic query lifecycle

System Front end steps Back end processes
1 Parser Send query to database Checks syntax. Translates SQL into more computer friendly syntax based on system stored rules.
2 Planner & Optimizer Assess and optimize query tasks Uses database stats to create query plan. Calculates costs and chooses the best plan.
3 Executor Return query results Follows the query plan to execute the query.
Improving Query Performance in PostgreSQL

Query planner and optimizer

Responsive to SQL structure changes

  • Generates plan trees
    • Nodes corresponding to steps
    • Visualize with EXPLAIN
  • Estimate cost of each tree
    • Statistics from pg_tables
    • Time based optimization
1 Plan tree: https://www.postgresql.org/docs/current/querytree.html
Improving Query Performance in PostgreSQL

Statistics from pg_tables

SELECT * FROM pg_class
WHERE relname = 'mytable'
-- sample of output columns
| relname | relhasindex |
SELECT * FROM pg_stats
WHERE tablename = 'mytable'
-- sample of output columns
null_frac | avg_width | n_distinct | 
  • Column indexes
  • Count null values
  • Column width
  • Distinct values
Improving Query Performance in PostgreSQL

EXPLAIN

 

  • Window into query plan
  • Steps and cost estimates
    • Does not run query

 

  • Sequential scan of cheeses table
  • Cost and size estimates

 

EXPLAIN
SELECT * FROM cheeses

 

Seq Scan on cheeses 
(cost=0.00..10.50 rows=5725 width=296)
Improving Query Performance in PostgreSQL

EXPLAIN: Scan

 

  • Query plan step
  • Returns rows

 

Seq Scan on cheeses (cost=0.00..10.50 rows=5725 width=296)

 

 

 

 

  • Seq Scan : scan of all the rows in table
Improving Query Performance in PostgreSQL

EXPLAIN: Cost

 

  • Dimensionless
  • Compare structures with same output
    • Should not compare queries with different output

 

Seq Scan on cheeses (cost=0.00..10.50 rows=5725 width=296)

 

 

 

 

 

  • 0.00.. : start up time
  • ..10.50 : total time

  • total time = start up + run time

Improving Query Performance in PostgreSQL

EXPLAIN: Size

 

  • Size estimates

 

Seq Scan on cheeses (cost=0.00..10.50 rows=5725 width=296)

 

 

 

  • rows : rows query needs to examine to run
  • width : byte width of rows
Improving Query Performance in PostgreSQL

EXPLAIN with a WHERE clause

EXPLAIN
SELECT * FROM cheeses WHERE species IN ('goat','sheep') 
Seq Scan on cheeses (cost=0.00..378.90 rows=3 width=118)
 -> Filter: (species = ANY ('{"goat","sheep"}'::text[]))
  • From bottom to top
    • Step 1: Filter
    • Step 2: Sequential scan
  • WHERE clause
    • Decrease rows to scan and increases total cost
Improving Query Performance in PostgreSQL

EXPLAIN with an index

EXPLAIN
SELECT * FROM cheeses WHERE species IN ('goat','sheep') -- index on species column
Bitmap Index Scan using species_idx on cheeses (cost=0.29..12.66 rows=3 width=118)
  Index Cond: (species = ANY ('{"goat","sheep"}'::text[]))
  • Step 1: Bitmap Index Scan
    • Index Cond explains the scan step
  • INDEX
    • Start up cost increased from 0
    • Overall cost decreased from 379
Improving Query Performance in PostgreSQL

Let's practice!

Improving Query Performance in PostgreSQL

Preparing Video For Download...