Capire le prestazioni delle query

Introduzione a Redshift

Jason Myers

Principal Architect

Processo di ottimizzazione delle query

  1. Controlla la tabella STL_ALERT_EVENT_LOG
  2. Esegui EXPLAIN sulla tua query
  3. Controlla le tabelle SVL_QUERY_SUMMARY e SVL_QUERY_REPORT
Introduzione a Redshift

Tabella STL_ALERT_EVENT_LOG

Contiene eventuali avvisi Redshift generati da una query

SELECT * 
  FROM stl_alert_event_log 
 WHERE query = 1337;
 query |             event             |          solution       |     event_time      
=======+===============================+=========================+====================
  1337 | Missing query planner statist | Run the ANALYZE command | 2023-11-13 18:20:58
Introduzione a Redshift

EXPLAIN del piano di esecuzione

  • Mostra tutti i passaggi di una query e i relativi costi e righe lette

  • Funziona su tutte le istruzioni DML (Data Manipulation Language):

    • SELECT
    • SELECT INTO
    • CREATE TABLE AS
    • INSERT
    • UPDATE
    • DELETE
Introduzione a Redshift

Esempio di EXPLAIN

-- Eseguire EXPLAIN sulla query
-- delle prime dieci divisioni per ricavi
EXPLAIN WITH top_ten_divisions_by_rev AS
(
  SELECT division_id,
         SUM(revenue) AS revenue_total
    FROM sales_data
   GROUP BY division_id
   ORDER BY revenue_total DESC
   LIMIT 10
),
division_names AS
(
  SELECT id AS division_id,
         name AS division_name
    FROM division_names
)
SELECT division_name,
       revenue_total
  FROM top_ten_divisions_by_rev
  JOIN division_names USING (DIVISION_ID)
 WHERE revenue_total > 100000;
Introduzione a Redshift

Risultati di EXPLAIN

|QUERY PLAN                                                                                      |
+------------------------------------------------------------------------------------------------+
|Hash Join  (cost=47.11..58.89 rows=3 width=524)                                                 |
|  Hash Cond: (division_names.id = top_ten_divisions_by_rev.division_id)                         |

| -> Seq Scan on division_names (cost=0.00..11.40 rows=140 width=520) | | -> Hash (cost=47.07..47.07 rows=3 width=12) |
| -> Subquery Scan on top_ten_divisions_by_rev (cost=46.92..47.07 rows=3 width=12) | | Filter: (top_ten_divisions_by_rev.revenue_total > 100000) | | -> Limit (cost=46.92..46.95 rows=10 width=12) |
| -> Sort (cost=46.92..47.42 rows=200 width=12) | | Sort Key: (sum(sales_data.revenue)) DESC |
| -> HashAggregate (cost=40.60..42.60 rows=200 width=12) | | Group Key: sales_data.division_id | | -> Seq Scan on sales_data (cost=0.00..30.40 rows=2040 width=8)|
1 https://www.postgresql.org/docs/current/using-explain.html
Introduzione a Redshift

Ayo berlatih!

Introduzione a Redshift

Preparing Video For Download...