Correlated subqueries

Data Manipulation in SQL

Mona Khalil

Data Scientist, Greenhouse Software

Correlated subquery

  • Uses values from the outer query to generate a result
  • Re-evaluated for every row the outer query processes
  • Used for advanced joining, filtering, and evaluating data
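
To make the bullets above concrete, here is a minimal sketch on toy data. The table `match_demo` and its four rows are hypothetical, made up for illustration; only the column names mirror the `match` table used in these slides. The syntax is assumed to be PostgreSQL-style.

```sql
-- Hypothetical toy table; not part of the course database.
CREATE TEMPORARY TABLE match_demo (
    id        INTEGER PRIMARY KEY,
    stage     INTEGER,
    home_goal INTEGER,
    away_goal INTEGER
);

INSERT INTO match_demo (id, stage, home_goal, away_goal) VALUES
    (1, 1, 1, 0),
    (2, 1, 3, 2),
    (3, 2, 0, 0),
    (4, 2, 2, 2);

-- The inner query references m1.stage from the outer query,
-- so it is re-evaluated for each outer row.
SELECT
    m1.id,
    m1.stage,
    m1.home_goal + m1.away_goal AS total_goals
FROM match_demo AS m1
WHERE m1.home_goal + m1.away_goal >
      (SELECT AVG(m2.home_goal + m2.away_goal)
       FROM match_demo AS m2
       WHERE m2.stage = m1.stage);
-- With the rows above, matches 2 and 4 are returned:
-- each has more goals than the average for its own stage.
```

The condition `m2.stage = m1.stage` is what makes the subquery correlated: remove it and the inner query becomes a simple subquery that returns one overall average.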


A simple example

  • Which match stages tend to have a higher than average number of goals scored?
SELECT 
    s.stage,
    ROUND(s.avg_goals,2) AS avg_goal,
    (SELECT AVG(home_goal + away_goal) 
     FROM match 
     WHERE season = '2012/2013') AS overall_avg 
FROM (SELECT
        stage,
        AVG(home_goal + away_goal) AS avg_goals
      FROM match
      WHERE season = '2012/2013'
      GROUP BY stage) AS s -- Subquery in FROM
WHERE s.avg_goals > (SELECT AVG(home_goal + away_goal) 
                     FROM match 
                     WHERE season = '2012/2013'); -- Subquery in WHERE

A correlated example

SELECT
    s.stage,
    ROUND(s.avg_goals,2) AS avg_goal,
    (SELECT AVG(home_goal + away_goal) 
     FROM match 
     WHERE season = '2012/2013') AS overall_avg 
FROM 
    (SELECT
         stage,
         AVG(home_goal + away_goal) AS avg_goals
     FROM match
     WHERE season = '2012/2013'
     GROUP BY stage) AS s
WHERE s.avg_goals > (SELECT AVG(home_goal + away_goal) 
                     FROM match AS m 
                     WHERE s.stage > m.stage); -- Correlated: uses s.stage from the outer query

A correlated example

| stage | avg_goals |
|-------|-----------|
| 3     | 2.83      |
| 4     | 2.8       |
| 6     | 2.78      |
| 8     | 3.09      |
| 10    | 2.96      |

Simple vs. correlated subqueries

Simple Subquery

  • Can be run independently from the main query
  • Evaluated once in the whole query

Correlated Subquery

  • Dependent on the main query to execute
  • Evaluated in a loop, once per row of the main query
    • Can significantly slow down query runtime
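
The contrast above can be sketched on a small table. `stage_avg_demo` and its values are hypothetical, standing in for the per-stage averages computed earlier in these slides; the syntax is assumed to be PostgreSQL-style.

```sql
-- Hypothetical per-stage averages; not part of the course database.
CREATE TEMPORARY TABLE stage_avg_demo (
    stage     INTEGER PRIMARY KEY,
    avg_goals NUMERIC
);

INSERT INTO stage_avg_demo (stage, avg_goals) VALUES
    (1, 2.0),
    (2, 2.4),
    (3, 3.5),
    (4, 2.5);

-- Simple subquery: the inner SELECT runs on its own and is evaluated once.
SELECT AVG(avg_goals) FROM stage_avg_demo;

-- Inside a main query, that single value is compared against every row.
-- With the rows above, only stage 3 exceeds the overall average of 2.6.
SELECT stage, avg_goals
FROM stage_avg_demo
WHERE avg_goals > (SELECT AVG(avg_goals) FROM stage_avg_demo);

-- Correlated subquery: the inner SELECT mentions s.stage, which exists only
-- in the main query, so it cannot run alone and is evaluated per outer row.
-- With the rows above, stages 2 and 3 beat the average of the earlier stages.
SELECT s.stage, s.avg_goals
FROM stage_avg_demo AS s
WHERE s.avg_goals > (SELECT AVG(e.avg_goals)
                     FROM stage_avg_demo AS e
                     WHERE e.stage < s.stage);
```

Stage 1 never appears in the correlated result: it has no earlier stages, so the inner average is NULL and the comparison is not true.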

Correlated subqueries

  • What is the average number of goals scored in each country?
SELECT
  c.name AS country,
  AVG(m.home_goal + m.away_goal) AS avg_goals
FROM country AS c
LEFT JOIN match AS m
ON c.id = m.country_id
GROUP BY country;
| country     | avg_goals        |
|-------------|------------------|
| Belgium     | 2.89344262295082 |
| England     | 2.76776315789474 |
| France      | 2.51052631578947 |
| Germany     | 2.94607843137255 |
| Italy       | 2.63150867823765 |
| Netherlands | 3.14624183006536 |
| Poland      | 2.49375          |
| Portugal    | 2.63255360623782 |
| Scotland    | 2.74122807017544 |
| Spain       | 2.78223684210526 |
| Switzerland | 2.81054131054131 |

Correlated subqueries

  • What is the average number of goals scored in each country?
SELECT
  c.name AS country,
  (SELECT 
     AVG(home_goal + away_goal)
   FROM match AS m
   WHERE m.country_id = c.id) -- Correlated: uses c.id from the outer query
     AS avg_goals
FROM country AS c;
| country     | avg_goals        |
|-------------|------------------|
| Belgium     | 2.89344262295082 |
| England     | 2.76776315789474 |
| France      | 2.51052631578947 |
| Germany     | 2.94607843137255 |
| Italy       | 2.63150867823765 |
| Netherlands | 3.14624183006536 |
| Poland      | 2.49375          |
| Portugal    | 2.63255360623782 |
| Scotland    | 2.74122807017544 |
| Spain       | 2.78223684210526 |
| Switzerland | 2.81054131054131 |
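
As a rough check that the join and the correlated subquery answer the same question, both can be run on a tiny data set. The tables `country_demo` and `match_demo2` and their rows are hypothetical; only the column names follow the slides. The syntax is assumed to be PostgreSQL-style.

```sql
-- Hypothetical toy tables; not part of the course database.
CREATE TEMPORARY TABLE country_demo (
    id   INTEGER PRIMARY KEY,
    name TEXT
);

CREATE TEMPORARY TABLE match_demo2 (
    id         INTEGER PRIMARY KEY,
    country_id INTEGER,
    home_goal  INTEGER,
    away_goal  INTEGER
);

INSERT INTO country_demo (id, name) VALUES
    (1, 'Belgium'),
    (2, 'England');

INSERT INTO match_demo2 (id, country_id, home_goal, away_goal) VALUES
    (1, 1, 2, 1),
    (2, 1, 1, 0),
    (3, 2, 0, 0),
    (4, 2, 3, 3),
    (5, 2, 1, 2);

-- Version 1: join, then aggregate per country.
SELECT
    c.name AS country,
    AVG(m.home_goal + m.away_goal) AS avg_goals
FROM country_demo AS c
LEFT JOIN match_demo2 AS m
    ON c.id = m.country_id
GROUP BY c.name
ORDER BY country;

-- Version 2: correlated subquery, evaluated once per country row.
-- No GROUP BY is needed because country_demo has one row per country.
SELECT
    c.name AS country,
    (SELECT AVG(m.home_goal + m.away_goal)
     FROM match_demo2 AS m
     WHERE m.country_id = c.id) AS avg_goals
FROM country_demo AS c
ORDER BY country;
-- With the rows above, both versions give an average of 2 for Belgium
-- and 3 for England.
```

A country with no matches would show a NULL average in both versions, since the `LEFT JOIN` keeps the country row and the subquery averages over zero rows.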

Let's practice!
