Report duplication

Reporting in SQL

Tyler Pernes

Learning & Development Consultant

What causes duplication?

Reporting in SQL

What causes duplication?

SELECT p.id, SUM(points) AS points, SUM(matches_win) AS matches_won
FROM points AS p
JOIN matches AS m ON p.id = m.id
GROUP BY p.id;
Reporting in SQL

What causes duplication?

+-----+--------+--------------+
| id  | points | matches_won  |
|-----|--------|--------------|
| 1   | 156    | 10           |
+-----+--------+--------------+
1 will result in a points value three times what it should be, in this case, 156.
Reporting in SQL

What causes duplication?

Intermediate Table
+-----+------+--------------+---------+ 
| id  | year | matches_won  | points  |
|-----|------|--------------|---------|    
| 1   | 2016 | 5            | 52      |
| 1   | 2017 | 2            | 52      |  
| 1   | 2018 | 3            | 52      |
+-----+------+--------------+---------+
Reporting in SQL

What causes duplication?

Intermediate Table
+-----+------+--------------+---------+ 
| id  | year | matches_won  | points  |
|-----|------|--------------|---------|    
| 1   | 2016 | 5            | 52      | <--
| 1   | 2017 | 2            | 52      | <-- SUM(points) = 52 x 3 = 156
| 1   | 2018 | 3            | 52      | <-- 
+-----+------+--------------+---------+    
Reporting in SQL

Ways to fix duplication

1. Remove aggregations

SELECT p.id, points, SUM(matches_won) AS matches_won
FROM points AS p
JOIN matches AS m ON p.id = m.id
GROUP BY p.id, points;
+-----+--------+--------------+
| id  | points | matches_won  |
|-----|--------|--------------|
| 1   | 52     | 10           |
+-----+--------+--------------+
Reporting in SQL

Ways to fix duplication

Reporting in SQL

Ways to fix duplication

2. Add field to JOIN statement

SELECT p.id, SUM(points) AS points, SUM(matches_win) AS matches_won
FROM points AS p
JOIN matches AS m ON p.id = m.id AND p.year = m.year
GROUP BY p.id;
Reporting in SQL

Ways to fix duplication

2. Add field to JOIN statement

SELECT p.id, SUM(points) AS points, SUM(matches_win) AS matches_won
FROM points AS p
JOIN matches AS m ON p.id = m.id AND p.year = m.year
GROUP BY p.id;
Reporting in SQL

Ways to fix duplication

SELECT id, SUM(matches_won)
FROM matches
GROUP BY id;
+-----+--------------+
| id  | matches_won  |
|-----|--------------|
| 1   | 10           |
| 2   | 7            |
+-----+--------------+
Reporting in SQL

Ways to fix duplication

3. Rollup using subquery

SELECT p.id, points, matches_won
FROM points AS p
JOIN 
    (SELECT id, SUM(matches_won) AS matches_won
    FROM matches
    GROUP BY id) AS m 
ON p.id = m.id;
Reporting in SQL

Ways to fix duplication

  1. Remove aggregations
  2. Add field to JOIN statement
  3. Rollup using subquery
Reporting in SQL

Identifying duplication

Value in original table:

SELECT SUM(points) AS total_points
FROM points;
total_points = 52

Value in query:

SELECT SUM(points) AS total_points
FROM   
    (SELECT p.id, SUM(points) AS points
    FROM points AS p
    JOIN matches AS m ON p.id = m.id
    GROUP BY p.id) AS subquery;
total_points = 156
Reporting in SQL

Chapter goal

Reporting in SQL

Practice time!

Reporting in SQL

Preparing Video For Download...