Intermediate SQL Server
Ginger Grant
Instructor
Chapter 1: Summarizing data
Chapter 2: Date and math functions
Chapter 3: Processing data with T-SQL
Chapter 4: Window functions
Reviewing summarized values for each column is a common first step in analyzing data
If the data exists in a database, fastest way to aggregate is to use SQL
SELECT Country, Year, InternetUse, GDP,
ExportGoodsPercent, CellPhonesper100
FROM EconomicIndicators
+-----------+-----+------------+------------+------------------+-----------------+
|Country |Year |InternetUse | GDP |ExportGoodsPercent|CellPhonesper100 |
+-----------+-----+------------+------------+------------------+-----------------+
|Swaziland |2011 |20.43165813 |7335004354 |56.30476059 |63.7015615 |
|Sweden |2011 |90.88204559 |394271163688|49.93022195 |118.5711258 |
|Switzerland|2011 |82.98773087 |395111518596|51.20242546 |130.0623629 |
...
+-----------+-----+------------+------------+------------------+-----------------+
MIN()
for the minimum value of a columnMAX()
for the maximum value of a columnAVG()
for the mean or average value of a columnThis T-SQL query returns the aggregated values of column InternetUse
SELECT AVG(InternetUse) AS MeanInternetUse,
MIN(InternetUse) AS MINInternet,
MAX(InternetUse) AS MAXInternet
FROM EconomicIndicators
+-------------------+------------+-------------+
|MeanInternetUse |MINInternet | MAXInternet|
|-------------------+------------+-------------|
| 18.9854496196171| 0 | 375.5970064|
+-------------------+------------+-------------+
This T-SQL query filters the aggregated values using a WHERE clause
Notice the text value is in
SELECT AVG(InternetUse) AS MeanInternetUse,
MIN(InternetUse) AS MINInternet,
MAX(InternetUse) AS MAXInternet
FROM EconomicIndicators
WHERE Country = 'Solomon Islands'
+-------------------+------------+-------------+
|MeanInternetUse |MINInternet | MAXInternet|
|-------------------+------------+-------------|
| 1.79621| 0 | 6.00|
+-------------------+------------+-------------+
SELECT Country, AVG(InternetUse) AS MeanInternetUse,
MIN(InternetUse) AS MINInternet,
MAX(InternetUse) AS MAXInternet
FROM EconomicIndicators
GROUP BY Country
+--------------------+------------------+------------+-------------+
| Country |MeanInternetUse |MINInternet | MAXInternet|
|--------------------+------------------+------------+-------------|
|Solomon Islands | 1.79621| 0| 6.00|
|Hong Kong | 245.1067| 0| 375.00|
|Liechtenstein | 63.8821| 36.5152| 85.00|
...
+--------------------+------------------+------------+-------------+
Cannot use WHERE
with GROUP BY
as it will give you an error
-- This throws an error
...
GROUP BY
WHERE Max(InternetUse) > 100
Instead, use HAVING
-- This is how you filter with a GROUP BY
...
GROUP BY
HAVING Max(InternetUse) > 100
SELECT Country, AVG(InternetUse) AS MeanInternetUse,
MIN(GDP) AS SmallestGDP,
MAX(InternetUse) AS MAXInternetUse
FROM EconomicIndicators
GROUP BY Country
HAVING MAX(InternetUse) > 100
+--------------+-------------------+-------------+---------------+
|Country |MeanInternetUse |SmallestGDP | MAXInternetUse|
|--------------+-------------------+-------------+---------------|
|Macedonia | 71.3060150792857| -0.465059948| 110.5679538|
|Hong Kong | 245.106718614286| 0| 375.5970064|
|Congo | 60.8972476010714| -9.492757847| 104.6455529|
...
+--------------+-------------------+-------------+---------------+
Intermediate SQL Server