Exploratory Data Analysis in SQL
Christina Maimone
Data Scientist
Name | Storage Size | Description | Range |
---|---|---|---|
integer or int or int4 | 4 bytes | typical choice | -2147483648 to +2147483647 |
smallint or int2 | 2 bytes | small-range | -32768 to +32767 |
bigint or int8 | 8 bytes | large-range | -9223372036854775808 to +9223372036854775807 |
serial | 4 bytes | auto-increment | 1 to 2147483647 |
smallserial | 2 bytes | small auto-increment | 1 to 32767 |
bigserial | 8 bytes | large auto-increment | 1 to 9223372036854775807 |
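The ranges in the integer table matter in practice: arithmetic that leaves a type's range raises an error rather than wrapping around. A minimal sketch, assuming PostgreSQL; the literal values are chosen only for illustration.

```sql
-- Small integer literals are typed as integer (int4), so this sum
-- exceeds the int4 range and fails with "integer out of range".
SELECT 2147483647 + 1;

-- Casting one operand to bigint makes the whole expression bigint,
-- which has room for the result.
SELECT 2147483647::bigint + 1;
```

If a column of counts or identifiers could plausibly pass two billion, bigint is the safer choice.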
Name | Storage Size | Description | Range |
---|---|---|---|
decimal or numeric | variable | user-specified precision, exact | up to 131072 digits before the decimal point; up to 16383 digits after the decimal point |
real | 4 bytes | variable-precision, inexact | 6 decimal digits precision |
double precision | 8 bytes | variable-precision, inexact | 15 decimal digits precision |
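The "exact" versus "inexact" distinction is easiest to see with an equality test. The sketch below assumes PostgreSQL; the column aliases are made up for illustration.

```sql
-- double precision stores binary approximations of 0.1, 0.2 and 0.3,
-- so the sum does not compare equal to 0.3: this returns false.
SELECT 0.1::double precision + 0.2::double precision
       = 0.3::double precision AS float_equal;

-- numeric stores the decimal digits exactly: this returns true.
SELECT 0.1::numeric + 0.2::numeric = 0.3::numeric AS numeric_equal;
```

For exploratory summaries the small error in real or double precision is usually harmless, but exact comparisons on those types can give surprising results.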
-- integer division
SELECT 10/4;
2
-- numeric division
SELECT 10/4.0;
2.500000000
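Appending .0 works for literals, but not when both operands are integer columns. One common approach is to cast one side before dividing. A sketch assuming PostgreSQL, with a made-up inline table t(a, b) standing in for real columns:

```sql
-- t(a, b) is a hypothetical one-row table of integers.
SELECT a / b          AS integer_division,  -- fractional part is dropped
       a::numeric / b AS numeric_division,  -- cast one operand first
       a % b          AS remainder          -- what integer division discarded
FROM (VALUES (10, 4)) AS t(a, b);
```

The cast has to apply to an operand, not to the result: (a / b)::numeric would still divide as integers first.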
SELECT min(question_pct)
FROM stackoverflow;
min
-----
0
(1 row)
SELECT max(question_pct)
FROM stackoverflow;
max
-------------
0.071957428
(1 row)
SELECT avg(question_pct)
FROM stackoverflow;
avg
---------------------
0.00379494620059319
(1 row)
Population Variance
SELECT var_pop(question_pct)
FROM stackoverflow;
var_pop
----------------------
0.000140268640974167
(1 row)
Sample Variance
SELECT var_samp(question_pct)
FROM stackoverflow;
var_samp
----------------------
0.000140271571051059
(1 row)
SELECT variance(question_pct)
FROM stackoverflow;
variance
----------------------
0.000140271571051059
(1 row)
Sample Standard Deviation
SELECT stddev_samp(question_pct)
FROM stackoverflow;
stddev_samp
--------------------
0.0118436299778007
(1 row)
SELECT stddev(question_pct)
FROM stackoverflow;
stddev
--------------------
0.0118436299778007
(1 row)
Population Standard Deviation
SELECT stddev_pop(question_pct)
FROM stackoverflow;
stddev_pop
--------------------
0.0118435062787237
(1 row)
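The population and sample versions differ only in the divisor: var_pop divides the sum of squared deviations by n, var_samp divides by n - 1, and each standard deviation is the square root of the matching variance. The sketch below checks this on four made-up values (1, 2, 3, 4), not on the stackoverflow data.

```sql
-- Toy data: mean is 2.5 and the sum of squared deviations is 5.
SELECT var_pop(x)        AS var_pop,         -- 5 / 4 = 1.25
       var_samp(x)       AS var_samp,        -- 5 / 3, about 1.667
       stddev_samp(x)    AS stddev_samp,
       sqrt(var_samp(x)) AS sqrt_of_var_samp -- same value as stddev_samp
FROM (VALUES (1.0), (2.0), (3.0), (4.0)) AS t(x);
```

With many rows the two versions are nearly identical, which is why the stackoverflow results above agree to several significant digits.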
SELECT round(42.1256, 2);
42.13
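In PostgreSQL the two-argument form of round is defined for numeric but not for double precision, so rounding a floating-point value or aggregate to a given number of places needs a cast first. A sketch under that assumption; the type of question_pct is not stated in the slides, and the cast is harmless if the column is already numeric.

```sql
-- A double precision value must be cast to numeric before
-- round(value, places) can be applied.
SELECT round(42.1256::double precision::numeric, 2);

-- The same pattern applied to an aggregate over the stackoverflow table.
SELECT round(avg(question_pct)::numeric, 4)
FROM stackoverflow;
```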
-- Summarize by group with GROUP BY
SELECT tag,
       min(question_pct),
       avg(question_pct),
       max(question_pct)
  FROM stackoverflow
 GROUP BY tag;
          tag          |     min     |         avg          |     max
-----------------------+-------------+----------------------+-------------
 amazon-sqs            |    6.91e-05 | 8.08328877005347e-05 |     9.6e-05
 amazon-kinesis        |     2.1e-05 |  3.3924064171123e-05 |    4.64e-05
 android-pay           |    2.97e-05 | 3.16712477396022e-05 |    3.29e-05
 amazon-cloudformation |     4.8e-05 | 9.34518997326204e-05 |  0.00015246
 citrix                |     3.6e-05 | 3.95804407713499e-05 |    4.39e-05
 amazon-ec2            | 0.001058039 |  0.00122817236730946 | 0.001378872
 actionscript          | 0.000551486 |  0.00067589990909091 | 0.000856132
 amazon-ecs            |    1.17e-05 | 3.40544117647059e-05 |    6.51e-05
 mongodb               |   0.0049625 |  0.00577465885069125 |  0.00631164
 amazon-redshift       | 0.000117294 | 0.000160832181818182 | 0.000212208
 ...
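The rows of a grouped query come back in no guaranteed order unless the query has an ORDER BY. A self-contained sketch of the same pattern with a row count per group and an explicit sort; the toy table and its values are invented for illustration and are not taken from the stackoverflow data.

```sql
-- toy(tag, question_pct) is hypothetical illustration data.
WITH toy(tag, question_pct) AS (
    VALUES ('a', 0.10),
           ('a', 0.30),
           ('b', 0.20)
)
SELECT tag,
       count(*)          AS n,        -- rows summarized in each group
       min(question_pct) AS min_pct,
       avg(question_pct) AS avg_pct,
       max(question_pct) AS max_pct
  FROM toy
 GROUP BY tag
 ORDER BY avg_pct DESC;               -- make the row order explicit
```

Including count(*) alongside the summaries helps show when a group's min, avg, and max rest on only a handful of rows.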