Exploratory Data Analysis in SQL
Christina Maimone
Data Scientist
SELECT *
FROM company
LIMIT 5;
id | exchange | ticker | name | parent_id
----+----------+--------+-----------------------+-----------
1 | nasdaq | PYPL | PayPal Holdings, Inc. |
2 | nasdaq | AMZN | Amazon.com, Inc. |
3 | nasdaq | MSFT | Microsoft Corporation |
4 | nasdaq | MDB | MongoDB Inc. |
5 | nasdaq | DBX | Dropbox, Inc. |
(5 rows)
Code | Note |
---|---|
NULL | missing |
IS NULL, IS NOT NULL | don't use = NULL |
count(*) | number of rows |
count(column_name) | number of non-NULL values |
count(DISTINCT column_name) | number of different non-NULL values |
SELECT DISTINCT column_name ... | distinct values, including NULL |
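The entries in the table can be tried against the company table shown earlier. This is a minimal sketch: it assumes parent_id is the column with missing values (as the sample rows suggest) and that the database follows standard NULL comparison semantics, as PostgreSQL does by default. The column aliases are illustrative names, not part of the original material.

```sql
-- Compare the three count() forms on one column.
SELECT count(*)                  AS total_rows,        -- every row, NULL or not
       count(parent_id)          AS non_null_parents,  -- rows where parent_id is not NULL
       count(DISTINCT parent_id) AS distinct_parents   -- different non-NULL parent_id values
FROM company;

-- Count rows where parent_id is missing: test with IS NULL.
SELECT count(*) AS missing_parents
FROM company
WHERE parent_id IS NULL;

-- Pitfall: parent_id = NULL evaluates to NULL, never true,
-- so this filter matches no rows and the count is 0.
SELECT count(*) AS matches_with_equals
FROM company
WHERE parent_id = NULL;

-- Unlike count(DISTINCT ...), SELECT DISTINCT keeps NULL as one of the values.
SELECT DISTINCT parent_id
FROM company;
```

The difference between total_rows and non_null_parents should equal missing_parents, which is a quick way to check how many values in a column are missing.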