What's in the database?

Exploratory Data Analysis in SQL

Christina Maimone

Data Scientist

PostgreSQL

PostgreSQL Logo

Exploratory Data Analysis in SQL

Database client

Database client connects to database server

Exploratory Data Analysis in SQL

entity relationship diagram with each database table and their columns

Exploratory Data Analysis in SQL

Evanston 311 database table

Exploratory Data Analysis in SQL

erdiagram_fortune500.png

Exploratory Data Analysis in SQL

Stackoverflow database table

Exploratory Data Analysis in SQL

Tables supporting stackoverflow database table

Exploratory Data Analysis in SQL

Select a few rows

SELECT * 
  FROM company 
LIMIT 5;
 id | exchange | ticker |         name          | parent_id 
<hr />-+----------+--------+-----------------------+-----------
  1 | nasdaq   | PYPL   | PayPal Holdings, Inc. |          
  2 | nasdaq   | AMZN   | Amazon.com, Inc.      |          
  3 | nasdaq   | MSFT   | Microsoft Corporation |          
  4 | nasdaq   | MDB    | MongoDB Inc.          |          
  5 | nasdaq   | DBX    | Dropbox, Inc.         |          
(5 rows)
Exploratory Data Analysis in SQL

A few reminders

Code Note
NULL missing
Exploratory Data Analysis in SQL

A few reminders

Code Note
NULL missing
IS NULL, IS NOT NULL don't use = NULL
Exploratory Data Analysis in SQL

A few reminders

Code Note
NULL missing
IS NULL, IS NOT NULL don't use = NULL
count(*) number of rows
Exploratory Data Analysis in SQL

A few reminders

Code Note
NULL missing
IS NULL, IS NOT NULL don't use = NULL
count(*) number of rows
count(column_name) number of non-NULL values
Exploratory Data Analysis in SQL

A few reminders

Code Note
NULL missing
IS NULL, IS NOT NULL don't use = NULL
count(*) number of rows
count(column_name) number of non-NULL values
count(DISTINCT column_name) number of different non-NULL values
Exploratory Data Analysis in SQL

A few reminders

Code Note
NULL missing
IS NULL, IS NOT NULL don't use = NULL
count(*) number of rows
count(column_name) number of non-NULL values
count(DISTINCT column_name) number of different non-NULL values
SELECT DISTINCT column_name ... distinct values, including NULL
Exploratory Data Analysis in SQL

Let's start exploring

Exploratory Data Analysis in SQL

Preparing Video For Download...