More complex SQL queries

Streamlined Data Ingestion with pandas

Amany Mahfouz

Instructor

Getting DISTINCT Values

  • Get unique values for one or more columns with SELECT DISTINCT
  • Syntax:
    SELECT DISTINCT [column names] FROM [table];
    
  • Remove duplicate records:
    SELECT DISTINCT * FROM [table];
    
/* Get unique street addresses and boroughs */
SELECT DISTINCT incident_address, 
       borough
  FROM hpd311calls;
Streamlined Data Ingestion with pandas

Aggregate Functions

  • Query a database directly for descriptive statistics
  • Aggregate functions
    • SUM
    • AVG
    • MAX
    • MIN
    • COUNT
Streamlined Data Ingestion with pandas

Aggregate Functions

  • SUM, AVG, MAX, MIN
    • Each takes a single column name
      SELECT AVG(tmax) FROM weather;
      
  • COUNT
    • Get number of rows that meet query conditions
      SELECT COUNT(*) FROM [table_name];
      
    • Get number of unique values in a column
      SELECT COUNT(DISTINCT [column_names]) FROM [table_name];
      
Streamlined Data Ingestion with pandas

GROUP BY

  • Aggregate functions calculate a single summary statistic by default
  • Summarize data by categories with GROUP BY statements
  • Remember to also select the column you're grouping by!
/* Get counts of plumbing calls by borough */
SELECT borough, 
       COUNT(*) 
  FROM hpd311calls
 WHERE complaint_type = 'PLUMBING'
 GROUP BY borough;
Streamlined Data Ingestion with pandas

Counting by Groups

# Create database engine
engine = create_engine("sqlite:///data.db")

# Write query to get plumbing call counts by borough query = """SELECT borough, COUNT(*) FROM hpd311calls WHERE complaint_type = 'PLUMBING' GROUP BY borough;"""
# Query databse and create dataframe plumbing_call_counts = pd.read_sql(query, engine)
Streamlined Data Ingestion with pandas

Counting by Groups

print(plumbing_call_counts)
         borough  COUNT(*)
0          BRONX      2016
1       BROOKLYN      2702
2      MANHATTAN      1413
3         QUEENS       808
4  STATEN ISLAND       178
Streamlined Data Ingestion with pandas

Let's practice!

Streamlined Data Ingestion with pandas

Preparing Video For Download...