Refining imports with SQL queries

Streamlined Data Ingestion with pandas

Amany Mahfouz

Instructor

SELECTing Columns

  • SELECT [column names] FROM [table name];
  • Example:
    SELECT date, tavg 
    FROM weather;
    
Streamlined Data Ingestion with pandas

WHERE Clauses

  • Use a WHERE clause to selectively import records
    SELECT [column_names] 
      FROM [table_name] 
     WHERE [condition];
    
Streamlined Data Ingestion with pandas

Filtering by Numbers

  • Compare numbers with mathematical operators
    • =
    • > and >=
    • < and <=
    • <> (not equal to)
    • Example:
      SELECT * 
      FROM weather 
      WHERE tmax > 32;
      
Streamlined Data Ingestion with pandas

Filtering Text

  • Match exact strings with the = sign and the text to match
  • String matching is case-sensitive
  • Example:
    /* Get records about incidents in Brooklyn */
    SELECT * 
    FROM hpd311calls
    WHERE borough = 'BROOKLYN';
    
Streamlined Data Ingestion with pandas

SQL and pandas

# Load libraries
import pandas as pd
from sqlalchemy import create_engine

# Create database engine engine = create_engine("sqlite:///data.db")
# Write query to get records from Brooklyn query = """SELECT * FROM hpd311calls WHERE borough = 'BROOKLYN';"""
# Query the database brooklyn_calls = pd.read_sql(query, engine)
print(brookyn_calls.borough.unique())
['BROOKLYN']
Streamlined Data Ingestion with pandas

Combining Conditions: AND

  • WHERE clauses with AND return records that meet all conditions
# Write query to get records about plumbing in the Bronx
and_query = """SELECT * 
                 FROM hpd311calls 
                WHERE borough = 'BRONX' 
                  AND complaint_type = 'PLUMBING';"""

# Get calls about plumbing issues in the Bronx bx_plumbing_calls = pd.read_sql(and_query, engine) # Check record count print(bx_plumbing_calls.shape)
(2016, 8)
Streamlined Data Ingestion with pandas

Combining Conditions: OR

  • WHERE clauses with OR return records that meet at least one condition
# Write query to get records about water leaks or plumbing
or_query = """SELECT * 
                FROM hpd311calls 
               WHERE complaint_type = 'WATER LEAK'
                  OR complaint_type = 'PLUMBING';"""

# Get calls that are about plumbing or water leaks leaks_or_plumbing = pd.read_sql(or_query, engine) # Check record count print(leaks_or_plumbing.shape)
(10684, 8)
Streamlined Data Ingestion with pandas

Let's practice!

Streamlined Data Ingestion with pandas

Preparing Video For Download...