More complex SQL queries

Streamlined Data Ingestion with pandas

Amany Mahfouz

Instructor

Getting DISTINCT Values

Get unique values for one or more columns with SELECT DISTINCT

Syntax:

SELECT DISTINCT [column names] FROM [table];

Remove duplicate records:
```
SELECT DISTINCT * FROM [table];
```

/* Get unique street addresses and boroughs */
SELECT DISTINCT incident_address, 
       borough
  FROM hpd311calls;

Aggregate Functions

Query a database directly for descriptive statistics
Aggregate functions
- SUM
- AVG
- MAX
- MIN
- COUNT

Aggregate Functions

SUM, AVG, MAX, MIN
- Each takes a single column name
```
SELECT AVG(tmax) FROM weather;
```

COUNT

Get number of rows that meet query conditions
```
SELECT COUNT(*) FROM [table_name];
```

Get number of unique values in a column

SELECT COUNT(DISTINCT [column_names]) FROM [table_name];

GROUP BY

Aggregate functions calculate a single summary statistic by default
Summarize data by categories with GROUP BY statements
Remember to also select the column you're grouping by!

/* Get counts of plumbing calls by borough */
SELECT borough, 
       COUNT(*) 
  FROM hpd311calls
 WHERE complaint_type = 'PLUMBING'
 GROUP BY borough;

Counting by Groups

# Create database engine
engine = create_engine("sqlite:///data.db")


# Write query to get plumbing call counts by borough
query = """SELECT borough, COUNT(*) 
             FROM hpd311calls
            WHERE complaint_type = 'PLUMBING'
            GROUP BY borough;"""


# Query databse and create dataframe
plumbing_call_counts = pd.read_sql(query, engine)

Counting by Groups

print(plumbing_call_counts)

         borough  COUNT(*)
0          BRONX      2016
1       BROOKLYN      2702
2      MANHATTAN      1413
3         QUEENS       808
4  STATEN ISLAND       178

Let's practice!

Streamlined Data Ingestion with pandas