Using the Census API

Analyzing US Census Data in Python

Lee Hachadoorian

Asst. Professor of Instruction, Temple University

Structure of a Census API Request

https://api.census.gov/data/2010/dec/sf1?get=NAME,P001001&for=state:*

  • Base URL
    • Host = https://api.census.gov/data
    • Year = 2010
    • Dataset = dec/sf1
  • Parameters
    • get - List of variables
    • for - Geography of interest
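
The pieces above can be assembled into a full request URL with the standard library alone. This is a minimal sketch: `build_census_url` is a hypothetical helper name, not part of any Census tooling, and it only builds the string without sending a request.

```python
from urllib.parse import urlencode

HOST = "https://api.census.gov/data"

def build_census_url(year, dataset, get_vars, geography):
    """Join the base URL parts and append the get/for parameters (hypothetical helper)."""
    base_url = "/".join([HOST, year, dataset])
    params = {"get": ",".join(get_vars), "for": geography}
    # Keep ',', ':' and '*' unescaped so the URL stays readable
    return base_url + "?" + urlencode(params, safe=",:*")

url = build_census_url("2010", "dec/sf1", ["NAME", "P001001"], "state:*")
print(url)
```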

The requests Library

import requests 

HOST = "https://api.census.gov/data"
year = "2010"
dataset = "dec/sf1"
base_url = "/".join([HOST, year, dataset])
predicates = {}
get_vars = ["NAME", "AREALAND", "P001001"]
predicates["get"] = ",".join(get_vars)
predicates["for"] = "state:*"
r = requests.get(base_url, params=predicates)

Examine the Response

print(r.text)
[["NAME","AREALAND","P001001","state"],
["Alabama","131170787086","4779736","01"],
["Alaska","1477953211577","710231","02"],
["Arizona","294207314414","6392017","04"],
...
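
The response body is a JSON array of arrays whose first row holds the column names. As a sketch, the snippet below parses a truncated copy of the output above with the standard `json` module. `sample_text` is a stand-in for `r.text`, limited to the rows shown.

```python
import json

# Stand-in for r.text: only the rows shown above
sample_text = """[["NAME","AREALAND","P001001","state"],
["Alabama","131170787086","4779736","01"],
["Alaska","1477953211577","710231","02"],
["Arizona","294207314414","6392017","04"]]"""

rows = json.loads(sample_text)
header, records = rows[0], rows[1:]

print(header)
print(len(records))
# Every value arrives as a string, including the numeric ones
print(type(records[0][2]))
```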

Response Errors

print(r.text)
error: unknown variable 'nonexistentvariable'
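
A failed request returns a plain-text message rather than JSON, so calling `r.json()` on it would raise. One hedged way to guard against that is sketched below. `parse_census_response` is a hypothetical helper, and treating any non-200 status as an error is an assumption about the API's behavior, which is why the JSON decoding is checked as well.

```python
import json

def parse_census_response(status_code, text):
    """Return parsed rows, or raise ValueError with the API's message (hypothetical helper)."""
    if status_code != 200:
        raise ValueError(f"Census API request failed ({status_code}): {text.strip()}")
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        # A 200 response that is not JSON is still treated as an error
        raise ValueError(f"Unexpected response body: {text.strip()}") from exc

# With a real response: rows = parse_census_response(r.status_code, r.text)
try:
    parse_census_response(400, "error: unknown variable 'nonexistentvariable'")
except ValueError as err:
    message = str(err)
    print(message)
```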

Create User-Friendly Column Names

print(r.json()[0])
['NAME', 'AREALAND', 'P001001', 'state']

Create easy-to-remember column names using snake_case:

col_names = ["name", "area_m2", "total_pop", "state"]
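
A positional list works as long as its order matches the response header. A sketch of a slightly more defensive alternative is to map each Census variable to its friendly name. The `friendly_names` dictionary is an illustrative assumption, not something the API provides.

```python
# Header as returned in r.json()[0]
header = ["NAME", "AREALAND", "P001001", "state"]

# Illustrative mapping from Census variable names to snake_case names
friendly_names = {
    "NAME": "name",
    "AREALAND": "area_m2",
    "P001001": "total_pop",
    "state": "state",
}

# Look up each header entry; unmapped variables fall back to their lowercased name
col_names = [friendly_names.get(var, var.lower()) for var in header]
print(col_names)
```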

Load into Pandas DataFrame

import pandas as pd 

df = pd.DataFrame(columns=col_names, data=r.json()[1:])
# Fix data types
df["area_m2"] = df["area_m2"].astype(int)
df["total_pop"] = df["total_pop"].astype(int)
print(df.head())
         name        area_m2  total_pop state
0     Alabama   131170787086    4779736    01
1      Alaska  1477953211577     710231    02
2     Arizona   294207314414    6392017    04
3    Arkansas   134771261408    2915918    05
4  California   403466310059   37253956    06

Find 3 Most Densely Settled States

# Create new column
df["pop_per_km2"] = 1000**2 * df["total_pop"] / df["area_m2"]

# Find top 3
df.nlargest(3, "pop_per_km2")
                    name      area_m2  total_pop state  pop_per_km2
8   District of Columbia    158114680     601723    11  3805.611218
30            New Jersey  19047341691    8791894    34   461.581156
51           Puerto Rico   8867536532    3725789    72   420.160547
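
The factor `1000**2` converts square meters to square kilometers. The same arithmetic can be checked in plain Python, here using `heapq.nlargest` from the standard library in place of `DataFrame.nlargest`. The four rows are copied from the outputs above, so the ranking covers only this subset.

```python
import heapq

# (name, area_m2, total_pop) copied from the tables above
states = [
    ("Alabama", 131170787086, 4779736),
    ("Alaska", 1477953211577, 710231),
    ("District of Columbia", 158114680, 601723),
    ("New Jersey", 19047341691, 8791894),
]

def pop_per_km2(area_m2, total_pop):
    """People per square kilometer: 1 km2 = 1000**2 m2."""
    return 1000**2 * total_pop / area_m2

densities = [(name, pop_per_km2(area, pop)) for name, area, pop in states]
# Rank by the density value, not by name
top2 = heapq.nlargest(2, densities, key=lambda item: item[1])
for name, density in top2:
    print(f"{name}: {density:.1f}")
```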

Let's practice!
