Using the Census API

Analyzing US Census Data in Python

Lee Hachadoorian

Asst. Professor of Instruction, Temple University

Structure of a Census API Request

https://api.census.gov/data/2010/dec/sf1?get=NAME,P001001&for=state:*

Base URL
- Host = https://api.census.gov/data
- Year = 2010
- Dataset = dec/sf1

Parameters
- get - List of variables
- for - Geography of interest
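
To see how these pieces fit together, the request URL can be taken apart with Python's standard urllib.parse module. This is only an illustrative sketch; the variable names are not part of the course code.

```python
from urllib.parse import urlsplit, parse_qs

url = "https://api.census.gov/data/2010/dec/sf1?get=NAME,P001001&for=state:*"

parts = urlsplit(url)

# Everything before the "?" is the base URL: host, year, and dataset
base_url = f"{parts.scheme}://{parts.netloc}{parts.path}"

# Everything after the "?" is the set of parameters (get, for)
params = {key: values[0] for key, values in parse_qs(parts.query).items()}

print(base_url)  # https://api.census.gov/data/2010/dec/sf1
print(params)    # {'get': 'NAME,P001001', 'for': 'state:*'}
```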

The requests Library

import requests 

HOST = "https://api.census.gov/data"
year = "2010"
dataset = "dec/sf1"

base_url = "/".join([HOST, year, dataset])

predicates = {}

get_vars = ["NAME", "AREALAND", "P001001"]

predicates["get"] = ",".join(get_vars)

predicates["for"] = "state:*"

r = requests.get(base_url, params=predicates)
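
requests assembles the final URL from base_url and the predicates dictionary. As a rough illustration of that step, the standard library's urlencode builds a comparable query string. The safe argument is an assumption made here only to keep the commas, colon, and asterisk readable; requests will typically percent-encode them, which is equivalent. After a real request, r.url shows the URL that was actually sent.

```python
from urllib.parse import urlencode

base_url = "https://api.census.gov/data/2010/dec/sf1"
predicates = {"get": "NAME,AREALAND,P001001", "for": "state:*"}

# Leave ",", ":" and "*" unescaped so the result is easy to read
query = urlencode(predicates, safe=",:*")
full_url = base_url + "?" + query

print(full_url)
# https://api.census.gov/data/2010/dec/sf1?get=NAME,AREALAND,P001001&for=state:*
```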

Examine the Response

print(r.text)

[["NAME","AREALAND","P001001","state"],
["Alabama","131170787086","4779736","01"],
["Alaska","1477953211577","710231","02"],
["Arizona","294207314414","6392017","04"],
...
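
The response body is a JSON array of arrays: the first row holds the column names, and each later row is one state. A small sketch using only the rows shown above (sample_text is a stand-in for r.text):

```python
import json

# Stand-in for r.text, truncated to the rows shown above
sample_text = """[["NAME","AREALAND","P001001","state"],
["Alabama","131170787086","4779736","01"],
["Alaska","1477953211577","710231","02"],
["Arizona","294207314414","6392017","04"]]"""

rows = json.loads(sample_text)
header, records = rows[0], rows[1:]

print(header)      # ['NAME', 'AREALAND', 'P001001', 'state']
print(records[0])  # ['Alabama', '131170787086', '4779736', '01']

# Every value arrives as a string, including the numbers
print(type(records[0][2]))  # <class 'str'>
```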

Response Errors

print(r.text)

error: unknown variable 'nonexistentvariable'
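
When a request fails, the body is plain text rather than JSON, so r.json() would raise an exception. One possible guard, sketched here with a hypothetical helper parse_census_response (not part of requests or the Census API), checks the status code and the body before parsing. It assumes a failure is signalled by a non-200 status or a non-JSON body.

```python
import json

def parse_census_response(status_code, text):
    """Return the parsed rows, or raise ValueError carrying the API's message."""
    if status_code != 200:
        raise ValueError(f"Census API request failed ({status_code}): {text.strip()}")
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        raise ValueError(f"Response is not JSON: {text.strip()}") from None

# Simulated failure using the message shown above; the 400 status is an assumption
try:
    parse_census_response(400, "error: unknown variable 'nonexistentvariable'")
except ValueError as err:
    print(err)
```

With a live response, the call would be parse_census_response(r.status_code, r.text).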

Create User-Friendly Column Names

print(r.json()[0])

['NAME', 'AREALAND', 'P001001', 'state']

Create easy-to-remember column names using snake_case:

col_names = ["name", "area_m2", "total_pop", "state"]
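
A hard-coded list works as long as the column order never changes. A slightly more defensive sketch maps each API name to its friendly name, so the result follows whatever order the header row uses. The friendly_names dictionary is illustrative, not part of the course code.

```python
# Header row as returned by the API (r.json()[0] in the example above)
header = ["NAME", "AREALAND", "P001001", "state"]

# Illustrative mapping from Census variable names to snake_case names
friendly_names = {
    "NAME": "name",
    "AREALAND": "area_m2",
    "P001001": "total_pop",
}

# Fall back to the original name for anything not in the mapping
col_names = [friendly_names.get(col, col) for col in header]

print(col_names)  # ['name', 'area_m2', 'total_pop', 'state']
```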

Load into Pandas DataFrame

import pandas as pd 

df = pd.DataFrame(columns=col_names, data=r.json()[1:]) 

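# Fix data types: the API returns every value as a string.
# Use int64 explicitly so the large area values fit on every platform.
df["area_m2"] = df["area_m2"].astype("int64")
df["total_pop"] = df["total_pop"].astype("int64")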

print(df.head())

         name        area_m2  total_pop state
0     Alabama   131170787086    4779736    01
1      Alaska  1477953211577     710231    02
2     Arizona   294207314414    6392017    04
3    Arkansas   134771261408    2915918    05
4  California   403466310059   37253956    06

Find 3 Most Densely Settled States

# Create new column
df["pop_per_km2"] = 1000**2 * df["total_pop"] / df["area_m2"]

# Find top 3
print(df.nlargest(3, "pop_per_km2"))

                    name      area_m2  total_pop state  pop_per_km2
8   District of Columbia    158114680     601723    11  3805.611218
30            New Jersey  19047341691    8791894    34   461.581156
51           Puerto Rico   8867536532    3725789    72   420.160547
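
The factor 1000**2 converts square meters to square kilometers (1 km2 = 1,000,000 m2). As a quick check of the formula, the District of Columbia row can be recomputed by hand in plain Python:

```python
# Values for the District of Columbia from the table above
total_pop = 601723
area_m2 = 158114680

# 1 km2 = 1000 m * 1000 m = 1,000,000 m2
area_km2 = area_m2 / 1000**2
pop_per_km2 = total_pop / area_km2

print(round(pop_per_km2, 2))  # 3805.61
```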

Let's practice!
