Using the Census API

Analyzing US Census Data in Python

Lee Hachadoorian

Asst. Professor of Instruction, Temple University

Structure of a Census API Request

https://api.census.gov/data/2010/dec/sf1?get=NAME,P001001&for=state:*


  • Base URL
    • Host = https://api.census.gov/data
    • Year = 2010
    • Dataset = dec/sf1
  • Parameters
    • get - List of variables
    • for - Geography of interest
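The pieces listed above can be assembled with the standard library alone. This is a minimal sketch: the variable names are illustrative, and `safe=",:*"` is chosen here only so the printed URL stays readable, not because the API is known to require unescaped characters.

```python
from urllib.parse import urlencode

# Base URL = host / year / dataset
HOST = "https://api.census.gov/data"
year = "2010"
dataset = "dec/sf1"
base_url = "/".join([HOST, year, dataset])

# Parameters: the variables to fetch and the geography of interest
params = {"get": "NAME,P001001", "for": "state:*"}

# Keep commas, colons and asterisks unescaped so the URL is easy to read
query = urlencode(params, safe=",:*")
url = base_url + "?" + query
print(url)
```

The base URL and the parameters are joined by a single `?`, and the parameters are separated from each other by `&`.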

The requests Library

import requests 

HOST = "https://api.census.gov/data"
year = "2010"
dataset = "dec/sf1"
base_url = "/".join([HOST, year, dataset])
predicates = {}
get_vars = ["NAME", "AREALAND", "P001001"]
predicates["get"] = ",".join(get_vars)
predicates["for"] = "state:*"
r = requests.get(base_url, params=predicates)

Examine the Response

print(r.text)
[["NAME","AREALAND","P001001","state"],
["Alabama","131170787086","4779736","01"],
["Alaska","1477953211577","710231","02"],
["Arizona","294207314414","6392017","04"],
...

Response Errors

print(r.text)
error: unknown variable 'nonexistentvariable'
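An error message like this is plain text rather than JSON, so it is worth checking the response before parsing it. The sketch below uses a hypothetical helper, `parse_census_response`, which takes a status code and body text so that it runs without a network connection; with a real request you would pass `r.status_code` and `r.text`. It assumes that a failed request returns a non-200 status or a non-JSON body, which may not cover every failure mode of the API.

```python
import json

def parse_census_response(status_code, text):
    """Hypothetical helper: return the parsed rows or raise ValueError."""
    if status_code != 200:
        raise ValueError(f"Request failed ({status_code}): {text.strip()}")
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        raise ValueError(f"Response is not JSON: {text.strip()}") from None

# A well-formed body parses into a list of rows
good = '[["NAME","P001001","state"],["Alabama","4779736","01"]]'
rows = parse_census_response(200, good)
print(rows[1])

# An error body is reported instead of being parsed
try:
    parse_census_response(400, "error: unknown variable 'nonexistentvariable'")
except ValueError as exc:
    message = str(exc)
    print(message)
```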

Create User-Friendly Column Names

print(r.json()[0])
['NAME', 'AREALAND', 'P001001', 'state']

Create easy-to-remember column names using snake_case:

col_names = ["name", "area_m2", "total_pop", "state"]
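If the list of requested variables changes often, the friendly names can be derived from the header row instead of being typed out in order. This is one possible approach, not part of the original slides: the `friendly` lookup is hypothetical, and the header is copied from the output above rather than fetched.

```python
# Header row as returned by the API (copied from the output above)
header = ["NAME", "AREALAND", "P001001", "state"]

# Hypothetical lookup from API variable names to friendly names
friendly = {"NAME": "name", "AREALAND": "area_m2", "P001001": "total_pop"}

# Fall back to the lowercased API name when no friendly name is defined
col_names = [friendly.get(col, col.lower()) for col in header]
print(col_names)
```

Because the names are looked up by key, the column order always follows the header, even if the order of the requested variables changes.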

Load into Pandas DataFrame

import pandas as pd 

df = pd.DataFrame(columns=col_names, data=r.json()[1:])
# Fix data types
df["area_m2"] = df["area_m2"].astype(int)
df["total_pop"] = df["total_pop"].astype(int)
print(df.head())
         name        area_m2  total_pop state
0     Alabama   131170787086    4779736    01
1      Alaska  1477953211577     710231    02
2     Arizona   294207314414    6392017    04
3    Arkansas   134771261408    2915918    05
4  California   403466310059   37253956    06

Find 3 Most Densely Settled States

# Create new column
df["pop_per_km2"] = 1000**2 * df["total_pop"] / df["area_m2"]

# Find top 3
df.nlargest(3, "pop_per_km2")
                    name      area_m2  total_pop state  pop_per_km2
8   District of Columbia    158114680     601723    11  3805.611218
30            New Jersey  19047341691    8791894    34   461.581156
51           Puerto Rico   8867536532    3725789    72   420.160547
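The factor `1000**2` converts square meters to square kilometers. As a quick check of the units, the District of Columbia row can be recomputed by hand in plain Python, using the values copied from the table above:

```python
# Values for the District of Columbia, copied from the table above
total_pop = 601723
area_m2 = 158114680

# 1 km2 = 1000 m * 1000 m = 1,000,000 m2
area_km2 = area_m2 / 1000**2

# People per square kilometer
pop_per_km2 = total_pop / area_km2
print(round(pop_per_km2, 2))
```

Dividing the area by `1000**2` first gives the same result as multiplying the ratio by `1000**2`, as the slide does.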

Let's practice!
