Analyzing US Census Data in Python
Lee Hachadoorian
Asst. Professor of Instruction, Temple University
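[B|C]ssnnn[A-I]
B or C = "Base Table" or "Collapsed Table"
ss = two-digit subject code; nnn = three-digit table number within the subject
[A-I] = optional race/ethnicity iteration letter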
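The ID pattern can be checked mechanically. The sketch below uses a hypothetical helper, `parse_table_id`, which is not part of any Census library. It assumes `ss` is a two-digit subject code and `nnn` a three-digit table number, and it accepts only IDs that fit the [B|C]ssnnn[A-I] pattern exactly. Any ID outside that pattern raises an error.

```python
import re

# Hypothetical helper (not from any Census package).
# Assumed layout: type letter, 2-digit subject, 3-digit table number,
# optional race/ethnicity iteration letter A-I.
TABLE_ID_PATTERN = re.compile(
    r"^(?P<type>[BC])(?P<subject>\d{2})(?P<number>\d{3})(?P<iteration>[A-I])?$"
)
TABLE_TYPES = {"B": "Base Table", "C": "Collapsed Table"}


def parse_table_id(table_id):
    """Split a table ID such as 'C15002A' into its labeled parts."""
    match = TABLE_ID_PATTERN.match(table_id)
    if match is None:
        raise ValueError(f"not a [B|C]ssnnn[A-I] table ID: {table_id!r}")
    parts = match.groupdict()  # an absent iteration letter comes back as None
    parts["type"] = TABLE_TYPES[parts["type"]]
    return parts


print(parse_table_id("B15002"))
# {'type': 'Base Table', 'subject': '15', 'number': '002', 'iteration': None}
print(parse_table_id("C15002A"))
# {'type': 'Collapsed Table', 'subject': '15', 'number': '002', 'iteration': 'A'}
```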
B15002 (Base Table) | C15002[A-I] (Collapsed Table) |
---|---|
No schooling | Less than high school diploma |
Nursery to 4th grade | High school graduate, GED, or alternative |
5th and 6th grade | Some college or associate's degree |
7th and 8th grade | Bachelor's degree or higher |
9th grade | |
... | |
A = White Alone
B = Black or African American Alone
C = American Indian and Alaska Native Alone
D = Asian Alone
E = Native Hawaiian and Other Pacific Islander Alone
F = Some Other Race Alone
G = Two or More Races
H = White Alone, Not Hispanic or Latino
I = Hispanic or Latino
Source: https://www.census.gov/programs-surveys/acs/guidance/which-data-tool/table-ids-explained.html
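Because the iteration letter is appended directly to the table ID, the race/ethnicity tables for one topic can be listed with a small lookup. The helper name `iteration_table_ids` is hypothetical. The sketch also does not check whether every iteration is actually published for a given table and year.

```python
# Iteration letters and labels as listed on the slide
RACE_ITERATIONS = {
    "A": "White Alone",
    "B": "Black or African American Alone",
    "C": "American Indian and Alaska Native Alone",
    "D": "Asian Alone",
    "E": "Native Hawaiian and Other Pacific Islander Alone",
    "F": "Some Other Race Alone",
    "G": "Two or More Races",
    "H": "White Alone, Not Hispanic or Latino",
    "I": "Hispanic or Latino",
}


def iteration_table_ids(table_id):
    """Hypothetical helper: map each race/ethnicity iteration ID to its label.

    Does not check whether the Census Bureau publishes every iteration.
    """
    return {table_id + letter: label for letter, label in RACE_ITERATIONS.items()}


ids = iteration_table_ids("C15002")
print(ids["C15002H"])  # White Alone, Not Hispanic or Latino
```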
Wide DataFrame: msa_labor_force
msa male_lf female_lf
0 12060 400843 481425
1 25540 30656 35046
2 26420 231346 268923
3 26900 55943 71036
...
msa_labor_force.columns = ["msa", "male", "female"]
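To try the reshaping steps locally, the wide table can be rebuilt from the four MSAs printed above. This is only a subset; the full table has more rows. The `rename` call is an alternative suggested here, not part of the original slide. Assigning a list to `.columns` relabels columns by position, while `rename` matches the old names explicitly.

```python
import pandas as pd

# Only the four MSAs shown on the slide; the full table has more rows
msa_labor_force = pd.DataFrame({
    "msa": [12060, 25540, 26420, 26900],
    "male_lf": [400843, 30656, 231346, 55943],
    "female_lf": [481425, 35046, 268923, 71036],
})

# Alternative: rename by old name (returns a new DataFrame, order-independent)
renamed = msa_labor_force.rename(columns={"male_lf": "male", "female_lf": "female"})

# As on the slide: positional relabeling, one name per column, in order
msa_labor_force.columns = ["msa", "male", "female"]

print(msa_labor_force.equals(renamed))  # True
```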
Tidy DataFrame: tidy_msa_labor_force
msa sex labor_force
0 12060 male 400843
1 25540 male 30656
2 26420 male 231346
3 26900 male 55943
...
49 12060 female 481425
50 25540 female 35046
51 26420 female 268923
52 26900 female 71036
...
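In the tidy layout each row holds one observation (one MSA and one sex), so a summary by sex takes a single `groupby`. The sketch below rebuilds only the eight rows printed above. Its totals therefore cover those four MSAs, not the full table.

```python
import pandas as pd

# The eight tidy rows shown on the slide (four MSAs x two sexes)
tidy_msa_labor_force = pd.DataFrame({
    "msa": [12060, 25540, 26420, 26900] * 2,
    "sex": ["male"] * 4 + ["female"] * 4,
    "labor_force": [400843, 30656, 231346, 55943, 481425, 35046, 268923, 71036],
})

# One grouping column replaces per-column arithmetic on the wide table
lf_by_sex = tidy_msa_labor_force.groupby("sex")["labor_force"].sum()
print(lf_by_sex)  # female 856430, male 718788 (these four MSAs only)
```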
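# Reshape wide to long: one row per (msa, sex) pair
tidy_msa_labor_force = msa_labor_force.melt(
    id_vars=["msa"],                # column(s) kept as identifiers
    value_vars=["male", "female"],  # columns stacked into rows
    var_name="sex",                 # new column holding the old column names
    value_name="labor_force",       # new column holding the values
)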
tidy_msa_labor_force
msa sex labor_force
0 12060 male 400843
1 25540 male 30656
2 26420 male 231346
3 26900 male 55943
...
49 12060 female 481425
50 25540 female 35046
51 26420 female 268923
52 26900 female 71036
...
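`melt()` can be undone with `pivot()`. The sketch below again uses only the four MSAs reproduced above. `pivot()` sorts the new columns alphabetically and labels the column axis "sex", so the sketch clears that label and restores the original column order.

```python
import pandas as pd

# Wide table (four MSAs from the slide, columns already renamed)
msa_labor_force = pd.DataFrame({
    "msa": [12060, 25540, 26420, 26900],
    "male": [400843, 30656, 231346, 55943],
    "female": [481425, 35046, 268923, 71036],
})

tidy_msa_labor_force = msa_labor_force.melt(
    id_vars=["msa"], value_vars=["male", "female"],
    var_name="sex", value_name="labor_force",
)

# pivot() reverses melt(): one row per msa, one column per sex
wide_again = (
    tidy_msa_labor_force
    .pivot(index="msa", columns="sex", values="labor_force")
    .reset_index()
)
wide_again.columns.name = None                      # drop the leftover "sex" axis label
wide_again = wide_again[["msa", "male", "female"]]  # undo alphabetical column order
print(wide_again)
```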