Analyzing US Census Data in Python
Lee Hachadoorian
Asst. Professor of Instruction, Temple University
name | B25045_001E | B25045_001M |
---|---|---|
Alabama | 1,844,546 | ±11,416 |
Alaska | 257,330 | ±3,380 |
Arizona | 2,356,055 | ±12,130 |
Arkansas | 1,127,621 | ±7,837 |
B25045.head()
NAME B25045_001E B25045_001M state
0 Alabama 1844546 11416 01
1 Alaska 257330 3380 02
2 Arizona 2356055 12130 04
3 Arkansas 1127621 7837 05
4 California 12468743 22250 06
B25045.columns = ["name", "total", "total_moe", "state"]
B25045.head()
name total total_moe state
0 Alabama 1844546 11416 01
1 Alaska 257330 3380 02
2 Arizona 2356055 12130 04
3 Arkansas 1127621 7837 05
4 California 12468743 22250 06
Margin of Error as a Percent of the Estimate:
$$RMOE = 100 \times MOE / Estimate$$
NAME B25045_001E B25045_001M state rmoe
0 California 13005097 17539 06 0.134863
1 Wyoming 225796 3968 56 1.757338
NAME B25045_001E B25045_001M state county rmoe
0 Los Angeles County 3311231 8549 06 037 0.258182
1 Sutter County, Cal 31945 907 06 101 2.839255
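The RMOE formula above maps directly onto a column-wise pandas calculation. The following is a minimal sketch, assuming a DataFrame shaped like the state-level output shown above; the frame name `states` is chosen here for illustration, and the two rows are copied from that output.

```python
import pandas as pd

# Two rows copied from the state-level output; the name `states` is illustrative.
states = pd.DataFrame({
    "NAME": ["California", "Wyoming"],
    "B25045_001E": [13005097, 225796],   # estimates
    "B25045_001M": [17539, 3968],        # margins of error
    "state": ["06", "56"],
})

# Relative margin of error: the MOE expressed as a percent of the estimate.
states["rmoe"] = 100 * states["B25045_001M"] / states["B25045_001E"]
print(states)
```

The same expression works unchanged on the county-level frame, since it only depends on the estimate and MOE columns.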
B25045_004E: Owner Occupied, No Vehicle Available, Householder 15 to 34 Years
NAME B25045_004E B25045_004M state rmoe
0 California 10964 1519 06 13.854433
1 Wyoming 25 48 56 192.000000
NAME B25045_004E B25045_004M state county rmoe
0 Los Angeles Cou 1942 634 06 037 32.646756
1 Sutter County, 0 210 06 101 inf
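A zero estimate, as in the Sutter County row, leaves the ratio undefined; pandas reports a positive number divided by zero as `inf` rather than raising an error. One possible way to keep such rows from distorting later summaries is to mask them as missing. The sketch below assumes the county rows shown above; the frame name `counties` and the `rmoe_clean` column are illustrative, and replacing `inf` with `NaN` is a suggestion here, not a step taken in the slides.

```python
import numpy as np
import pandas as pd

# Rows copied from the county-level output; `counties` is an illustrative name.
counties = pd.DataFrame({
    "NAME": ["Los Angeles County", "Sutter County"],
    "B25045_004E": [1942, 0],    # estimates (Sutter County's estimate is zero)
    "B25045_004M": [634, 210],   # margins of error
})

# Division by a zero estimate yields inf instead of an exception.
counties["rmoe"] = 100 * counties["B25045_004M"] / counties["B25045_004E"]

# Hypothetical cleanup step: treat infinite ratios as missing values.
counties["rmoe_clean"] = counties["rmoe"].replace(np.inf, np.nan)
print(counties)
```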
$$Z_{90} = 1.645$$
$$SE_{x} = \frac{MOE_x}{Z_{90}}$$
$$Z = \frac{x_1 - x_2}{\sqrt{SE_{x_1}^2 + SE_{x_2}^2}}$$
total total_moe year
4 12944178 15703 2016
4 13005097 17539 2017
import numpy

Z_CRIT = 1.645
x1 = int(ca["total"][ca["year"] == 2017])
x2 = int(ca["total"][ca["year"] == 2016])
se_x1 = float(ca["total_moe"][ca["year"] == 2017] / Z_CRIT)
se_x2 = float(ca["total_moe"][ca["year"] == 2016] / Z_CRIT)
Z = (x1 - x2) / numpy.sqrt(se_x1**2 + se_x2**2)
print(abs(Z) > Z_CRIT)
True
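The test above can also be wrapped in a small reusable function. This is a sketch under stated assumptions: the function name `z_statistic` and its parameters are hypothetical, it uses only the standard library, and the inputs are the 2017 and 2016 California totals and MOEs from the table above.

```python
import math

Z_CRIT = 1.645  # Z_90, the critical value used in the formulas above


def z_statistic(x1, moe1, x2, moe2, z_crit=Z_CRIT):
    """Return the Z statistic for the difference between two estimates."""
    # Convert each margin of error to a standard error.
    se1 = moe1 / z_crit
    se2 = moe2 / z_crit
    # Difference of the estimates over the standard error of the difference.
    return (x1 - x2) / math.sqrt(se1**2 + se2**2)


# 2017 vs. 2016 California values from the table above.
z = z_statistic(13005097, 17539, 12944178, 15703)
significant = abs(z) > Z_CRIT
print(round(z, 2), significant)
```

With these inputs Z comes out at roughly 4.26, well above the critical value, which agrees with the `True` printed above.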
$$SE_{a+b+...} = \sqrt{SE_a^2 + SE_b^2 +...}$$
$$MOE_{a+b+...} = Z_{90}SE_{a+b+...}$$
states["novehicle_65over"] = \
    states["owned_novehicle_65over"] + states["rented_novehicle_65over"]
states["novehicle_65over_moe"] = Z_CRIT * numpy.sqrt(
    states["owned_novehicle_65over_moe"]**2 +
    states["rented_novehicle_65over_moe"]**2
)
print(states[["name", "novehicle_65over", "novehicle_65over_moe"]].head())
name novehicle_65over novehicle_65over_moe
0 Alabama 42267 4867.038791
1 Alaska 5575 1473.170747
2 Arizona 52331 6598.753623
3 Arkansas 22533 3155.583824
4 California 372772 15183.882878
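As a cross-check, the two aggregation formulas can be followed literally: convert each component MOE to a standard error, combine the standard errors, then convert back to an MOE. The sketch below assumes the inputs are margins of error (not standard errors); the function name `moe_of_sum` and the input values are hypothetical. Under this reading the factor Z_90 cancels, so the combined MOE equals the square root of the sum of the squared component MOEs.

```python
import math

Z_CRIT = 1.645  # Z_90, as in the formulas above


def moe_of_sum(moes, z_crit=Z_CRIT):
    """Return the MOE of a sum of estimates, given the component MOEs."""
    # Step 1: SE = MOE / Z_90 for each component.
    ses = [moe / z_crit for moe in moes]
    # Step 2: SE of the sum is the root sum of squares of the component SEs.
    se_sum = math.sqrt(sum(se**2 for se in ses))
    # Step 3: convert the combined SE back to an MOE.
    return z_crit * se_sum


# Hypothetical component MOEs, chosen so the result is easy to check by hand.
combined = moe_of_sum([300, 400])
print(combined)
```

For the hypothetical inputs 300 and 400, the result is 500 up to floating-point rounding, the same value as `math.sqrt(300**2 + 400**2)`.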