Margins of Error

Analyzing US Census Data in Python

Lee Hachadoorian

Asst. Professor of Instruction, Temple University

Margins of Error

  • Table B25045 - Tenure by Vehicles Available by Age of Householder
    • B25045_001E - Estimate of total occupied housing units
    • B25045_001M - Margin of Error of the estimate
name      B25045_001E  B25045_001M
Alabama     1,844,546      ±11,416
Alaska        257,330       ±3,380
Arizona     2,356,055      ±12,130
Arkansas    1,127,621       ±7,837

Margins of Error


B25045.head()
         NAME  B25045_001E  B25045_001M state
0     Alabama      1844546        11416    01
1      Alaska       257330         3380    02
2     Arizona      2356055        12130    04
3    Arkansas      1127621         7837    05
4  California     12468743        22250    06

Margins of Error

B25045.columns = ["name", "total", "total_moe", "state"]
B25045.head()
         name        total    total_moe state
0     Alabama      1844546        11416    01
1      Alaska       257330         3380    02
2     Arizona      2356055        12130    04
3    Arkansas      1127621         7837    05
4  California     12468743        22250    06

Relative Margin of Error

Margin of Error as a Percent of the Estimate:

$$RMOE = 100 \times MOE / Estimate$$

         NAME  B25045_001E  B25045_001M state      rmoe
0  California     13005097        17539    06  0.134863
1     Wyoming       225796         3968    56  1.757338
                 NAME  B25045_001E  B25045_001M state county      rmoe
0  Los Angeles County      3311231         8549    06    037  0.258182
1  Sutter County, Cal        31945          907    06    101  2.839255
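
The `rmoe` column shown above can be reproduced with one vectorized expression. A minimal sketch, assuming a DataFrame built by hand from the two state rows (the name `states_demo` is illustrative and not part of the course code):

```python
import pandas as pd

# Hand-built frame holding the two state rows shown above (illustrative only)
states_demo = pd.DataFrame({
    "NAME": ["California", "Wyoming"],
    "B25045_001E": [13005097, 225796],
    "B25045_001M": [17539, 3968],
})

# RMOE = 100 * MOE / Estimate
states_demo["rmoe"] = (
    100 * states_demo["B25045_001M"] / states_demo["B25045_001E"]
)

print(states_demo)
```

The smaller the estimate, the larger the RMOE tends to be, which is why Wyoming's value is roughly thirteen times California's.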

Margins of Error of Breakdown Columns

B25045_004E: Owner Occupied, No Vehicle Available, Householder 15 to 34 Years

         NAME  B25045_004E  B25045_004M state        rmoe
0  California        10964         1519    06   13.854433
1     Wyoming           25           48    56  192.000000
              NAME  B25045_004E  B25045_004M state county       rmoe
0  Los Angeles Cou         1942          634    06    037  32.646756
1  Sutter County,             0          210    06    101        inf
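
The `inf` in the last row arises because the estimate is 0, so the RMOE is undefined rather than merely large. One possible way to handle this is sketched below, under the assumption that such rows should be treated as missing. The frame `counties_demo` and the column `rmoe_clean` are illustrative names, and the truncated county names are kept as displayed.

```python
import numpy as np
import pandas as pd

# Hand-built frame holding the two county rows shown above (illustrative only)
counties_demo = pd.DataFrame({
    "NAME": ["Los Angeles Cou", "Sutter County,"],
    "B25045_004E": [1942, 0],
    "B25045_004M": [634, 210],
})

# Dividing by a zero estimate yields inf in pandas rather than raising an error
counties_demo["rmoe"] = (
    100 * counties_demo["B25045_004M"] / counties_demo["B25045_004E"]
)

# Assumption: treat an undefined RMOE as missing so it does not
# distort sorting, filtering, or averages later on
counties_demo["rmoe_clean"] = counties_demo["rmoe"].replace(
    [np.inf, -np.inf], np.nan
)

print(counties_demo)
```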

Standard Errors

$$Z_{90} = 1.645$$

$$SE_{x} = \frac{MOE_x}{Z_{90}}$$
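
As a worked instance of this conversion, the sketch below turns a published 90% margin of error into a standard error. The helper name `moe_to_se` is illustrative. The input is Alabama's MOE of 11,416 for total occupied housing units.

```python
Z_90 = 1.645  # critical value for the 90% confidence level

def moe_to_se(moe, z=Z_90):
    """Convert a margin of error to a standard error: SE = MOE / Z."""
    return moe / z

alabama_se = moe_to_se(11416)
print(round(alabama_se, 1))
```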

Statistically Significant Difference

$$Z = \frac{x_1 - x_2}{\sqrt{SE_{x_1}^2 + SE_{x_2}^2}}$$

      total  total_moe  year
4  12944178      15703  2016
4  13005097      17539  2017
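import numpy

Z_CRIT = 1.645
x1 = int(ca["total"][ca["year"] == 2017].iloc[0])
x2 = int(ca["total"][ca["year"] == 2016].iloc[0])

# Convert each margin of error to a standard error (SE = MOE / Z)
se_x1 = float(ca["total_moe"][ca["year"] == 2017].iloc[0] / Z_CRIT)
se_x2 = float(ca["total_moe"][ca["year"] == 2016].iloc[0] / Z_CRIT)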
Z = (x1 - x2) / numpy.sqrt(se_x1**2 + se_x2**2)

print(abs(Z) > Z_CRIT)
True
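
The same test can be wrapped in a small reusable function. This is a sketch using only the standard library. The function name `is_significant` and its signature are assumptions for illustration, and the inputs are the California estimates and MOEs for 2017 and 2016.

```python
import math

Z_CRIT = 1.645  # critical value for the 90% confidence level

def is_significant(x1, moe1, x2, moe2, z_crit=Z_CRIT):
    """Return (Z, significant) for the difference between two estimates.

    Each MOE is first converted to a standard error (SE = MOE / z_crit).
    """
    se1 = moe1 / z_crit
    se2 = moe2 / z_crit
    z = (x1 - x2) / math.sqrt(se1**2 + se2**2)
    return z, abs(z) > z_crit

# California total occupied housing units, 2017 vs. 2016
z, significant = is_significant(13005097, 17539, 12944178, 15703)
print(round(z, 2), significant)
```

If the absolute value of Z falls below the critical value, the difference cannot be distinguished from sampling error at the 90% confidence level.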

Approximating SE for Derived Estimates

$$SE_{a+b+...} = \sqrt{SE_a^2 + SE_b^2 +...}$$

$$MOE_{a+b+...} = Z_{90}SE_{a+b+...}$$

states["novehicle_65over"] = \
  states["owned_novehicle_65over"] + states["rented_novehicle_65over"]

# Convert each MOE to an SE, combine the SEs, then convert back to an MOE
states["novehicle_65over_moe"] = Z_CRIT * numpy.sqrt(
    (states["owned_novehicle_65over_moe"] / Z_CRIT)**2 +
    (states["rented_novehicle_65over_moe"] / Z_CRIT)**2
)

Approximating SE for Derived Estimates

print(states[["name", "novehicle_65over", "novehicle_65over_moe"]].head())

         name  novehicle_65over  novehicle_65over_moe
0     Alabama             42267           4867.038791
1      Alaska              5575           1473.170747
2     Arizona             52331           6598.753623
3    Arkansas             22533           3155.583824
4  California            372772          15183.882878
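
To make the arithmetic concrete, the sketch below applies the two formulas for a derived sum to a pair of made-up component MOEs. The helper name `moe_of_sum` and the input values are hypothetical and are not taken from the ACS. Because each SE is MOE / Z_90, converting to SEs, combining them, and converting back gives the same result as the square root of the sum of the squared MOEs.

```python
import math

Z_CRIT = 1.645  # critical value for the 90% confidence level

def moe_of_sum(*moes, z_crit=Z_CRIT):
    """Approximate the MOE of a sum of estimates from the component MOEs."""
    ses = [moe / z_crit for moe in moes]          # MOE -> SE
    se_sum = math.sqrt(sum(se**2 for se in ses))  # SE of the sum
    return z_crit * se_sum                        # SE -> MOE

# Hypothetical component MOEs, chosen so the result is easy to check by hand
combined = moe_of_sum(300, 400)
print(round(combined, 6))
```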

Let's Practice!
