Two-sample proportion tests

Hypothesis Testing in Python

James Chapman

Curriculum Manager, DataCamp

Comparing two proportions

$H_{0}$: The proportion of hobbyist users is the same for those under thirty as for those at least thirty

$H_{0}$: $p_{\geq30} - p_{<30} = 0$

$H_{A}$: The proportion of hobbyist users differs between those under thirty and those at least thirty

$H_{A}$: $p_{\geq30} - p_{<30} \neq 0$

alpha = 0.05

Calculating the z-score

z-score equation for a two-sample proportion test:

$$ z = \frac{(\hat{p}_{\geq30} - \hat{p}_{<30}) - 0}{\text{SE}(\hat{p}_{\geq30} - \hat{p}_{<30})} $$

Standard error: $$ \text{SE}(\hat{p}_{\geq30} - \hat{p}_{<30}) = \sqrt{\dfrac{\hat{p} \times (1 - \hat{p})}{n_{\geq30}} + \dfrac{\hat{p} \times (1 - \hat{p})}{n_{<30}}} $$

$\hat{p}$ → pooled proportion: the sample-size-weighted mean of $\hat{p}_{\geq30}$ and $\hat{p}_{<30}$

$$ \hat{p} = \frac{n_{\geq30} \times \hat{p}_{\geq30} + n_{<30} \times \hat{p}_{<30}}{n_{\geq30} + n_{<30} } $$

To compute the z-score, you only need four numbers from the sample: $\hat{p}_{\geq30}$, $\hat{p}_{<30}$, $n_{\geq30}$, and $n_{<30}$
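The three formulas above combine into a few lines of arithmetic. The following is a minimal sketch using only the standard library; the function name `two_proportion_z`, its argument names, and the example counts are illustrative assumptions, not part of the course code.

```python
import math


def two_proportion_z(p_hat_a, n_a, p_hat_b, n_b):
    """z-score for H0: p_a - p_b = 0, using the pooled standard error."""
    # Pooled proportion: sample-size-weighted mean of the two sample proportions
    p_hat = (n_a * p_hat_a + n_b * p_hat_b) / (n_a + n_b)
    # Standard error of the difference, assuming H0 (one common proportion)
    std_error = math.sqrt(p_hat * (1 - p_hat) / n_a + p_hat * (1 - p_hat) / n_b)
    # The hypothesized difference is 0, so it drops out of the numerator
    return (p_hat_a - p_hat_b) / std_error


# Hypothetical example: 45 of 100 successes in group A, 60 of 120 in group B
z = two_proportion_z(45 / 100, 100, 60 / 120, 120)
print(z)
```

Swapping the two groups only flips the sign of the result, which is why the order of subtraction must match the order used in the hypotheses.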

Getting the numbers for the z-score

p_hats = stack_overflow.groupby("age_cat")['hobbyist'].value_counts(normalize=True)

age_cat      hobbyist
At least 30  Yes         0.773333
             No          0.226667
Under 30     Yes         0.843105
             No          0.156895
Name: hobbyist, dtype: float64

n = stack_overflow.groupby("age_cat")['hobbyist'].count()

age_cat
At least 30    1050
Under 30       1211
Name: hobbyist, dtype: int64

p_hat_at_least_30 = p_hats[("At least 30", "Yes")]
p_hat_under_30 = p_hats[("Under 30", "Yes")]
print(p_hat_at_least_30, p_hat_under_30)

0.773333 0.843105

n_at_least_30 = n["At least 30"]
n_under_30 = n["Under 30"]
print(n_at_least_30, n_under_30)

1050 1211
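The same four numbers can also be obtained in a single grouped aggregation, since the mean of a boolean column is a proportion. The sketch below is one possible alternative, not the course's method; it assumes a stand-in `stack_overflow` DataFrame rebuilt from the group counts reported in these slides (812 of 1050 and 1021 of 1211 hobbyists), and the column name `is_hobbyist` is hypothetical.

```python
import pandas as pd

# Stand-in data (assumption): only the two columns used here, rebuilt from
# the per-group Yes/No counts shown in the slides.
stack_overflow = pd.DataFrame({
    "age_cat": ["At least 30"] * 1050 + ["Under 30"] * 1211,
    "hobbyist": ["Yes"] * 812 + ["No"] * 238 + ["Yes"] * 1021 + ["No"] * 190,
})

# Mean of a True/False column = proportion of True; count = group size
summary = (
    stack_overflow
    .assign(is_hobbyist=stack_overflow["hobbyist"] == "Yes")
    .groupby("age_cat")["is_hobbyist"]
    .agg(p_hat="mean", n="count")
)

p_hat_at_least_30 = summary.loc["At least 30", "p_hat"]
p_hat_under_30 = summary.loc["Under 30", "p_hat"]
n_at_least_30 = summary.loc["At least 30", "n"]
n_under_30 = summary.loc["Under 30", "n"]
print(summary)
```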

import numpy as np

# Pooled proportion; the outer parentheses let the expression span two lines
p_hat = ((n_at_least_30 * p_hat_at_least_30 + n_under_30 * p_hat_under_30) /
         (n_at_least_30 + n_under_30))

std_error = np.sqrt(p_hat * (1-p_hat) / n_at_least_30 + 
                    p_hat * (1-p_hat) / n_under_30)

z_score = (p_hat_at_least_30 - p_hat_under_30) / std_error

print(z_score)

-4.223718652693034
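A z-score by itself does not settle the test; it still has to be converted to a p-value and compared with alpha. As a sketch of that last step for the two-sided alternative, the standard normal tail area can be computed with `math.erfc` from the standard library. The helper name `two_sided_p_value` is an assumption for illustration.

```python
import math


def two_sided_p_value(z_score):
    """Two-sided p-value under a standard normal null distribution."""
    # 2 * P(Z >= |z|) is identical to erfc(|z| / sqrt(2))
    return math.erfc(abs(z_score) / math.sqrt(2))


alpha = 0.05
z_score = -4.223718652693034  # the z-score printed above
p_value = two_sided_p_value(z_score)

# True means the null hypothesis is rejected at this significance level
print(p_value, p_value <= alpha)
```

With a z-score this far from zero, the p-value falls well below alpha = 0.05, so the null hypothesis of equal proportions is rejected.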

Proportion tests using proportions_ztest()

stack_overflow.groupby("age_cat")['hobbyist'].value_counts()

age_cat      hobbyist
At least 30  Yes          812
             No           238
Under 30     Yes         1021
             No           190
Name: hobbyist, dtype: int64

n_hobbyists = np.array([812, 1021])

n_rows = np.array([812 + 238, 1021 + 190])

from statsmodels.stats.proportion import proportions_ztest
z_score, p_value = proportions_ztest(count=n_hobbyists, nobs=n_rows,
                                     alternative="two-sided")

(-4.223691463320559, 2.403330142685068e-05)
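Typing the counts in by hand is easy to get wrong, so the two arrays can instead be derived from the data. This is a hedged sketch using `pd.crosstab`; the `stack_overflow` DataFrame here is a stand-in rebuilt from the counts above, and the `groups` list is an assumption that pins the group order explicitly.

```python
import pandas as pd

# Stand-in data (assumption): rebuilt from the Yes/No counts shown above
stack_overflow = pd.DataFrame({
    "age_cat": ["At least 30"] * 1050 + ["Under 30"] * 1211,
    "hobbyist": ["Yes"] * 812 + ["No"] * 238 + ["Yes"] * 1021 + ["No"] * 190,
})

# Rows: age_cat, columns: hobbyist, cells: number of respondents
counts = pd.crosstab(stack_overflow["age_cat"], stack_overflow["hobbyist"])

# Fix the group order so both arrays line up element by element
groups = ["At least 30", "Under 30"]
n_hobbyists = counts.loc[groups, "Yes"].to_numpy()    # successes per group
n_rows = counts.loc[groups].sum(axis=1).to_numpy()    # total rows per group

print(n_hobbyists, n_rows)
```

The resulting arrays can be passed as `count` and `nobs` to `proportions_ztest()` exactly as in the call above.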

Let's practice!
