Hypothesis Testing in R
Richie Cotton
Data Evangelist at DataCamp
$p$: population proportion (unknown population parameter)
$\hat{p}$: sample proportion (sample statistic)
$p_{0}$: hypothesized population proportion
$$ z = \frac{\hat{p} - \text{mean}(\hat{p})}{\text{standard error}(\hat{p})} = \frac{\hat{p} - p}{\text{standard error}(\hat{p})} $$
Assuming $H_{0}$ is true, $p = p_{0}$, so
$$ z = \dfrac{\hat{p} - p_{0}}{\text{standard error}(\hat{p})} $$
For a difference in means, the standard error is approximated from the sample standard deviations:
$SE(\bar{x}_{\text{child}} - \bar{x}_{\text{adult}}) \approx \sqrt{\dfrac{s_{\text{child}}^2}{n_{\text{child}}} + \dfrac{s_{\text{adult}}^2}{n_{\text{adult}}}}$
For a single proportion under $H_{0}$, the standard error depends only on $p_{0}$ and $n$:
$SE_{\hat{p}} = \sqrt{\dfrac{p_{0}(1-p_{0})}{n}}$
Assuming $H_{0}$ is true,
$z = \dfrac{\hat{p} - p_{0}}{\sqrt{\dfrac{p_{0}(1-p_{0})}{n}}}$
This uses only sample information ($\hat{p}$ and $n$) and the hypothesized parameter ($p_{0}$).
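The simplified formula can be wrapped in a small helper. This is a minimal sketch: the function name `prop_z_score` and the inputs are hypothetical and chosen only to make the arithmetic easy to follow.

```r
# Hypothetical helper (name and arguments are illustrative, not from the course).
# Computes the z-score for a one-sample proportion test under H0: p = p_0.
prop_z_score <- function(p_hat, p_0, n) {
  # The standard error uses the hypothesized proportion, not the sample proportion
  std_error <- sqrt(p_0 * (1 - p_0) / n)
  (p_hat - p_0) / std_error
}

# Made-up inputs: 55% observed in a sample of 400, tested against 50%
z <- prop_z_score(p_hat = 0.55, p_0 = 0.50, n = 400)
print(z)
```

Here the standard error is $\sqrt{0.25/400} = 0.025$, so an observed difference of 0.05 is two standard errors away from the hypothesized value.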
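By contrast, the two-sample $t$ statistic estimates its standard error from the sample standard deviations $s$, which adds uncertainty and is why it is compared against a $t$-distribution rather than the normal:
$t = \dfrac{(\bar{x}_{\text{child}} - \bar{x}_{\text{adult}})}{\sqrt{\dfrac{s_{\text{child}}^2}{n_{\text{child}}} + \dfrac{s_{\text{adult}}^2}{n_{\text{adult}}}}}$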
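One way to see why the choice of distribution matters: for the same test statistic, the t-distribution assigns more probability to the tails than the standard normal. A small sketch with arbitrary illustrative values (a statistic of 2 and 10 degrees of freedom, neither taken from the course data):

```r
# Arbitrary illustrative values, not from the course data
stat <- 2
deg_free <- 10

# Right-tail probability under the standard normal distribution
tail_z <- pnorm(stat, lower.tail = FALSE)
# Right-tail probability under the t-distribution with 10 degrees of freedom
tail_t <- pt(stat, df = deg_free, lower.tail = FALSE)

# The t tail is heavier, so the same statistic gives a larger p-value
print(c(normal = tail_z, t = tail_t))
```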
$H_{0}$: The proportion of Stack Overflow users under 30 is equal to 0.5.
$H_{A}$: The proportion of Stack Overflow users under 30 is not equal to 0.5.
alpha <- 0.01
stack_overflow %>%
  count(age_cat)
# A tibble: 2 x 2
  age_cat         n
  <chr>       <int>
1 At least 30  1050
2 Under 30     1216
p_hat <- stack_overflow %>%
  summarize(prop_under_30 = mean(age_cat == "Under 30")) %>%
  pull(prop_under_30)
0.5366
p_0 <- 0.50
n <- nrow(stack_overflow)
2266
$z = \dfrac{\hat{p} - p_{0}}{\sqrt{\dfrac{p_{0}(1-p_{0})}{n}}}$
numerator <- p_hat - p_0
denominator <- sqrt(p_0 * (1 - p_0) / n)
z_score <- numerator / denominator
3.487
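Substituting the values printed above into the formula (using the unrounded $\hat{p} = 1216/2266$) gives:

$$ z = \dfrac{1216/2266 - 0.50}{\sqrt{0.50 \times (1 - 0.50) / 2266}} \approx \dfrac{0.03663}{0.010504} \approx 3.487 $$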
Left-tailed ("less than")
p_value <- pnorm(z_score)
Right-tailed ("greater than")
p_value <- pnorm(z_score, lower.tail = FALSE)
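Two-tailed ("not equal")
# Add the area in both tails; abs() keeps this correct for negative z-scores
p_value <- pnorm(-abs(z_score)) +
  pnorm(abs(z_score), lower.tail = FALSE)
p_value <- 2 * pnorm(abs(z_score), lower.tail = FALSE)
0.000488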
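The three cases can be collected into one function. This is a sketch: the helper name `z_p_value` is hypothetical, and its `alternative` argument is modeled on the naming used by base R test functions. Taking `abs()` in the two-tailed branch keeps the result correct when the z-score is negative.

```r
# Hypothetical helper (not from the course): p-value for a z-score
# under the chosen alternative hypothesis.
z_p_value <- function(z_score, alternative = c("two.sided", "less", "greater")) {
  alternative <- match.arg(alternative)
  switch(
    alternative,
    less      = pnorm(z_score),                              # left tail
    greater   = pnorm(z_score, lower.tail = FALSE),          # right tail
    two.sided = 2 * pnorm(abs(z_score), lower.tail = FALSE)  # both tails
  )
}

# z-score from the example in this section
z_score <- 3.487
p_two <- z_p_value(z_score)
p_right <- z_p_value(z_score, "greater")
p_left <- z_p_value(z_score, "less")
print(c(two.sided = p_two, greater = p_right, less = p_left))
```

The left- and right-tailed p-values always sum to 1, and the two-tailed p-value is twice the smaller of the two.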
p_value <= alpha
TRUE
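As a cross-check, base R's `prop.test()` runs the same test when the continuity correction is turned off. It reports a chi-squared statistic, which for a single proportion is the square of the z-score. The sketch below rebuilds the inputs from the counts printed earlier rather than from the `stack_overflow` data frame, so it runs on its own.

```r
# Counts taken from the count(age_cat) output in this section
n_under_30 <- 1216
n_total <- 1050 + 1216

# One-sample proportion test against p = 0.5, without continuity correction
result <- prop.test(x = n_under_30, n = n_total, p = 0.50,
                    alternative = "two.sided", correct = FALSE)

# Manual z-test for comparison
p_hat <- n_under_30 / n_total
z_score <- (p_hat - 0.50) / sqrt(0.50 * (1 - 0.50) / n_total)
p_value <- 2 * pnorm(abs(z_score), lower.tail = FALSE)

# The chi-squared statistic equals z^2 and the two p-values agree
print(c(z_squared = z_score^2, chi_squared = unname(result$statistic)))
print(c(manual = p_value, prop_test = result$p.value))
```

Note that `prop.test()` applies the continuity correction by default, so leaving out `correct = FALSE` would give a slightly different p-value from the manual calculation.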