Hypothesis Testing in R
Richie Cotton
Data Evangelist at DataCamp
| state | county | repub_percent_08 | repub_percent_12 |
|---|---|---|---|
| Alabama | Bullock | 25.69 | 23.51 |
| Alabama | Chilton | 78.49 | 79.78 |
| Alabama | Clay | 73.09 | 72.31 |
| Alabama | Cullman | 81.85 | 84.16 |
| Alabama | Escambia | 63.89 | 62.46 |
| Alabama | Fayette | 73.93 | 76.19 |
| Alabama | Franklin | 68.83 | 69.68 |
| ... | ... | ... | ... |
500 rows; each row represents county-level votes in a presidential election.
Question: Was the percentage of votes for the Republican candidate lower in 2008 than in 2012?
$H_{0}$: $\mu_{2008} - \mu_{2012} = 0$
$H_{A}$: $\mu_{2008} - \mu_{2012} < 0$
Set the significance level to $\alpha = 0.05$.
The data are paired because both vote percentages in each row come from the same county.
library(dplyr)
library(ggplot2)

# Difference in Republican vote share for each county (2008 minus 2012)
sample_data <- repub_votes_potus_08_12 %>%
  mutate(diff = repub_percent_08 - repub_percent_12)
ggplot(sample_data, aes(x = diff)) +
  geom_histogram(binwidth = 1)

sample_data %>%
  summarize(xbar_diff = mean(diff))
  xbar_diff
1 -2.643027
Old hypotheses
$H_{0}$: $\mu_{2008} - \mu_{2012} = 0$
$H_{A}$: $\mu_{2008} - \mu_{2012} < 0$
New hypotheses
$H_{0}$: $\mu_{\text{diff}} = 0$
$H_{A}$: $\mu_{\text{diff}} < 0$
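Rewriting the hypotheses in terms of a single variable works because the mean of the paired differences equals the difference of the two sample means. As a quick check, using the sample means for 2008 and 2012 reported in the unpaired test output in this section (56.52034 and 59.16337):

$\bar{x}_{\text{diff}} = \bar{x}_{2008} - \bar{x}_{2012} = 56.52034 - 59.16337 = -2.64303$

This agrees with the computed value of -2.643027 up to rounding of the reported means.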
$t = \dfrac{\bar{x}_{\text{diff}} - \mu_{\text{diff}}}{\sqrt{\dfrac{s_{\text{diff}}^2}{n_{\text{diff}}}}}$
$df = n_{\text{diff}} - 1$
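# Store the sample mean of the differences as a number for use below
xbar_diff <- mean(sample_data$diff)
n_diff <- nrow(sample_data)
s_diff <- sample_data %>%
  summarize(sd_diff = sd(diff)) %>%
  pull(sd_diff)
# The hypothesized value of mu_diff under the null hypothesis is 0
t_stat <- (xbar_diff - 0) / sqrt(s_diff ^ 2 / n_diff)
t_stat
-16.06374
degrees_of_freedom <- n_diff - 1
degrees_of_freedom
499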
$t = \dfrac{\bar{x}_{\text{diff}} - \mu_{\text{diff}}}{\sqrt{\dfrac{s_{\text{diff}}^2}{n_{\text{diff}}}}}$
$df = n_{\text{diff}} - 1$
p_value <- pt(t_stat, df = degrees_of_freedom)
2.084965e-47
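The manual steps above can be cross-checked against `t.test()`. The following is a minimal sketch in base R, assuming a small simulated data set: the vectors `votes_08` and `votes_12` are made up for illustration and are not the election data.

```r
# Simulated paired data: hypothetical values, NOT the election data set
set.seed(42)
n <- 30
baseline <- rnorm(n, mean = 55, sd = 15)   # shared county-level effect
votes_08 <- baseline + rnorm(n, sd = 2)
votes_12 <- baseline + 2.5 + rnorm(n, sd = 2)

diff_vals <- votes_08 - votes_12

# Manual paired t-test, following the formulas above
xbar_diff <- mean(diff_vals)
s_diff <- sd(diff_vals)
n_diff <- length(diff_vals)
t_stat <- (xbar_diff - 0) / sqrt(s_diff^2 / n_diff)
degrees_of_freedom <- n_diff - 1
# Lower-tail probability, because the alternative hypothesis is "less than"
p_value <- pt(t_stat, df = degrees_of_freedom)

# Built-in one-sample t-test on the differences
result <- t.test(diff_vals, alternative = "less", mu = 0)

# Compare the manual values with the built-in ones
all.equal(t_stat, unname(result$statistic))
all.equal(p_value, result$p.value)
```

Both comparisons should return `TRUE`, since `t.test()` applies the same formula for the statistic and the same lower-tail lookup for a "less" alternative.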
t.test(
  # Vector of differences
  sample_data$diff,
  # Choose between "two.sided", "less", "greater"
  alternative = "less",
  # Population parameter under the null hypothesis
  mu = 0
)
One Sample t-test
data: sample_data$diff
t = -16.064, df = 499, p-value < 2.2e-16
alternative hypothesis: true mean is less than 0
95 percent confidence interval:
-Inf -2.37189
sample estimates:
mean of x
-2.643027
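The one-sided confidence bound and the test decision can be reproduced from the numbers printed above. This sketch copies the reported mean, t statistic, and degrees of freedom by hand rather than reading them from the data, so small rounding differences are expected.

```r
# Values copied from the output above
xbar_diff <- -2.643027
t_stat <- -16.06374
degrees_of_freedom <- 499
alpha <- 0.05

# Standard error recovered from t = (xbar_diff - 0) / se
se_diff <- xbar_diff / t_stat

# One-sided 95% upper bound: xbar_diff + t quantile * standard error
upper_bound <- xbar_diff + qt(1 - alpha, df = degrees_of_freedom) * se_diff
upper_bound

# Decision rule: reject the null hypothesis when p-value <= alpha
p_value <- pt(t_stat, df = degrees_of_freedom)
p_value <= alpha
```

The computed `upper_bound` should land close to the -2.37189 shown in the output, and the final comparison is `TRUE`, so the null hypothesis is rejected at the 0.05 level.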
t.test(
  sample_data$repub_percent_08,
  sample_data$repub_percent_12,
  alternative = "less",
  mu = 0,
  paired = TRUE
)
Paired t-test
data: sample_data$repub_percent_08 and
sample_data$repub_percent_12
t = -16.064, df = 499, p-value < 2.2e-16
alternative hypothesis: true difference in means
is less than 0
95 percent confidence interval:
-Inf -2.37189
sample estimates:
mean of the differences
-2.643027
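The paired output matches the one-sample output on the differences because the two calls perform the same computation. A short sketch on simulated vectors (`x_08` and `x_12` are hypothetical, not the election data) illustrates this:

```r
# Hypothetical paired measurements, for illustration only
set.seed(1)
x_08 <- rnorm(20, mean = 56, sd = 10)
x_12 <- x_08 + rnorm(20, mean = 2, sd = 3)

# Two-vector call with paired = TRUE
paired_result <- t.test(x_08, x_12, alternative = "less", mu = 0, paired = TRUE)

# One-sample call on the vector of differences
diff_result <- t.test(x_08 - x_12, alternative = "less", mu = 0)

all.equal(unname(paired_result$statistic), unname(diff_result$statistic))
all.equal(paired_result$p.value, diff_result$p.value)
```

Either form can be used; passing the two columns with `paired = TRUE` simply avoids creating the difference column first.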
t.test(
  x = sample_data$repub_percent_08,
  y = sample_data$repub_percent_12,
  alternative = "less",
  mu = 0
)
Using an unpaired t-test on paired data increases the probability of a false negative error (lower statistical power).
Welch Two Sample t-test
data: sample_data$repub_percent_08 and
sample_data$repub_percent_12
t = -2.8788, df = 992.76, p-value = 0.002039
alternative hypothesis: true difference in means
is less than 0
95 percent confidence interval:
-Inf -1.131469
sample estimates:
mean of x mean of y
56.52034 59.16337
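The loss of power from ignoring the pairing can be illustrated with a small simulation. This is a sketch under assumed settings: the county baseline, noise levels, shift of 1.5 points, and sample size are all made-up values chosen for illustration, not estimates from the election data.

```r
set.seed(123)
n_counties <- 30    # hypothetical sample size
n_sims <- 500       # number of simulated data sets
alpha <- 0.05

simulate_once <- function() {
  # Each county has its own baseline, shared by both years
  baseline <- rnorm(n_counties, mean = 55, sd = 15)
  y_08 <- baseline + rnorm(n_counties, sd = 2)
  y_12 <- baseline + 1.5 + rnorm(n_counties, sd = 2)   # true shift of 1.5
  c(
    paired = t.test(y_08, y_12, alternative = "less", paired = TRUE)$p.value,
    unpaired = t.test(y_08, y_12, alternative = "less")$p.value
  )
}

# 2 x n_sims matrix of p-values (rows: paired, unpaired)
p_values <- replicate(n_sims, simulate_once())

# Share of simulations in which each test rejects the null hypothesis
power_estimates <- rowMeans(p_values <= alpha)
power_estimates
```

Because a real shift is built into every simulated data set, each failure to reject is a false negative. The paired test removes the large county-to-county variation from the standard error, so its estimated power should come out well above that of the unpaired test.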