Paired t-tests

Hypothesis Testing in R

Richie Cotton

Data Evangelist at DataCamp

US Republican presidents dataset

state    county    repub_percent_08  repub_percent_12
Alabama  Bullock   25.69             23.51
Alabama  Chilton   78.49             79.78
Alabama  Clay      73.09             72.31
Alabama  Cullman   81.85             84.16
Alabama  Escambia  63.89             62.46
Alabama  Fayette   73.93             76.19
Alabama  Franklin  68.83             69.68
...      ...       ...               ...

500 rows; each row holds the county-level vote in a presidential election.

1 https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/VOQCHQ

Hypotheses

Question: Was the percentage of votes for the Republican candidate lower in 2008 than in 2012?

$H_{0}$: $\mu_{2008} - \mu_{2012} = 0$

$H_{A}$: $\mu_{2008} - \mu_{2012} < 0$

Set the significance level $\alpha = 0.05$.

The data is paired, since each pair of vote percentages refers to the same county.

From two samples to one

library(dplyr)
library(ggplot2)

sample_data <- repub_votes_potus_08_12 %>% 
  mutate(diff = repub_percent_08 - repub_percent_12)
ggplot(sample_data, aes(x = diff)) +
  geom_histogram(binwidth = 1)

Histogram of the diff variable: most values lie between -10 and 10, with a few outliers.

Calculate sample statistics of the difference

sample_data %>% 
  summarize(xbar_diff = mean(diff))
  xbar_diff
1 -2.643027

Revised hypotheses

Old hypotheses

$H_{0}$: $\mu_{2008} - \mu_{2012} = 0$

$H_{A}$: $\mu_{2008} - \mu_{2012} < 0$

 

New hypotheses

$H_{0}$: $\mu_{\text{diff}} = 0$

$H_{A}$: $\mu_{\text{diff}} < 0$

$t = \dfrac{\bar{x}_{\text{diff}} - \mu_{\text{diff}}}{\sqrt{\dfrac{s_{\text{diff}}^2}{n_{\text{diff}}}}}$

$df = n_{\text{diff}} - 1$
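
As a worked step, the null value and the sample figures quoted earlier (500 rows, mean difference -2.643027) can be substituted into the formula. Only the sample standard deviation of the differences remains to be computed from the data:

$t = \dfrac{\bar{x}_{\text{diff}} - 0}{\sqrt{\dfrac{s_{\text{diff}}^2}{n_{\text{diff}}}}} = \dfrac{-2.643027}{s_{\text{diff}} / \sqrt{500}}$

$df = 500 - 1 = 499$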

Calculating the p-value

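n_diff <- nrow(sample_data)
# xbar_diff must be a plain number, so compute it directly from the column
xbar_diff <- mean(sample_data$diff)
s_diff <- sample_data %>% 
  summarize(sd_diff = sd(diff)) %>%
  pull(sd_diff)
t_stat <- (xbar_diff - 0) / sqrt(s_diff ^ 2 / n_diff)
-16.06374
degrees_of_freedom <- n_diff - 1
499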

$t = \dfrac{\bar{x}_{\text{diff}} - \mu_{\text{diff}}}{\sqrt{\dfrac{s_{\text{diff}}^2}{n_{\text{diff}}}}}$

$df = n_{\text{diff}} - 1$

 

p_value <- pt(t_stat, df = degrees_of_freedom)
2.084965e-47
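
The slides stop at the p-value, so here is a minimal sketch of the final decision step. It reuses the test statistic and degrees of freedom reported above together with the significance level set earlier. The name `reject_null` is introduced here for illustration only.

```r
# Values reported above; alpha is the significance level chosen earlier
t_stat <- -16.06374
degrees_of_freedom <- 499
alpha <- 0.05

# Left-tailed test: the p-value is the area under the t-distribution below t_stat
p_value <- pt(t_stat, df = degrees_of_freedom)

# Reject the null hypothesis when the p-value is at most alpha
reject_null <- p_value <= alpha
reject_null
```

Because the alternative hypothesis is "less than", the default lower tail of `pt()` is the one needed. A "greater than" alternative would use `lower.tail = FALSE` instead.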
Testing differences between two means using t.test()

t.test(
  # Vector of differences
  sample_data$diff,
  # Choose between "two.sided", "less", "greater"
  alternative = "less",
  # Null hypothesis population parameter
  mu = 0
)
    One Sample t-test

data:  sample_data$diff
t = -16.064, df = 499, p-value < 2.2e-16
alternative hypothesis: true mean is less than 0
95 percent confidence interval:
     -Inf -2.37189
sample estimates:
mean of x 
-2.643027
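
The one-sided confidence bound in this output can be reconstructed from the other reported numbers. The sketch below assumes the standard error equals the sample mean divided by the t statistic, which holds here because the null value is 0. The names `std_error` and `upper_bound` are illustrative.

```r
# Figures taken from the t.test() output above
xbar_diff <- -2.643027
t_stat <- -16.06374
degrees_of_freedom <- 499

# With mu = 0, t = xbar / standard error, so the standard error can be back-solved
std_error <- xbar_diff / t_stat

# For alternative = "less", the 95% interval runs from -Inf up to
# xbar + (95th percentile of the t-distribution) * standard error
upper_bound <- xbar_diff + qt(0.95, df = degrees_of_freedom) * std_error
upper_bound
```

The result should agree with the reported upper limit of about -2.37, up to rounding of the printed inputs.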
t.test() with paired = TRUE

t.test(
  sample_data$repub_percent_08,
  sample_data$repub_percent_12,
  alternative = "less",
  mu = 0,
  paired = TRUE
)
    Paired t-test

data:  sample_data$repub_percent_08 and 
       sample_data$repub_percent_12
t = -16.064, df = 499, p-value < 2.2e-16
alternative hypothesis: true difference in means 
                        is less than 0
95 percent confidence interval:
     -Inf -2.37189
sample estimates:
mean of the differences 
              -2.643027
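
The two calls give identical results because a paired t-test is a one-sample t-test on the differences. The sketch below checks this on synthetic data. The vectors `pct_08` and `pct_12` are made-up stand-ins, not the real county vote shares.

```r
set.seed(42)
n <- 500

# Synthetic stand-ins for the two vote-share columns (not the real data)
pct_08 <- rnorm(n, mean = 56, sd = 15)
pct_12 <- pct_08 + rnorm(n, mean = 2.6, sd = 3.7)

# One-sample test on the vector of differences
one_sample <- t.test(pct_08 - pct_12, alternative = "less", mu = 0)

# Paired test on the two original vectors
paired <- t.test(pct_08, pct_12, alternative = "less", mu = 0, paired = TRUE)

# Both approaches yield the same statistic, degrees of freedom, and p-value
all.equal(unname(one_sample$statistic), unname(paired$statistic))
all.equal(unname(one_sample$parameter), unname(paired$parameter))
all.equal(one_sample$p.value, paired$p.value)
```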
Unpaired t.test()

t.test(
  x = sample_data$repub_percent_08,
  y = sample_data$repub_percent_12,
  alternative = "less",
  mu = 0
)

Applied to paired data, an unpaired t-test is more likely to miss a real effect (lower statistical power).

    Welch Two Sample t-test

data:  sample_data$repub_percent_08 and 
       sample_data$repub_percent_12
t = -2.8788, df = 992.76, p-value = 0.002039
alternative hypothesis: true difference in means
                        is less than 0
95 percent confidence interval:
      -Inf -1.131469
sample estimates:
mean of x mean of y 
 56.52034  59.16337 
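
To illustrate why the unpaired test is weaker here, the sketch below runs both tests on synthetic data in which each county has its own baseline. All values are simulated under assumed parameters; `county_baseline` is a hypothetical name and the numbers are not the real vote shares.

```r
set.seed(42)
n <- 500

# Each synthetic county has its own baseline level of support
county_baseline <- rnorm(n, mean = 56, sd = 15)
pct_08 <- county_baseline
pct_12 <- county_baseline + rnorm(n, mean = 2.6, sd = 3.7)

# The paired test removes county-to-county variation by working on differences
paired <- t.test(pct_08, pct_12, alternative = "less", paired = TRUE)

# The unpaired (Welch) test keeps that variation in its standard error
unpaired <- t.test(pct_08, pct_12, alternative = "less")

c(paired = unname(paired$statistic), unpaired = unname(unpaired$statistic))
```

Both tests share the same mean difference in the numerator. The unpaired standard error is larger because it includes the spread between counties, so its test statistic is closer to zero.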
Let's practice!
