Hypothesis Testing in R
Richie Cotton
Data Evangelist at DataCamp

Control

Treatment

library(dplyr)
glimpse(stack_overflow)
Rows: 2,261
Columns: 8
$ respondent <dbl> 36, 47, 69, 125, 147, 152, 166, 170, 187, 196, 221,…
$ age_first_code_cut <chr> "adult", "child", "child", "adult", "adult", "adult…
$ converted_comp <dbl> 77556, 74970, 594539, 2000000, 37816, 121980, 48644…
$ job_sat <fct> Slightly satisfied, Very satisfied, Very satisfied,…
$ purple_link <chr> "Hello, old friend", "Hello, old friend", "Hello, o…
$ age_cat <chr> "At least 30", "At least 30", "Under 30", "At least…
$ age <dbl> 34, 53, 25, 41, 28, 30, 28, 26, 43, 23, 24, 35, 37,…
$ hobbyist <chr> "Yes", "Yes", "Yes", "Yes", "No", "Yes", "Yes", "Ye…
A hypothesis:
The mean annual compensation of the population of data scientists is $110,000.
The point estimate (sample statistic):
mean_comp_samp <- mean(stack_overflow$converted_comp)
# Equivalent dplyr pipeline
mean_comp_samp <- stack_overflow %>%
  summarize(mean_compensation = mean(converted_comp)) %>%
  pull(mean_compensation)
119574.7
# Step 3. Repeat steps 1 & 2 many times
so_boot_distn <- replicate(
  n = 5000,
  expr = {
    # Step 1. Resample
    stack_overflow %>%
      slice_sample(prop = 1, replace = TRUE) %>%
      # Step 2. Calculate point estimate
      summarize(mean_compensation = mean(converted_comp)) %>%
      pull(mean_compensation)
  }
)
library(ggplot2)
# Histogram of the bootstrap distribution of the sample mean
tibble(resample_mean = so_boot_distn) %>%
  ggplot(aes(resample_mean)) +
  geom_histogram(binwidth = 1000)

std_error <- sd(so_boot_distn)
5511.674
$\text{standardized value} = \dfrac{\text{value} - \text{mean}}{\text{standard deviation}}$
$z = \dfrac{\text{sample stat} - \text{hypoth. param. value}}{\text{standard error}}$
$z = \dfrac{\$119,574.7 - \$110,000}{\$5511.67} = 1.737$
mean_comp_samp
119574.7
mean_comp_hyp <- 110000
std_error
5511.674
z_score <- (mean_comp_samp - mean_comp_hyp) / std_error
1.737171
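The steps so far (resample, compute the point estimate, take the standard deviation of the bootstrap distribution, standardize) can be collected into one function. This is a minimal sketch in base R rather than the dplyr pipeline used in the slides. The function name `bootstrap_z_score` and the synthetic vector `fake_comp` are hypothetical and stand in for the `stack_overflow` data, which is not reproduced here.

```r
# Hypothetical helper, not part of the course code: bootstrap z-score
# for a sample mean against a hypothesized population mean.
bootstrap_z_score <- function(x, hypothesized_mean, n_resamples = 5000) {
  # Steps 1-3: resample with replacement, take the mean, repeat
  boot_means <- replicate(
    n_resamples,
    mean(sample(x, size = length(x), replace = TRUE))
  )
  # Standard error = standard deviation of the bootstrap distribution
  std_error <- sd(boot_means)
  # z-score = (sample stat - hypothesized parameter value) / standard error
  (mean(x) - hypothesized_mean) / std_error
}

set.seed(42)  # makes the synthetic example reproducible
# Synthetic, made-up compensation values; NOT the stack_overflow data
fake_comp <- rlnorm(500, meanlog = 11, sdlog = 0.8)
z <- bootstrap_z_score(fake_comp, hypothesized_mean = 110000)
print(z)
```

Because the standard error comes from random resampling, the z-score changes slightly from run to run unless a seed is set.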
Determine whether sample statistics are close to or far away from expected (or "hypothesized") values.
Standard normal distribution: a normal distribution with mean zero and standard deviation 1.
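The definition of the standard normal distribution can be checked numerically. The sketch below uses base R's `integrate()` on `dnorm()` to approximate the total area, the mean, and the standard deviation. The variable names are illustrative only.

```r
# Numerical check of the standard normal's defining properties (base R only)
# Total area under the density should be approximately 1
total_area <- integrate(dnorm, -Inf, Inf)$value
# Mean: integral of x * f(x), should be approximately 0
mean_z <- integrate(function(x) x * dnorm(x), -Inf, Inf)$value
# Variance: integral of x^2 * f(x) (the mean is 0), should be approximately 1
var_z <- integrate(function(x) x^2 * dnorm(x), -Inf, Inf)$value
sd_z <- sqrt(var_z)

print(c(area = total_area, mean = mean_z, sd = sd_z))
```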
tibble(x = seq(-4, 4, 0.01)) %>%
  ggplot(aes(x)) +
  stat_function(fun = dnorm) +
  ylab("PDF(x)")
