Hypothesis Testing in R
Richie Cotton
Data Evangelist at DataCamp
A non-parametric test is a hypothesis test that doesn't assume a probability distribution for the test statistic.
There are two types of non-parametric hypothesis test: simulation-based tests and rank-based tests.
$H_{0}$: $\mu_{child} - \mu_{adult} = 0$
$H_{A}$: $\mu_{child} - \mu_{adult} > 0$
library(infer)
stack_overflow %>%
  t_test(
    converted_comp ~ age_first_code_cut,
    order = c("child", "adult"),
    alternative = "greater"
  )
# A tibble: 1 x 6
  statistic  t_df p_value alternative lower_ci upper_ci
      <dbl> <dbl>   <dbl> <chr>          <dbl>    <dbl>
1      2.40 2083. 0.00814 greater        8438.      Inf
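null_distn <- stack_overflow %>%
  # Response ~ explanatory variable
  specify(converted_comp ~ age_first_code_cut) %>%
  # Null hypothesis: compensation is independent of age group
  hypothesize(null = "independence") %>%
  # Simulate the null by permuting (shuffling) the data 5000 times
  generate(reps = 5000, type = "permute") %>%
  # Test statistic for each permuted replicate
  calculate(stat = "diff in means", order = c("child", "adult"))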
obs_stat <- stack_overflow %>%
  specify(converted_comp ~ age_first_code_cut) %>%
  calculate(
    stat = "diff in means",
    order = c("child", "adult")
  )
get_p_value(
  null_distn, obs_stat,
  direction = "greater"
)
# A tibble: 1 x 1
  p_value
    <dbl>
1  0.0066
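The simulation-based pipeline above can also be sketched in base R, which makes the permutation logic explicit. This is a minimal illustration on made-up data: the vectors `comp` and `group` and the helper `diff_in_means()` are hypothetical and are not part of the `stack_overflow` dataset or the infer package.

```r
# Hypothetical data: two groups with a small true difference in means
set.seed(42)
group <- rep(c("child", "adult"), times = c(40, 60))
comp  <- c(rnorm(40, mean = 55, sd = 10), rnorm(60, mean = 50, sd = 10))

# Test statistic: difference in means, child minus adult
diff_in_means <- function(values, labels) {
  mean(values[labels == "child"]) - mean(values[labels == "adult"])
}
obs_diff <- diff_in_means(comp, group)

# Null distribution: shuffle the group labels and recompute the statistic
n_reps <- 5000
null_diffs <- replicate(n_reps, diff_in_means(comp, sample(group)))

# Right-tailed p-value: share of permuted statistics at least as large as observed
p_value <- mean(null_diffs >= obs_diff)
p_value
```

Shuffling the labels breaks any link between group and response, which is what `hypothesize(null = "independence")` followed by `generate(type = "permute")` expresses in the infer version.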
x <- c(1, 15, 3, 10, 6)
rank(x)
[1] 1 5 2 4 3
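Real data often contains ties, which the example above does not. By default, `rank()` gives tied values the average of the ranks they span. A small illustration with a made-up vector `y`:

```r
# Hypothetical vector with one tie (the two 20s)
y <- c(10, 20, 20, 30)

# Default ties.method = "average": the tied values share rank (2 + 3) / 2
rank(y)
# 1.0 2.5 2.5 4.0

# ties.method = "min": the tied values both take the lowest rank they span
rank(y, ties.method = "min")
# 1 2 2 4
```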
A Wilcoxon-Mann-Whitney test (a.k.a. Wilcoxon rank sum test) is (very roughly) a t-test on the ranks of the numeric input.
wilcox.test(
  converted_comp ~ age_first_code_cut,
  data = stack_overflow,
  alternative = "greater",
  correct = FALSE
)
Wilcoxon rank sum test
data: converted_comp by age_first_code_cut
W = 967298, p-value <2e-16
alternative hypothesis: true location shift is greater than 0
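To see how the W statistic comes from ranks, it can be rebuilt by hand on a tiny example. This is a sketch on made-up vectors `a` and `b` with no ties. It relies on the fact that the W reported by `wilcox.test()` is the rank sum of the first group minus its smallest possible value, n1(n1 + 1)/2.

```r
# Hypothetical data: two small groups, no ties
a <- c(1.1, 4.3, 2.7, 6.0)
b <- c(0.5, 3.2, 2.0, 1.8, 5.1)

# Rank the pooled sample, then sum the ranks belonging to group a
pooled_ranks <- rank(c(a, b))
rank_sum_a <- sum(pooled_ranks[seq_along(a)])

# W: rank sum of the first group minus its minimum possible value
w_by_hand <- rank_sum_a - length(a) * (length(a) + 1) / 2

# Compare with the statistic computed by wilcox.test()
w_from_r <- unname(wilcox.test(a, b)$statistic)
c(by_hand = w_by_hand, wilcox_test = w_from_r)
```

Both values are 13, which is also the number of (a, b) pairs in which the value from `a` is larger.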
The Kruskal-Wallis test is to the Wilcoxon-Mann-Whitney test as ANOVA is to the t-test: it extends the rank-based comparison from two groups to more than two.
kruskal.test(
  converted_comp ~ job_sat,
  data = stack_overflow
)
Kruskal-Wallis rank sum test
data: converted_comp by job_sat
Kruskal-Wallis chi-squared = 81, df = 4, p-value <2e-16
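The Kruskal-Wallis statistic can likewise be rebuilt from ranks. The sketch below uses made-up vectors `values` and `groups` with no ties, and applies the textbook formula H = 12 / (N(N + 1)) * sum(R_i^2 / n_i) - 3(N + 1), where R_i is the rank sum and n_i the size of group i.

```r
# Hypothetical data: three groups of three observations, no ties
values <- c(2.1, 3.4, 1.9, 5.6, 4.8, 6.3, 7.7, 8.1, 6.9)
groups <- factor(rep(c("low", "mid", "high"), each = 3))

# Rank the pooled sample, then summarise the ranks per group
r <- rank(values)
n <- length(values)
rank_sums   <- tapply(r, groups, sum)
group_sizes <- tapply(r, groups, length)

# H statistic (no tie correction needed because there are no ties)
h_by_hand <- 12 / (n * (n + 1)) * sum(rank_sums^2 / group_sizes) - 3 * (n + 1)

# Compare with the statistic computed by kruskal.test()
h_from_r <- unname(kruskal.test(values ~ groups)$statistic)
c(by_hand = h_by_hand, kruskal_test = h_from_r)
```

Both values are 7.2 here. With tied observations, `kruskal.test()` applies a tie correction, so the uncorrected formula would no longer match exactly.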