Non-parametric ANOVA and unpaired t-tests

Hypothesis Testing in R

Richie Cotton

Data Evangelist at DataCamp

Non-parametric tests

A non-parametric test is a hypothesis test that doesn't assume a probability distribution for the test statistic.

There are two types of non-parametric hypothesis test:

  1. Simulation-based.
  2. Rank-based.

t_test()

$H_{0}$: $\mu_{child} - \mu_{adult} = 0$     $H_{A}$: $\mu_{child} - \mu_{adult} > 0$

library(infer)
stack_overflow %>% 
  t_test(
    converted_comp ~ age_first_code_cut,
    order = c("child", "adult"),
    alternative = "greater"
  )
# A tibble: 1 x 6
  statistic  t_df p_value alternative lower_ci upper_ci
      <dbl> <dbl>   <dbl> <chr>          <dbl>    <dbl>
1      2.40 2083. 0.00814 greater        8438.      Inf
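As a rough illustration of what this call computes, the sketch below reproduces a two-sample t statistic by hand and checks it against base R's t.test(). It assumes that t_test() from infer behaves like stats::t.test() with its default unequal-variance (Welch) settings. The vectors child and adult are synthetic stand-ins invented for this example, not the stack_overflow data.

```r
set.seed(42)

# Synthetic stand-in data (hypothetical; not the stack_overflow dataset)
child <- rnorm(200, mean = 105, sd = 30)
adult <- rnorm(300, mean = 100, sd = 25)

# Welch's t statistic by hand: difference in means over its standard error
se <- sqrt(var(child) / length(child) + var(adult) / length(adult))
t_manual <- (mean(child) - mean(adult)) / se

# Same test in base R; var.equal = FALSE (the default) gives Welch's test
res <- t.test(child, adult, alternative = "greater")

t_manual
res$statistic
res$p.value
```

The two statistics should agree, and because the alternative is "greater" the upper confidence limit is infinite, as in the output above.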

Calculating the null distribution

Simulation-based pipeline
null_distn <- stack_overflow %>% 
  specify(converted_comp ~ age_first_code_cut) %>% 
  hypothesize(null = "independence") %>% 
  generate(reps = 5000, type = "permute") %>% 
  calculate(
    stat = "diff in means", 
    order = c("child", "adult")
  )
t-test, for comparison
library(infer)
stack_overflow %>% 
  t_test(
    converted_comp ~ age_first_code_cut,
    order = c("child", "adult"),
    alternative = "greater"
  )

Calculating the observed statistic

Simulation-based pipeline
obs_stat <- stack_overflow %>% 
  specify(converted_comp ~ age_first_code_cut) %>% 
  calculate(
    stat = "diff in means", 
    order = c("child", "adult")
  )
t-test, for comparison
library(infer)
stack_overflow %>% 
  t_test(
    converted_comp ~ age_first_code_cut,
    order = c("child", "adult"),
    alternative = "greater"
  )

Get the p-value

Simulation-based pipeline
get_p_value(
  null_distn, obs_stat, 
  direction = "greater"
)
# A tibble: 1 x 1
  p_value
    <dbl>
1  0.0066
t-test, for comparison
library(infer)
stack_overflow %>% 
  t_test(
    converted_comp ~ age_first_code_cut,
    order = c("child", "adult"),
    alternative = "greater"
  )
# A tibble: 1 x 6
  statistic  t_df p_value alternative lower_ci upper_ci
      <dbl> <dbl>   <dbl> <chr>          <dbl>    <dbl>
1      2.40 2083. 0.00814 greater        8438.      Inf
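To make the three simulation steps concrete, here is a minimal base R sketch of a permutation test for a difference in means. It is an illustration of the idea, not infer's actual implementation. The data and the helper diff_in_means() are hypothetical, and the p-value is taken as the share of permuted differences at least as large as the observed one.

```r
set.seed(2024)

# Synthetic stand-in data (hypothetical; not the stack_overflow dataset)
comp  <- c(rnorm(60, mean = 110, sd = 30), rnorm(90, mean = 100, sd = 30))
group <- rep(c("child", "adult"), times = c(60, 90))

# Hypothetical helper: difference in means, child minus adult
diff_in_means <- function(values, labels) {
  mean(values[labels == "child"]) - mean(values[labels == "adult"])
}

# Step 1: observed statistic
obs_diff <- diff_in_means(comp, group)

# Step 2: null distribution. Shuffling the labels breaks any link between
# group and value, which is what the independence null hypothesis states.
null_diffs <- replicate(5000, diff_in_means(comp, sample(group)))

# Step 3: one-sided ("greater") p-value
p_value <- mean(null_diffs >= obs_diff)
p_value
```

Because the p-value comes from random shuffles, it changes slightly from run to run unless the seed is fixed, which is also why the simulated p-value above is close to, but not identical to, the t-test p-value.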

Ranks of vectors

x <- c(1, 15, 3, 10, 6)
rank(x)
1 5 2 4 3

A Wilcoxon-Mann-Whitney test (also known as a Wilcoxon rank sum test) is (roughly) a t-test on the ranks of the numeric input.
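The link between ranks and the test can be sketched directly: the W statistic reported by wilcox.test() is the sum of the first group's ranks in the pooled sample, minus the smallest value that sum could take. The vectors x and y below are synthetic and serve only as an illustration.

```r
set.seed(1)

# Synthetic stand-in data (hypothetical), continuous so there are no ties
x <- rnorm(12, mean = 1)
y <- rnorm(15, mean = 0)

# Rank all values together, then sum the ranks belonging to x
ranks <- rank(c(x, y))
n_x <- length(x)
w_manual <- sum(ranks[seq_along(x)]) - n_x * (n_x + 1) / 2

# Without ties, W also equals the number of (x, y) pairs where x is larger
w_pairs <- sum(outer(x, y, ">"))

res <- wilcox.test(x, y, alternative = "greater")

w_manual
w_pairs
res$statistic
```

Only the ordering of the values enters the statistic, so extreme values have no more influence than any other value of the same rank.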


Wilcoxon-Mann-Whitney test

wilcox.test(
  converted_comp ~ age_first_code_cut,
  data = stack_overflow,
  alternative = "greater",
  correct = FALSE
) 
    Wilcoxon rank sum test

data:  converted_comp by age_first_code_cut
W = 967298, p-value <2e-16
alternative hypothesis: true location shift is greater than 0
1 Also known as the "Wilcoxon rank sum test" and the "Mann-Whitney U test".

Kruskal-Wallis test

The Kruskal-Wallis test is to the Wilcoxon-Mann-Whitney test what ANOVA is to the t-test.

kruskal.test(
  converted_comp ~ job_sat,
  data = stack_overflow
)
    Kruskal-Wallis rank sum test

data:  converted_comp by job_sat
Kruskal-Wallis chi-square = 81, df = 4, p-value <2e-16
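One way to see the analogy is that, with only two groups, the Kruskal-Wallis test gives the same p-value as a two-sided Wilcoxon-Mann-Whitney test based on the normal approximation without continuity correction. The sketch below checks this on synthetic data. The names values and group are hypothetical.

```r
set.seed(7)

# Synthetic stand-in data (hypothetical): two groups, no ties
values <- c(rnorm(40, mean = 0), rnorm(55, mean = 0.5))
group  <- factor(rep(c("a", "b"), times = c(40, 55)))

# Kruskal-Wallis with two groups has 1 degree of freedom
kw <- kruskal.test(values ~ group)

# Two-sided Wilcoxon-Mann-Whitney test, using the normal approximation
# (exact = FALSE) and no continuity correction (correct = FALSE)
wmw <- wilcox.test(values ~ group, exact = FALSE, correct = FALSE)

kw$p.value
wmw$p.value
```

With more than two groups, as with job_sat above, there is no single pairwise comparison to fall back on, which is where the Kruskal-Wallis test is needed.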

Let's practice!
