Non-parametric ANOVA and unpaired t-tests

Hypothesis Testing in R

Richie Cotton

Data Evangelist at DataCamp

Non-parametric tests

A non-parametric test is a hypothesis test that does not assume a probability distribution for the test statistic.

There are two types of non-parametric test:

Simulation-based.
Rank-based.

t_test()

$H_{0}$: $\mu_{child} - \mu_{adult} = 0$
$H_{A}$: $\mu_{child} - \mu_{adult} > 0$

library(infer)
stack_overflow %>% 
  t_test(
    converted_comp ~ age_first_code_cut,
    order = c("child", "adult"),
    alternative = "greater"
  )

# A tibble: 1 x 6
  statistic  t_df p_value alternative lower_ci upper_ci
      <dbl> <dbl>   <dbl> <chr>          <dbl>    <dbl>
1      2.40 2083. 0.00814 greater        8438.      Inf
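
For reference, the numbers in that output can be reproduced by hand. The sketch below assumes the unequal-variance (Welch) form of the t-test, which is the default in base R's t.test(). It uses simulated stand-in data, not stack_overflow, and child and adult are hypothetical vectors made up for illustration.

```r
# Simulated stand-in for the two groups (NOT the stack_overflow data)
set.seed(42)
child <- rlnorm(200, meanlog = 11.2, sdlog = 0.8)
adult <- rlnorm(300, meanlog = 11.0, sdlog = 0.8)

# Per-group variance of the sample mean
v_child <- var(child) / length(child)
v_adult <- var(adult) / length(adult)

# Welch t statistic: difference in means divided by its standard error
t_stat <- (mean(child) - mean(adult)) / sqrt(v_child + v_adult)

# Welch-Satterthwaite approximation to the degrees of freedom
t_df <- (v_child + v_adult)^2 /
  (v_child^2 / (length(child) - 1) + v_adult^2 / (length(adult) - 1))

# One-sided p-value for H_A: mu_child - mu_adult > 0
p_value <- pt(t_stat, df = t_df, lower.tail = FALSE)

# Cross-check against base R's Welch t-test
res <- t.test(child, adult, alternative = "greater")
```

The p-value comes from a t-distribution, and that distributional assumption is exactly what the non-parametric approaches avoid.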

Calculating the null distribution

Simulation-based workflow

null_distn <- stack_overflow %>% 
  specify(converted_comp ~ age_first_code_cut) %>%

  hypothesize(null = "independence") %>%

  generate(reps = 5000, type = "permute") %>%

  calculate(
    stat = "diff in means", 
    order = c("child", "adult")
  )

t-test, for comparison

library(infer)
stack_overflow %>% 
  t_test(
    converted_comp ~ age_first_code_cut,
    order = c("child", "adult"),
    alternative = "greater"
  )
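
To make the permutation step less of a black box, here is a minimal base R sketch of the same idea: shuffle the group labels, recompute the difference in means, and repeat. The data are simulated and the helper diff_in_means() is a hypothetical name. This is a rough analogue of the pipeline, not infer's internal implementation.

```r
set.seed(1)

# Simulated stand-in data (NOT the stack_overflow data)
group <- rep(c("child", "adult"), times = c(40, 60))
comp <- c(rnorm(40, mean = 55, sd = 10), rnorm(60, mean = 50, sd = 10))

# Hypothetical helper: mean of "child" minus mean of "adult"
diff_in_means <- function(y, g) {
  mean(y[g == "child"]) - mean(y[g == "adult"])
}

# Shuffling the labels breaks any link between group and response,
# which is what the "independence" null hypothesis describes
null_stats <- replicate(5000, diff_in_means(comp, sample(group)))
```

Because the labels are shuffled at random, the simulated differences should be centered near zero, whatever the shape of the data.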

Calculating the observed statistic

Simulation-based workflow

obs_stat <- stack_overflow %>% 
  specify(converted_comp ~ age_first_code_cut) %>% 
  calculate(
    stat = "diff in means", 
    order = c("child", "adult")
  )

t-test, for comparison

library(infer)
stack_overflow %>% 
  t_test(
    converted_comp ~ age_first_code_cut,
    order = c("child", "adult"),
    alternative = "greater"
  )

Get the p-value

Simulation-based workflow

get_p_value(
  null_distn, obs_stat, 
  direction = "greater"
)

# A tibble: 1 x 1
  p_value
    <dbl>
1  0.0066

t-test, for comparison

library(infer)
stack_overflow %>% 
  t_test(
    converted_comp ~ age_first_code_cut,
    order = c("child", "adult"),
    alternative = "greater"
  )

# A tibble: 1 x 6
  statistic  t_df p_value alternative lower_ci upper_ci
      <dbl> <dbl>   <dbl> <chr>          <dbl>    <dbl>
1      2.40 2083. 0.00814 greater        8438.      Inf
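
The simulation-based p-value is, in essence, a proportion. The sketch below assumes that a "greater" p-value is the share of simulated statistics at least as large as the observed one. It uses simulated data, and the helper diff_in_means() is a hypothetical name.

```r
set.seed(1)

# Simulated stand-in data (NOT the stack_overflow data)
group <- rep(c("child", "adult"), times = c(40, 60))
comp <- c(rnorm(40, mean = 55, sd = 10), rnorm(60, mean = 50, sd = 10))

# Hypothetical helper: mean of "child" minus mean of "adult"
diff_in_means <- function(y, g) {
  mean(y[g == "child"]) - mean(y[g == "adult"])
}

# Observed statistic on the real labels
obs <- diff_in_means(comp, group)

# Null distribution from shuffled labels
null_stats <- replicate(2000, diff_in_means(comp, sample(group)))

# Right-tailed p-value: fraction of shuffles at least as extreme as observed
p_value <- mean(null_stats >= obs)
```

No t-distribution is involved here, so the result can differ slightly from the t-test's p-value, as it does in the outputs above.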

Ranks of vectors

x <- c(1, 15, 3, 10, 6)

rank(x)

[1] 1 5 2 4 3
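
Real data often contain tied values. A small sketch of how rank() handles them with its default ties.method = "average" (the vector y is made up for illustration):

```r
# Two values are tied at 10, so they share ranks 1 and 2
y <- c(10, 20, 10, 30)

ranks <- rank(y)
ranks
# [1] 1.5 3.0 1.5 4.0
```

Each tied value gets the average of the ranks it would otherwise occupy.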

A Wilcoxon-Mann-Whitney test (also known as the Wilcoxon rank sum test) is, roughly speaking, a t-test on the ranks of the numeric input.
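
To connect ranks to the test itself, the sketch below computes the W statistic that wilcox.test() reports from the pooled ranks. The two groups x and y are simulated and purely illustrative.

```r
set.seed(7)

# Two simulated groups (illustrative only)
x <- rnorm(12, mean = 1)
y <- rnorm(15, mean = 0)

# Rank all observations together, ignoring group membership
pooled_ranks <- rank(c(x, y))

# Sum of the ranks that belong to the first group
rank_sum_x <- sum(pooled_ranks[seq_along(x)])

# W is that rank sum minus its smallest possible value, n_x * (n_x + 1) / 2
w_by_hand <- rank_sum_x - length(x) * (length(x) + 1) / 2

# Cross-check against base R
res <- wilcox.test(x, y, alternative = "greater", correct = FALSE)
```

Only the ordering of the values enters the statistic, which is why extreme values have limited influence on the result.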

Wilcoxon-Mann-Whitney test¹

wilcox.test(
  converted_comp ~ age_first_code_cut,
  data = stack_overflow,
  alternative = "greater",
  correct = FALSE
)

    Wilcoxon rank sum test

data:  converted_comp by age_first_code_cut
W = 967298, p-value <2e-16
alternative hypothesis: true location shift is greater than 0

¹ Also known as the "Wilcoxon rank sum test" and the "Mann-Whitney U test".

Kruskal-Wallis test

The Kruskal-Wallis test is to the Wilcoxon-Mann-Whitney test what ANOVA is to the t-test.

kruskal.test(
  converted_comp ~ job_sat,
  data = stack_overflow
)

    Kruskal-Wallis rank sum test

data:  converted_comp by job_sat
Kruskal-Wallis chi-squared = 81, df = 4, p-value <2e-16
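
As with the Wilcoxon test, the Kruskal-Wallis statistic depends only on ranks. A minimal sketch of the calculation for the no-ties case follows. The three groups are simulated, and values and groups are hypothetical names.

```r
set.seed(3)

# Three simulated groups with shifted locations (illustrative only)
values <- c(rnorm(20, mean = 0), rnorm(25, mean = 0.5), rnorm(30, mean = 1))
groups <- factor(rep(c("a", "b", "c"), times = c(20, 25, 30)))

# Rank all observations together, then summarize the ranks per group
r <- rank(values)
n <- length(values)
rank_sums <- tapply(r, groups, sum)
sizes <- tapply(r, groups, length)

# Kruskal-Wallis H statistic (no tie correction needed for continuous data)
h <- 12 / (n * (n + 1)) * sum(rank_sums^2 / sizes) - 3 * (n + 1)

# H is compared to a chi-squared distribution with (number of groups - 1) df
p_value <- pchisq(h, df = nlevels(groups) - 1, lower.tail = FALSE)

# Cross-check against base R
res <- kruskal.test(values, groups)
```

With five job_sat groups, the same logic gives the df = 4 shown in the output above.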

Let's practice!
