Inference for Numerical Data in R
Mine Cetinkaya-Rundel
Associate Professor of the Practice, Duke University
Motivating question: Does a treatment using embryonic stem cells help improve heart function following a heart attack more so than traditional therapy?
Data: stem.cell data from the openintro package
library(openintro)
data(stem.cell)
     trmt  before  after
1    ctrl   35.25  29.50
2    ctrl   36.50  29.50
3    ctrl   39.75  36.25
...  ...      ...    ...
n    esc    53.75  51.00
Step 1. Calculate change for each sheep: the difference between its after and before heart pumping capacities (after - before, so a positive change means improvement).
     trmt  before  after  change
1    ctrl   35.25  29.50       ?
2    ctrl   36.50  29.50       ?
3    ctrl   39.75  36.25       ?
...  ...      ...    ...     ...
n    esc    53.75  51.00       ?
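A minimal sketch of Step 1 in base R, under two assumptions that are not stated on the slide: change is taken as after minus before, and the dataset may be exposed as `stem_cell` rather than `stem.cell` depending on the installed openintro release.

```r
library(openintro)

# Assumption: the slides call the dataset stem.cell; some openintro
# releases may name it stem_cell instead, so fall back to that name.
if (!exists("stem.cell")) stem.cell <- stem_cell

# Assumption: change = after - before, so a positive value means the
# heart pumping capacity improved.
stem.cell$change <- stem.cell$after - stem.cell$before

head(stem.cell)
```

The same column could be added with `dplyr::mutate(change = after - before)` if dplyr is already in use.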
Step 2. Set the hypotheses:
$H_0: \mu_{esc} = \mu_{ctrl}$; There is no difference in average change between the treatment and control groups.
$H_A: \mu_{esc} > \mu_{ctrl}$; The average change is greater in the treatment group than in the control group.
Step 3. Conduct the hypothesis test.
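Write the values of change on 18 index cards.
Shuffle the cards and deal them into two groups, treatment and control.
Calculate the difference in average change between treatment and control.
Use the infer package to conduct the test: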
library(infer)
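For intuition about what infer automates, a single index-card shuffle can be sketched in base R. The helper name `shuffle_diff` and its arguments are hypothetical, introduced here only for illustration; it permutes the group labels (keeping group sizes fixed) and returns the difference in mean change, esc minus ctrl.

```r
# Hypothetical helper: one shuffle of the "index cards".
# change: numeric vector of changes, one per sheep
# trmt:   group labels ("esc" or "ctrl"), same length as change
# order:  which group mean is subtracted from which (first minus second)
shuffle_diff <- function(change, trmt, order = c("esc", "ctrl")) {
  # Randomly reassign the group labels; group sizes stay the same
  shuffled <- sample(trmt)
  mean(change[shuffled == order[1]]) - mean(change[shuffled == order[2]])
}
```

Once a `change` column exists, something like `replicate(1000, shuffle_diff(stem.cell$change, stem.cell$trmt))` would build a null distribution by hand; the number of repetitions here is an arbitrary choice.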
Start with the data frame and specify the model:
library(infer)
diff_ht_mean <- stem.cell %>%
  specify(__) %>% # y ~ x
  ...
Declare null hypothesis, i.e. no difference between means:
library(infer)
diff_ht_mean <- stem.cell %>%
  specify(__) %>% # y ~ x
  hypothesize(null = __) %>% # "independence" or "point"
  ...
Generate resamples assuming $H_0$ is true:
library(infer)
diff_ht_mean <- stem.cell %>%
  specify(__) %>% # y ~ x
  hypothesize(null = __) %>% # "independence" or "point"
  generate(reps = __, type = __) %>% # "bootstrap", "permute", or "simulate"
  ...
Calculate test statistic:
library(infer)
diff_ht_mean <- stem.cell %>%
  specify(__) %>% # y ~ x
  hypothesize(null = __) %>% # "independence" or "point"
  generate(reps = __, type = __) %>% # "bootstrap", "permute", or "simulate"
  calculate(stat = "diff in means") # type of statistic to calculate
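One plausible way to fill in the blanks is sketched below. The specific choices are assumptions rather than the slide's stated answers: `change ~ trmt` as the model, `"independence"` as the null, 1000 permutation resamples, an arbitrary seed, and `order = c("esc", "ctrl")` so the statistic is the esc mean minus the ctrl mean. The `stem_cell` fallback and the after-minus-before definition of change are also assumptions.

```r
library(openintro)
library(infer)

# Assumption: some openintro releases name the dataset stem_cell.
if (!exists("stem.cell")) stem.cell <- stem_cell

# Assumption: change = after - before.
stem.cell$change <- stem.cell$after - stem.cell$before

set.seed(1234)  # arbitrary seed so the resamples are reproducible

diff_ht_mean <- stem.cell %>%
  specify(change ~ trmt) %>%                   # response ~ explanatory
  hypothesize(null = "independence") %>%       # H0: change does not depend on treatment
  generate(reps = 1000, type = "permute") %>%  # shuffle change across the two groups
  calculate(stat = "diff in means",
            order = c("esc", "ctrl"))          # esc mean minus ctrl mean
```

The result has one row per resample, with the simulated difference in means in the `stat` column.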
Calculate the p-value as the proportion of simulations where the simulated difference between the sample means is at least as extreme as the observed difference:
$$P ((\bar{x}_{esc,sim} - \bar{x}_{ctrl,sim}) \ge (\bar{x}_{esc,obs} - \bar{x}_{ctrl,obs}))$$
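The proportion in this formula can be computed directly. The sketch below uses two hypothetical base R helpers (`obs_diff_in_means` and `perm_p_value` are illustrative names, not part of infer): one for the observed difference in group means and one for the one-sided p-value.

```r
# Hypothetical helper: observed difference in group means,
# first group in `order` minus the second.
obs_diff_in_means <- function(values, groups, order = c("esc", "ctrl")) {
  mean(values[groups == order[1]]) - mean(values[groups == order[2]])
}

# Hypothetical helper: proportion of simulated statistics that are
# at least as large as the observed one (one-sided, "greater").
perm_p_value <- function(sim_stats, obs_stat) {
  mean(sim_stats >= obs_stat)
}
```

Assuming the simulated statistics are stored in `diff_ht_mean$stat` and the data frame has a `change` column, the p-value would be `perm_p_value(diff_ht_mean$stat, obs_diff_in_means(stem.cell$change, stem.cell$trmt))`.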