Inference for Numerical Data in R
Mine Cetinkaya-Rundel
Associate Professor of the Practice, Duke University
200 observations were randomly sampled from the High School and Beyond survey. The same students took a reading and a writing test. At first glance, how are the distributions of reading and writing scores similar? How are they different?
Can the reading and writing scores for a given student be assumed to be independent of each other?
Probably not!
When two sets of observations have this special correspondence (not independent), they are said to be paired.
To analyze paired data, it is often useful to look at the difference in outcomes of each pair of observations:
diff = read - write.
student | read | write | diff |
---|---|---|---|
1 | 57 | 52 | 5 |
2 | 68 | 59 | 9 |
3 | 44 | 33 | 11 |
... | ... | ... | ... |
200 | 63 | 65 | -2 |
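As a minimal sketch of this subtraction, the snippet below rebuilds only the four rows shown in the table and computes the difference column. The data frame name `hsb2_rows` is made up for illustration; the full `hsb2` data has 200 rows.

```r
# Only the four rows displayed in the table above (not the full data set)
hsb2_rows <- data.frame(
  student = c(1, 2, 3, 200),
  read    = c(57, 68, 44, 63),
  write   = c(52, 59, 33, 65)
)

# Paired difference for each student: reading score minus writing score
hsb2_rows$diff <- hsb2_rows$read - hsb2_rows$write

hsb2_rows$diff
```

A positive value means the student scored higher in reading; a negative value means the student scored higher in writing.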
Construct a 95% confidence interval for the mean difference between reading and writing scores.
t.test(hsb2$diff, conf.level = 0.95)
One Sample t-test
data: hsb2$diff
t = -0.86731, df = 199, p-value = 0.3868
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
-1.7841424 0.6941424
sample estimates:
mean of x
-0.545
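To see where this interval comes from, it can be reproduced by hand as point estimate ± t* × SE. The sketch below assumes only the numbers printed in the output above; the standard error is backed out as mean / t rather than computed from the raw scores, and the variable names are illustrative.

```r
# Values copied from the t.test() output above
x_bar  <- -0.545     # sample mean of the differences
t_stat <- -0.86731   # reported t statistic (null value is 0)
df     <- 199        # degrees of freedom, n - 1 with n = 200

# Since t = (x_bar - 0) / SE, the standard error is x_bar / t
se <- x_bar / t_stat

# Critical value for a 95% interval: 2.5% in each tail
t_star <- qt(0.975, df)

# Confidence interval: point estimate plus/minus margin of error
ci <- x_bar + c(-1, 1) * t_star * se
round(ci, 2)
```

Up to rounding in the printed t statistic, this should agree with the interval reported by `t.test()`.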
95% CI for the mean difference in reading and writing scores (read - write) is (-1.78, 0.69)
vs.
We are 95% confident that the average reading score is between 1.78 points lower and 0.69 points higher than the average writing score.
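The interval above was computed from the difference column. An equivalent route, sketched here as an aside, is to pass both score vectors to `t.test()` with `paired = TRUE`. The example uses only the four student rows displayed earlier (students 1, 2, 3 and 200), so its interval will not match the one from the full sample; the vector names are illustrative.

```r
# Scores for the four students shown in the table (not the full sample)
read_scores  <- c(57, 68, 44, 63)
write_scores <- c(52, 59, 33, 65)

# One-sample t interval on the differences
one_sample <- t.test(read_scores - write_scores, conf.level = 0.95)

# Paired t interval on the two original vectors
paired <- t.test(read_scores, write_scores, paired = TRUE, conf.level = 0.95)

# The two approaches give the same confidence interval
all.equal(as.numeric(one_sample$conf.int), as.numeric(paired$conf.int))
```

Working with an explicit difference column, as the slides do, keeps the direction of subtraction (read - write) visible when interpreting the sign of the interval.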