Estimating with the t-interval

Inference for Numerical Data in R

Mine Cetinkaya-Rundel

Associate Professor of the Practice, Duke University

Quantifying variability of sample means

Suppose that among a random sample of 100 people, 13 are left-handed. If you were to select another random sample of 100, would you be surprised if only 12 were left-handed? What about 15? Or 30? Or 1, or 90?
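One way to get a feel for these questions is to simulate many samples of 100. The sketch below assumes, purely for illustration, that the true proportion of left-handed people equals the 0.13 seen in the first sample; `p_true` and `left_counts` are hypothetical names, not part of the course data.

```r
set.seed(1)
p_true <- 0.13   # assumed population proportion (illustrative only)

# Number of left-handed people in each of 10,000 simulated samples of 100
left_counts <- rbinom(10000, size = 100, prob = p_true)

# Share of simulated samples matching the counts asked about above
mean(left_counts == 12)
mean(left_counts == 15)
mean(left_counts >= 30)
mean(left_counts <= 1 | left_counts >= 90)
```

Under this assumption, counts such as 12 or 15 come up routinely, while 30 or more is very rare.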

Ways to quantify the variability of the sample mean:

- Simulate with bootstrapping
- Approximate with the Central Limit Theorem
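The two approaches can be compared side by side. This is a minimal sketch on made-up data (an exponential sample standing in for a real variable); `x` and `boot_means` are illustrative names.

```r
set.seed(42)
x <- rexp(50, rate = 1 / 5)   # made-up sample of 50 observations (illustrative only)

# Bootstrapping: resample with replacement and record the mean each time
boot_means <- replicate(5000, mean(sample(x, size = length(x), replace = TRUE)))

sd(boot_means)            # bootstrap estimate of the variability of the sample mean
sd(x) / sqrt(length(x))   # approximation based on the Central Limit Theorem
```

The two numbers should be close; the bootstrap gets there by simulation, the second line by formula.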

Central Limit Theorem

$$ \bar{x} \sim N \left( \text{mean} = \mu,\ SE = \frac{\sigma}{\sqrt{n}} \right) $$

SE (standard error) = standard deviation of the sampling distribution

When $\sigma$ is unknown:
- Estimate the standard error with the sample standard deviation: $SE = \frac{s}{\sqrt{n}}$
- Use $t_{df = n - 1}$ for inference for a mean

This approximation only holds if certain conditions are satisfied...
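To see what switching from the normal to the $t_{df = n - 1}$ distribution does in practice, the sketch below compares the critical values for a 95% interval at a few sample sizes. The sample sizes are arbitrary choices for illustration.

```r
n <- c(5, 15, 30, 100, 1000)      # arbitrary sample sizes

t_star <- qt(0.975, df = n - 1)   # t critical value for a 95% interval
z_star <- qnorm(0.975)            # normal critical value, for reference

data.frame(n = n, t_star = round(t_star, 3), z_star = round(z_star, 3))
```

The t critical values are always larger than the normal one, which widens the interval to reflect the extra uncertainty from estimating $\sigma$ with $s$; the gap shrinks as $n$ grows.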

Conditions

Independent observations: hard to check directly, but reasonable to assume with:
- random sampling / assignment
- if sampling without replacement, n < 10% of the population

Sample size / skew: the more skewed the original population, the larger the sample size should be.
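The sample size / skew condition can be illustrated by simulation. This sketch assumes a strongly right-skewed (exponential) population and uses two hypothetical helpers, `skewness()` and `sample_means()`, defined here for illustration only.

```r
set.seed(7)

# Simple moment-based skewness of a numeric vector
skewness <- function(v) mean((v - mean(v))^3) / sd(v)^3

# Simulated sampling distribution of the mean for samples of size n
# drawn from an exponential population
sample_means <- function(n, reps = 5000) {
  replicate(reps, mean(rexp(n, rate = 1)))
}

skew_n10  <- skewness(sample_means(10))
skew_n100 <- skewness(sample_means(100))
c(n10 = skew_n10, n100 = skew_n100)
```

With the larger sample size, the distribution of sample means is noticeably less skewed, so the normal approximation is more trustworthy.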

Confidence interval for a mean

Estimate the average number of days Americans work extra hours beyond their usual schedule (variable: moredays) using data from the 2010 General Social Survey (data: gss).

```r
t.test(gss$moredays, conf.level = 0.95)
```

```
	One Sample t-test

data:  gss$moredays
t = 25.628, df = 1146, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 5.273367 6.147732
sample estimates:
mean of x 
 5.710549 
```
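The interval in this output can be reconstructed from the other printed quantities. Because the null value of the test is 0, the test statistic is the sample mean divided by its standard error, so the standard error can be recovered as mean / t. The variable names below are illustrative, and the result matches the printed interval only up to the rounding of the printed t statistic.

```r
mean_x <- 5.710549   # sample mean from the output
t_stat <- 25.628     # t statistic from the output (rounded as printed)
df     <- 1146       # degrees of freedom, n - 1

se     <- mean_x / t_stat   # t = (mean - 0) / SE, so SE = mean / t
t_star <- qt(0.975, df)     # critical value for a 95% interval

ci <- mean_x + c(-1, 1) * t_star * se
ci
```

This is the usual form of the t-interval: point estimate plus or minus critical value times standard error.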

Let's practice!
