Estimating with the t-interval

Inference for Numerical Data in R

Mine Cetinkaya-Rundel

Associate Professor of the Practice, Duke University

Quantifying variability of sample means

Suppose that in a random sample of 100 people, 13 are left-handed. If you were to select another random sample of 100, would you be surprised if only 12 were left-handed? What about 15? Or 30? Or 1, or 90?

Ways to quantify the variability of the sample mean:

  • Simulate with bootstrapping

  • Approximate with Central Limit Theorem
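
As a rough sketch of the two options above, the left-handedness sample from this slide can be coded as 0/1 values, so that its mean is the sample proportion. The seed and the number of resamples (5000) are arbitrary choices made for this illustration, not values from the course.

```r
set.seed(42)  # arbitrary seed, only for reproducibility

# The sample from the slide: 100 people, 13 left-handed, coded as 0/1
x <- c(rep(1, 13), rep(0, 87))
n <- length(x)

# 1. Bootstrap: resample with replacement, record the mean of each resample
boot_means <- replicate(5000, mean(sample(x, size = n, replace = TRUE)))
se_boot <- sd(boot_means)

# 2. CLT: plug the sample standard deviation into s / sqrt(n)
se_clt <- sd(x) / sqrt(n)

round(c(bootstrap = se_boot, clt = se_clt), 3)
```

Both estimates should land at roughly 0.03. On that scale, 12 or 15 left-handed people out of 100 sit within one standard error of the observed 0.13, while 30 or 90 would be many standard errors away.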


Central Limit Theorem

$$ \bar{x} \sim N \left( mean = \mu, SE = \frac{\sigma}{\sqrt{n}} \right) $$

  • SE (standard error) = standard deviation of the sampling distribution
  • $\sigma$ unknown:
    • $SE = \frac{s}{\sqrt{n}}$
    • Use $t_{df = n - 1}$ for inference for a mean
  • Only true if certain conditions are satisfied...
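
The formula above can be checked with a small simulation. This is a sketch under assumed values: the population mean, standard deviation, and sample size (`mu`, `sigma`, `n`) are hypothetical and chosen only for illustration.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Hypothetical population parameters and sample size
mu <- 10
sigma <- 4
n <- 50

# Draw many samples of size n and keep the mean of each one
sample_means <- replicate(10000, mean(rnorm(n, mean = mu, sd = sigma)))

se_empirical <- sd(sample_means)   # SD of the simulated sampling distribution
se_theory <- sigma / sqrt(n)       # SE predicted by the Central Limit Theorem

# When sigma is unknown, s replaces sigma and t replaces the normal distribution
t_star <- qt(0.975, df = n - 1)    # critical value for a 95% interval
z_star <- qnorm(0.975)             # normal counterpart, for comparison

c(se_empirical = se_empirical, se_theory = se_theory,
  t_star = t_star, z_star = z_star)
```

The two standard errors should agree closely. The t critical value is slightly larger than the normal one, which widens the interval to account for estimating $\sigma$ with $s$.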

Conditions

  1. Independent observations: Hard to check directly, but reasonable to assume given:
    • random sampling / assignment
    • if sampling without replacement, n < 10% of population
  2. Sample size / skew: The more skewed the original population, the larger the sample size should be.
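
To see why skew and sample size matter together, one option is to simulate how often a 95% t-interval captures the true mean when the population is strongly right-skewed. This is only a sketch: the exponential population, the two sample sizes, and the helper name `coverage` are assumptions made for this illustration.

```r
set.seed(7)  # arbitrary seed, only for reproducibility

# Fraction of 95% t-intervals that capture the true mean (1) of an
# Exponential(rate = 1) population, for samples of size n.
# `coverage` is a hypothetical helper, not a base R function.
coverage <- function(n, reps = 5000) {
  hits <- replicate(reps, {
    x <- rexp(n, rate = 1)
    ci <- t.test(x, conf.level = 0.95)$conf.int
    ci[1] <= 1 && 1 <= ci[2]
  })
  mean(hits)
}

cov_small <- coverage(10)    # small sample from a skewed population
cov_large <- coverage(100)   # larger sample from the same population

c(n10 = cov_small, n100 = cov_large)
```

With the small sample, the observed coverage would typically fall short of the nominal 95%. It moves closer to 95% as n grows.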

Confidence interval for a mean

Estimate the average number of days Americans work extra hours beyond their usual schedule (variable: moredays) using data from the 2010 General Social Survey (data: gss).


t.test(gss$moredays, conf.level = 0.95)

    One Sample t-test

data:  gss$moredays
t = 25.628, df = 1146, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 5.273367 6.147732
sample estimates:
mean of x 
 5.710549
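
The interval reported by `t.test()` can also be assembled by hand as $\bar{x} \pm t^{*}_{df = n - 1} \times s / \sqrt{n}$. The sketch below assumes a simulated stand-in for `gss$moredays`, since the real data are not reproduced here. The helper `t_interval` and the vector `moredays_sim` are hypothetical names introduced for this illustration.

```r
# Hand-rolled t-interval; `t_interval` is a hypothetical helper, not base R
t_interval <- function(x, conf_level = 0.95) {
  x <- x[!is.na(x)]                                   # t.test() also drops missing values
  n <- length(x)
  se <- sd(x) / sqrt(n)                               # estimated standard error
  t_star <- qt(1 - (1 - conf_level) / 2, df = n - 1)  # critical value
  mean(x) + c(-1, 1) * t_star * se
}

# Simulated stand-in for gss$moredays (NOT the real survey data)
set.seed(2010)
moredays_sim <- c(rpois(300, lambda = 5.7), NA, NA)

manual <- t_interval(moredays_sim)
builtin <- as.numeric(t.test(moredays_sim, conf.level = 0.95)$conf.int)

all.equal(manual, builtin)
```

The same arithmetic applies to the output above: df = 1146 implies 1147 non-missing responses, and because the test is against a mean of 0, the standard error is the sample mean divided by the t statistic, about 0.223.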

Let's practice!
