Comparing means with a t-test

Inference for Numerical Data in R

Mine Cetinkaya-Rundel

Associate Professor of the Practice, Duke University

A (more) standard measure of pay

Instead of comparing average annual income, compare average hrly_rate:

  • assume 52 weeks in a year
  • hrly_rate = income / (hrs_work * 52)
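As a minimal sketch of the formula above, the snippet below derives hrly_rate in base R. The data frame `toy` and its values are made up for illustration and stand in for acs12; only the column names income and hrs_work come from the slide.

```r
# Hypothetical toy data standing in for acs12 (values are invented)
toy <- data.frame(
  income   = c(52000, 31200, 0, 104000),  # annual income
  hrs_work = c(40, 30, NA, 50)            # hours worked per week
)

# Hourly rate, assuming 52 working weeks in a year
toy$hrly_rate <- toy$income / (toy$hrs_work * 52)

toy$hrly_rate
# 25 20 NA 40
```

Rows with a missing hrs_work get a missing hrly_rate, which is why such rows need to be dropped before computing group means.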

Research question and hypotheses

Do the data provide convincing evidence of a difference between the average hourly rate of citizens and non-citizens in the US?

Let $\mu = $ average hourly pay

$H_0: \mu_{citizen} = \mu_{non-citizen}$

$H_A: \mu_{citizen} \ne \mu_{non-citizen}$


Summary statistics

acs12 %>%
  filter(!is.na(hrly_rate)) %>%
  group_by(citizen) %>%
  summarise(x_bar = round(mean(hrly_rate), 2),
            s = round(sd(hrly_rate), 2),
            n = n())

  citizen  x_bar      s   n
1 no       21.19  34.50  58
2 yes      18.52  24.73 901

Conducting the test

t.test(hrly_rate ~ citizen, data = acs12, mu = 0,
       alternative = "two.sided")

  • Null:
    • $H_0: \mu_{citizen} = \mu_{non-citizen}$
    • $H_0: \mu_{citizen} - \mu_{non-citizen} = 0$ $\rightarrow$ mu = 0
  • Alternative: $H_A: \mu_{citizen} \ne \mu_{non-citizen}$ $\rightarrow$ alternative = "two.sided"

Conducting the test

t.test(hrly_rate ~ citizen, data = acs12, mu = 0,
       alternative = "two.sided")

    Welch Two Sample t-test
data:  hrly_rate by citizen
t = 0.58058, df = 60.827, p-value = 0.5637
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -6.53483 11.88170
sample estimates:
 mean in group no mean in group yes 
         21.19494          18.52151 
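The reported t statistic and degrees of freedom can be roughly reproduced by hand. The sketch below applies the Welch formulas to the rounded summary statistics from the "Summary statistics" slide, so its results should agree with the output above only approximately. All variable names are illustrative.

```r
# Rounded summary statistics from the "Summary statistics" slide
x_bar <- c(no = 21.19, yes = 18.52)
s     <- c(no = 34.50, yes = 24.73)
n     <- c(no = 58,    yes = 901)

# Variance of each group's sample mean
v <- s^2 / n

# Standard error of the difference and the t statistic (no minus yes)
se     <- sqrt(sum(v))
t_stat <- unname((x_bar["no"] - x_bar["yes"]) / se)

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- sum(v)^2 / sum(v^2 / (n - 1))

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df_welch)

round(c(t = t_stat, df = df_welch, p = p_value), 3)
```

The non-integer degrees of freedom come from the Welch approximation, which does not assume equal variances in the two groups.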

Conditions

  • Independence:
    • Observations in each sample should be independent of each other.
    • The two samples should be independent of each other.
  • Sample size / skew: The more skewed the original data, the larger the sample size required for the sampling distribution of the mean to be nearly normal.
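The sample size / skew condition can be illustrated with a small simulation. This is only a sketch: the log-normal population, the helper functions `skew` and `sim_means`, and the sample sizes 10 and 500 are assumptions chosen for illustration, not part of the course material.

```r
set.seed(1)

# Sample skewness (third standardized moment); near 0 for symmetric data
skew <- function(x) mean((x - mean(x))^3) / sd(x)^3

# Means of `reps` samples of size n drawn from a right-skewed population
sim_means <- function(n, reps = 5000) {
  replicate(reps, mean(rlnorm(n)))
}

skew_small <- skew(sim_means(10))   # small samples
skew_large <- skew(sim_means(500))  # large samples

c(n_10 = skew_small, n_500 = skew_large)
```

With the small samples the simulated sampling distribution remains clearly right-skewed, while with the large samples its skewness is much closer to zero.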

[Figure: hrly_rate by citizen]


Let's practice!

