Inference for Numerical Data in R
Mine Cetinkaya-Rundel
Associate Professor of the Practice, Duke University
$H_0$: The average vocabulary score is the same across all social classes, $\mu_{lower} = \mu_{working} = \mu_{middle} = \mu_{upper}$.
$H_A$: The average vocabulary scores differ between at least one pair of social classes.
Total variability in vocabulary score:
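library(broom)
library(dplyr)  # supplies the %>% pipe
# One-way ANOVA of vocabulary score (wordsum) on social class, returned as a tidy data frame
aov(wordsum ~ class, data = gss) %>%
  tidy()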
term        df      sumsq     meansq  statistic  p.value
class        3   236.5644  78.854810   21.73467        0
Residuals  791  2869.8003   3.628066         NA       NA
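The sumsq column splits the total variability in vocabulary score into a between-class part and a within-class part. The arithmetic below uses only the numbers in the table; the symbols $SS_{total}$, $SS_{class}$ and $SS_{Residuals}$ are labels introduced here for illustration and do not appear in the original slides.

$$SS_{total} = SS_{class} + SS_{Residuals} = 236.5644 + 2869.8003 = 3106.3647$$

$$\frac{SS_{class}}{SS_{total}} = \frac{236.5644}{3106.3647} \approx 0.076$$

Under this reading, roughly 7.6% of the variability in vocabulary scores is associated with social class, and the rest is variability within classes.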
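F-statistic: $F = \frac{\text{between-group variability}}{\text{within-group variability}} = \frac{78.854810}{3.628066} = 21.73467$, the ratio of the two values in the meansq column.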
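As a rough check, the F-statistic and its p-value can be recomputed from the table alone. This is a minimal sketch that hard-codes the sums of squares and degrees of freedom from the output above instead of refitting the model; the variable names are illustrative and not from the slides. It uses base R's pf() for the upper tail of the F distribution.

```r
# Values copied from the tidy() ANOVA table above
ss_class <- 236.5644   # between-group sum of squares
ss_resid <- 2869.8003  # within-group (residual) sum of squares
df_class <- 3
df_resid <- 791

# Mean squares: each sum of squares divided by its degrees of freedom
ms_class <- ss_class / df_class
ms_resid <- ss_resid / df_resid

# F-statistic: between-group variability over within-group variability
f_stat <- ms_class / ms_resid

# p-value: upper-tail area of the F(3, 791) distribution beyond f_stat
p_value <- pf(f_stat, df_class, df_resid, lower.tail = FALSE)

print(f_stat)
print(p_value)
```

The p-value computed this way is extremely small, which is consistent with the table showing it as 0 at the printed precision.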