Hypothesis Testing in R
Richie Cotton
Data Evangelist at DataCamp
library(infer)
stack_overflow %>%
  prop_test(
    hobbyist ~ age_cat,
    order = c("At least 30", "Under 30"),
    alternative = "two-sided",
    correct = FALSE
  )
# A tibble: 1 x 6
statistic chisq_df p_value alternative lower_ci upper_ci
<dbl> <dbl> <dbl> <chr> <dbl> <dbl>
1 17.8 1 0.0000248 two.sided 0.0605 0.165
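With two groups and no continuity correction, the two-proportion test is equivalent to a chi-square test on the 2 x 2 table, which is why the output above reports chisq_df = 1. The minimal base R sketch below uses hypothetical counts (not the stack_overflow data) with stats::prop.test() and stats::chisq.test() to show that the two statistics agree.

```r
# Hypothetical counts (not the stack_overflow data):
# hobbyists ("successes") and group sizes for two age groups
successes <- c(70, 95)
totals    <- c(100, 110)

# Two-proportion test without continuity correction
pt <- prop.test(x = successes, n = totals, correct = FALSE)

# Chi-square test of independence on the same counts laid out as a 2 x 2 table
# (column 1: successes, column 2: failures)
tab <- cbind(successes, failures = totals - successes)
ct  <- chisq.test(tab, correct = FALSE)

# Both tests report the same chi-square statistic with 1 degree of freedom
c(prop.test = unname(pt$statistic), chisq.test = unname(ct$statistic))
```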
Previous hypothesis test result: there is evidence that the hobbyist and age_cat variables have an association.
If the proportion of successes in the response variable is the same across all categories of the explanatory variable, the two variables are statistically independent.
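To make the definition concrete, here is a small base R sketch with a hypothetical 2 x 2 table in which the proportion of successes is 0.4 in both categories of the explanatory variable. The observed counts then equal the counts expected under independence, so the chi-square statistic is zero.

```r
# Hypothetical counts: rows are the response (success/failure),
# columns are the two categories of the explanatory variable
tab <- matrix(
  c(20, 30,   # category A: 20 successes, 30 failures
    40, 60),  # category B: 40 successes, 60 failures
  nrow = 2,
  dimnames = list(response = c("success", "failure"),
                  explanatory = c("A", "B"))
)

# Proportion of successes within each explanatory category (0.4 and 0.4)
prop.table(tab, margin = 2)["success", ]

# With identical proportions, observed equals expected, so the statistic is 0
chisq.test(tab, correct = FALSE)$statistic
```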
stack_overflow %>%
  count(age_cat)
# A tibble: 2 x 2
age_cat n
<chr> <int>
1 At least 30 1050
2 Under 30 1211
stack_overflow %>%
  count(job_sat)
# A tibble: 5 x 2
job_sat n
<fct> <int>
1 Very dissatisfied 159
2 Slightly dissatisfied 342
3 Neither 201
4 Slightly satisfied 680
5 Very satisfied 879
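Under independence, the expected count in each cell of the age_cat by job_sat table is (row total x column total) / grand total. The base R sketch below computes those expected counts from the marginal counts printed above; both variables total 2261 respondents. Only the margins appear on these slides, so the sketch stops at the expected table.

```r
# Marginal counts taken from the count() output above
age_n <- c("At least 30" = 1050, "Under 30" = 1211)
job_n <- c("Very dissatisfied" = 159, "Slightly dissatisfied" = 342,
           "Neither" = 201, "Slightly satisfied" = 680,
           "Very satisfied" = 879)

n_total <- sum(age_n)  # 2261; sum(job_n) gives the same total

# Expected counts under independence: outer product of the margins / total
expected <- outer(age_n, job_n) / n_total
round(expected, 1)
```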
$H_{0}$: Age categories are independent of job satisfaction levels.
$H_{A}$: Age categories are not independent of job satisfaction levels.
alpha <- 0.1
library(ggplot2)
ggplot(stack_overflow, aes(job_sat, fill = age_cat)) +
  geom_bar(position = "fill") +
  ylab("proportion")
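geom_bar(position = "fill") stacks the counts inside each job_sat bar and rescales them so each bar sums to one. The same proportions can be computed with base R's prop.table(). The sketch below uses a hypothetical six-row data frame; so_demo is an illustrative name, not part of the course data.

```r
# Hypothetical mini data set with the same two columns as stack_overflow
so_demo <- data.frame(
  job_sat = c("Neither", "Neither", "Neither",
              "Very satisfied", "Very satisfied", "Very satisfied"),
  age_cat = c("At least 30", "At least 30", "Under 30",
              "Under 30", "Under 30", "At least 30")
)

# Within each job_sat level (rows), the share of each age_cat;
# these are the segment heights that position = "fill" draws
props <- prop.table(
  table(job_sat = so_demo$job_sat, age_cat = so_demo$age_cat),
  margin = 1
)
props
```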
library(infer)
stack_overflow %>%
  chisq_test(age_cat ~ job_sat)
# A tibble: 1 x 3
statistic chisq_df p_value
<dbl> <int> <dbl>
1 5.55 4 0.235
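Comparing the p-value with the significance level set earlier gives the decision. Since 0.235 is greater than alpha = 0.1, the sketch below returns FALSE: we fail to reject H0, so the data provide no evidence that age category and job satisfaction are associated.

```r
alpha   <- 0.1
p_value <- 0.235  # from the chisq_test() output above

# Reject H0 only if the p-value is at or below the significance level
reject_h0 <- p_value <= alpha
reject_h0  # returns FALSE
```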
Degrees of freedom:
$(\text{No. of response categories} - 1) \times (\text{No. of explanatory categories} - 1)$
$(2 - 1) \times (5 - 1) = 4$
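The p-value is the area to the right of the statistic under a chi-square distribution with these degrees of freedom. As a rough check, this base R sketch plugs the printed (rounded) statistic of 5.55 into stats::pchisq().

```r
# Number of categories in each variable
n_response    <- 2  # age_cat: "At least 30", "Under 30"
n_explanatory <- 5  # job_sat: five satisfaction levels

chisq_df <- (n_response - 1) * (n_explanatory - 1)  # 4

# Upper-tail area of the chi-square distribution at the reported statistic;
# about 0.235, matching the p_value above up to rounding of the statistic
pchisq(5.55, df = chisq_df, lower.tail = FALSE)
```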
ggplot(stack_overflow, aes(age_cat, fill = job_sat)) +
  geom_bar(position = "fill") +
  ylab("proportion")
library(infer)
stack_overflow %>%
  chisq_test(age_cat ~ job_sat)
# A tibble: 1 x 3
statistic chisq_df p_value
<dbl> <int> <dbl>
1 5.55 4 0.235
Ask: Are the variables X and Y independent?
library(infer)
stack_overflow %>%
  chisq_test(job_sat ~ age_cat)
# A tibble: 1 x 3
statistic chisq_df p_value
<dbl> <int> <dbl>
1 5.55 4 0.235
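Not: Is variable X independent from variable Y?
The test is symmetric: swapping the two variables in the formula gives the same statistic, degrees of freedom, and p-value, as the two outputs above show.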
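The reason is that each expected count depends only on its row and column totals, and transposing the contingency table merely exchanges rows and columns. A base R sketch with a hypothetical 2 x 5 table of counts illustrates this.

```r
# Hypothetical 2 x 5 table of counts (rows: age groups, columns: satisfaction)
tab <- matrix(c(15, 30, 20, 60, 80,
                18, 35, 19, 70, 95),
              nrow = 2, byrow = TRUE)

ab <- chisq.test(tab)     # one orientation
ba <- chisq.test(t(tab))  # variables swapped (table transposed)

# Identical statistic, degrees of freedom (4), and p-value
c(ab$statistic, ba$statistic)
c(ab$parameter, ba$parameter)
c(ab$p.value, ba$p.value)
```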
args(chisq_test)
function (x, formula, response = NULL, explanatory = NULL, ...)
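The signature shows that chisq_test() accepts the variables either through a formula or through the response and explanatory arguments. Base R's stats::chisq.test() (the base function, not infer) has an analogous flexibility: it takes either a contingency table or two factor vectors, and both give the same result. The sketch below uses hypothetical vectors.

```r
# Hypothetical paired observations of two categorical variables
age <- factor(rep(c("Under 30", "At least 30"), times = c(60, 50)))
sat <- factor(c(rep(c("Satisfied", "Neither"), times = c(40, 20)),
                rep(c("Satisfied", "Neither"), times = c(25, 25))))

# Style 1: pass the two factors; chisq.test() cross-tabulates them itself
by_vectors <- chisq.test(age, sat, correct = FALSE)

# Style 2: build the contingency table first, then test it
by_table <- chisq.test(table(age, sat), correct = FALSE)

c(vectors = unname(by_vectors$statistic), table = unname(by_table$statistic))
```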