Hypothesis Testing in R
Richie Cotton
Data Evangelist at DataCamp
age_first_code_cut
classifies when a Stack Overflow user first started programming
"adult" means they started at 14 or older
"child" means they started before 14
A hypothesis is a statement about an unknown population parameter.
A hypothesis test is a test of two competing hypotheses.
The null hypothesis ($H_{0}$) is the existing "champion" idea.
The alternative hypothesis ($H_{A}$) is the new "challenger" idea of the researcher.
For our problem:
$H_{0}$: The proportion of data scientists starting programming as children is 35%.
The significance level is the "beyond a reasonable doubt" threshold for hypothesis testing.
Hypothesis tests determine whether the sample statistics lie in the tails of the null distribution.
| Test | Tails |
|---|---|
| alternative different from null | two-tailed |
| alternative greater than null | right-tailed |
| alternative less than null | left-tailed |
$H_{A}$: The proportion of data scientists starting programming as children is greater than 35%.
Our alternative hypothesis uses "greater than," so we need a right-tailed test.
A p-value is the probability of observing a test statistic as extreme or more extreme than what was observed in our original sample, assuming the null hypothesis is true.
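The definition can be made concrete with a small simulation. The sketch below is illustrative only: the sample size, hypothesized proportion, and observed count are hypothetical values chosen for the example, not the Stack Overflow data. It draws many samples under the null hypothesis and measures how often the simulated count is at least as large as the observed one (a right-tailed p-value).

```r
set.seed(123)

# All values here are hypothetical illustration values (assumptions),
# not taken from the stack_overflow dataset.
n <- 500             # assumed sample size
prop_hyp <- 0.35     # hypothesized proportion under the null
n_child_obs <- 190   # assumed observed count (sample proportion 0.38)
n_sims <- 10000      # number of simulated samples

# Simulate counts of "child" responses assuming the null hypothesis is true
null_counts <- rbinom(n_sims, size = n, prob = prop_hyp)

# Right-tailed p-value: share of simulated counts at least as extreme
# as the observed count
p_value_sim <- mean(null_counts >= n_child_obs)
p_value_sim
```

With a left-tailed alternative, the comparison would flip to `<=`. A two-tailed version would count extremes in both directions.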
library(dplyr)
# Sample proportion of users who started programming as children
prop_child_samp <- stack_overflow %>%
  summarize(point_estimate = mean(age_first_code_cut == "child")) %>%
  pull(point_estimate)
0.388
prop_child_hyp <- 0.35
std_error <- 0.0096028
z_score <- (prop_child_samp - prop_child_hyp) / std_error
3.956
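The standard error above is given as a fixed number. One common way to estimate such a value is the standard deviation of a bootstrap distribution of the sample proportion. The sketch below assumes that approach and uses simulated stand-in data, because the real `stack_overflow` dataset is not reproduced here. The names `age_first_code_cut_sim`, `boot_props`, and `std_error_boot` are hypothetical.

```r
set.seed(42)

# Hypothetical stand-in for the age_first_code_cut column (simulated,
# not the real Stack Overflow data)
n <- 2000
age_first_code_cut_sim <- sample(
  c("child", "adult"), size = n, replace = TRUE, prob = c(0.39, 0.61)
)

# Bootstrap: resample with replacement and recompute the proportion each time
boot_props <- replicate(2000, {
  resample <- sample(age_first_code_cut_sim, size = n, replace = TRUE)
  mean(resample == "child")
})

# The standard error is estimated by the spread of the bootstrap proportions
std_error_boot <- sd(boot_props)

# Closed-form approximation for comparison: sqrt(p * (1 - p) / n)
p_hat <- mean(age_first_code_cut_sim == "child")
std_error_formula <- sqrt(p_hat * (1 - p_hat) / n)

c(bootstrap = std_error_boot, formula = std_error_formula)
```

The two estimates should agree closely for a sample of this size. The bootstrap version has the advantage of carrying over to statistics without a simple formula.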
pnorm() is the normal CDF.
For a left-tailed test, use the default lower.tail = TRUE.
For a right-tailed test, set lower.tail = FALSE.
p_value <- pnorm(z_score, lower.tail = FALSE)
3.818e-05
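The same calculation extends to the other rows of the tails table. The sketch below wraps the three cases in a hypothetical helper, `p_value_from_z()`, which is not part of the original material. It then compares the right-tailed p-value to a significance level of 0.05, an assumed value, since no level is stated here.

```r
# Hypothetical helper (an illustration, not from the original slides):
# convert a z-score to a p-value for the chosen tail(s)
p_value_from_z <- function(z, tails = c("right", "left", "two")) {
  tails <- match.arg(tails)
  switch(tails,
    right = pnorm(z, lower.tail = FALSE),          # alternative greater than null
    left  = pnorm(z, lower.tail = TRUE),           # alternative less than null
    two   = 2 * pnorm(abs(z), lower.tail = FALSE)  # alternative different from null
  )
}

z_score <- 3.956  # rounded z-score reported above
p_right <- p_value_from_z(z_score, "right")

alpha <- 0.05     # assumed significance level, not given in the source
reject_null <- p_right <= alpha
reject_null
```

Because the z-score is rounded here, `p_right` can differ slightly in the last digits from the p-value computed from the unrounded score.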