p-values

Hypothesis Testing in R

Richie Cotton

Data Evangelist at DataCamp

Criminal trials

Two possible true states.
1. Defendant committed the crime.
2. Defendant did not commit the crime.
Two possible verdicts.
1. Guilty.
2. Not guilty.
Initially the defendant is assumed to be not guilty.
If the evidence is "beyond a reasonable doubt" that the defendant committed the crime, then a "guilty" verdict is given, else a "not guilty" verdict is given.

age_first_code_cut classifies when each Stack Overflow user first started programming:
1. "adult" means they started at 14 or older
2. "child" means they started before 14
Previous research suggests that 35% of software developers started programming as children
Does our sample provide evidence that a greater proportion of data scientists started programming as children?

A hypothesis is a statement about an unknown population parameter.

A hypothesis test is a test of two competing hypotheses.

The null hypothesis ($H_{0}$) is the existing "champion" idea.
The alternative hypothesis ($H_{A}$) is the new "challenger" idea of the researcher.

For our problem

$H_{0}$: The proportion of data scientists starting programming as children is the same as that of software developers (35%).
$H_{A}$: The proportion of data scientists starting programming as children is greater than 35%.

¹ "Naught" is British English for "zero". For historical reasons, "H-naught" is the international convention for pronouncing the null hypothesis.

In reality, either $H_{A}$ or $H_{0}$ is true (but not both).
The test ends in either a "reject $H_{0}$" verdict or a "fail to reject $H_{0}$" verdict.
Initially the null hypothesis, $H_{0}$, is assumed to be true.
If the evidence from the sample is "significant" in favor of $H_{A}$, reject $H_{0}$ and choose $H_{A}$; otherwise, fail to reject $H_{0}$.

The significance level is the hypothesis-testing equivalent of "beyond a reasonable doubt".

Density plot of the pdf of the standard normal distribution with the middle part covered up, showing only the tails.

Hypothesis tests determine whether the sample statistics lie in the tails of the null distribution.

Test	Tails
alternative different from null	two-tailed
alternative greater than null	right-tailed
alternative less than null	left-tailed

$H_{A}$: The proportion of data scientists starting programming as children is greater than 35%.

Our alternative hypothesis uses "greater than," so we need a right-tailed test.
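
As a minimal sketch of how the tail choice changes the calculation, the snippet below computes a p-value for each row of the table with pnorm(). It assumes the test statistic is a z-score with a standard normal null distribution; the value z = 1.5 is hypothetical and chosen only for illustration.

```r
# Hypothetical z-score, used only to illustrate the three tail choices.
z <- 1.5

# Right-tailed test: area under the standard normal curve to the right of z.
p_right <- pnorm(z, lower.tail = FALSE)

# Left-tailed test: area to the left of z.
p_left <- pnorm(z)

# Two-tailed test: area in both tails beyond |z|.
p_two <- 2 * pnorm(abs(z), lower.tail = FALSE)

print(c(right = p_right, left = p_left, two_tailed = p_two))
```

For the same statistic, the left- and right-tailed p-values sum to 1, which is why picking the tail that matches $H_{A}$ matters.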

The larger the p-value, the weaker the evidence against $H_{0}$.
The smaller the p-value, the stronger the evidence against $H_{0}$.
Small p-values mean the statistic is in the tail of the null distribution (the distribution of the statistic if the null hypothesis were true).
- The "p" in p-value stands for probability.
- For p-values, "small" means "close to zero".

A p-value is the probability of observing a test statistic as extreme or more extreme than what was observed in our original sample, assuming the null hypothesis is true.
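
The definition can be illustrated by simulation: generate many samples in a world where the null hypothesis is true, then count how often the statistic is at least as extreme as the observed one. This is a rough sketch, not the method used in these slides. The sample size n and the number of simulations are hypothetical values chosen for illustration; the proportions 0.35 and 0.388 come from the example.

```r
set.seed(42)

prop_hyp <- 0.35    # hypothesized proportion under H0 (from the example)
prop_obs <- 0.388   # observed sample proportion (from the example)
n <- 2000           # hypothetical sample size, assumed for illustration
n_sims <- 100000    # hypothetical number of simulated samples

# Simulate sample proportions in a world where H0 is true.
null_props <- rbinom(n_sims, size = n, prob = prop_hyp) / n

# Right-tailed p-value: the share of simulated proportions that are
# at least as large as the observed proportion.
p_value_sim <- mean(null_props >= prop_obs)

print(p_value_sim)
```

Because the observed proportion sits far in the right tail of the simulated null distribution, very few simulated samples reach it, so the estimated p-value is close to zero.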

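library(dplyr)  # provides %>%, summarize(), and pull()

# Proportion of the sample who started programming as children
prop_child_samp <- stack_overflow %>%
  summarize(point_estimate = mean(age_first_code_cut == "child")) %>%
  pull(point_estimate)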

0.388

prop_child_hyp <- 0.35

std_error <- 0.0096028

# Number of standard errors between the sample and hypothesized proportions
z_score <- (prop_child_samp - prop_child_hyp) / std_error

3.956

# Right-tailed test, so take the area in the upper tail
p_value <- pnorm(z_score, lower.tail = FALSE)

3.818e-05
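
To tie the p-value back to the verdict, the sketch below reruns the calculation from the rounded values shown above and compares the result with a significance level. The level alpha = 0.05 is an assumption; no value is specified here. Because the inputs are rounded, the z-score and p-value differ slightly from the ones printed above.

```r
prop_child_samp <- 0.388    # rounded sample proportion from above
prop_child_hyp <- 0.35      # hypothesized proportion under H0
std_error <- 0.0096028      # standard error from above
alpha <- 0.05               # assumed significance level (not given in the text)

# Standardize the difference between the sample and hypothesized proportions.
z_score <- (prop_child_samp - prop_child_hyp) / std_error

# Right-tailed p-value, matching the "greater than" alternative hypothesis.
p_value <- pnorm(z_score, lower.tail = FALSE)

# Reject H0 only when the p-value is at or below the significance level.
verdict <- if (p_value <= alpha) "reject H0" else "fail to reject H0"

print(verdict)
```

With a p-value this far below the assumed significance level, the test rejects $H_{0}$ in favor of $H_{A}$.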
