Assumptions in hypothesis testing

Hypothesis Testing in Python

James Chapman

Curriculum Manager, DataCamp

Randomness

Assumption

The samples are random subsets of larger populations

Consequence
  • Sample is not representative of population
How to check this
  • Understand how your data was collected
  • Speak to the data collector/domain expert

A logo with the phrase 'Responsibly Sourced Ingredients'.

1 Sampling techniques are discussed in "Sampling in Python".

Independence of observations

Assumption

Each observation (row) in the dataset is independent

Consequence
  • Increased chance of false negative/positive errors
How to check this
  • Understand how your data was collected

Large sample size

Assumption

The sample is big enough to mitigate uncertainty, so that the Central Limit Theorem applies

Consequence
  • Wider confidence intervals
  • Increased chance of false negative/positive errors
How to check this
  • It depends on the test

Large sample size: t-test

One sample
  • At least 30 observations in the sample

$n \ge 30$

$n$: sample size

Two samples
  • At least 30 observations in each sample

$n_{1} \ge 30, n_{2} \ge 30$

$n_{i}$: sample size for group $i$

Paired samples
  • At least 30 pairs of observations across the samples

Number of rows in our data $\ge 30$

ANOVA
  • At least 30 observations in each sample

$n_{i} \ge 30$ for all values of $i$
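
The sample size rules above can be checked in a few lines. This is a minimal sketch, not a definitive implementation: the helper name `check_group_sizes` and the data in `samples` are made up for illustration, and the threshold of 30 is the rule of thumb from this slide.

```python
def check_group_sizes(groups, min_size=30):
    """Return {group_name: bool}, True where n_i >= min_size."""
    return {name: len(values) >= min_size for name, values in groups.items()}


# Hypothetical data: two groups of made-up measurements.
samples = {
    "control": list(range(45)),    # 45 observations
    "treatment": list(range(28)),  # 28 observations, below the threshold
}

size_ok = check_group_sizes(samples)
for name, ok in size_ok.items():
    print(f"{name}: n = {len(samples[name])}, large enough = {ok}")
```

For a one-sample test, pass a single group; for paired samples, pass the list of pairs so that its length is the number of rows.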


Large sample size: proportion tests

One sample
  • Number of successes in sample is greater than or equal to 10

$n \times \hat{p} \ge 10$

  • Number of failures in sample is greater than or equal to 10

$n \times (1 - \hat{p}) \ge 10$

$n$: sample size
$\hat{p}$: proportion of successes in sample

Two samples
  • Number of successes in each sample is greater than or equal to 10

$n_{1} \times \hat{p}_{1} \ge 10$

$n_{2} \times \hat{p}_{2} \ge 10$

  • Number of failures in each sample is greater than or equal to 10

$n_{1} \times (1 - \hat{p}_{1}) \ge 10$

$n_{2} \times (1 - \hat{p}_{2}) \ge 10$
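
Because $n \times \hat{p}$ is simply the count of successes, the conditions above reduce to counting. The sketch below assumes you already have the success count and sample size for each sample; the function name `proportion_size_ok` and the numbers are hypothetical.

```python
def proportion_size_ok(n_successes, n, min_count=10):
    """Check n * p_hat >= min_count and n * (1 - p_hat) >= min_count."""
    n_failures = n - n_successes
    return n_successes >= min_count and n_failures >= min_count


# One sample (made-up counts): 12 successes out of 200 observations.
one_sample_ok = proportion_size_ok(12, 200)

# Two samples (made-up counts): every sample must pass the check.
two_samples = [(40, 120), (4, 50)]  # (successes, sample size)
two_sample_ok = all(proportion_size_ok(s, n) for s, n in two_samples)

print(one_sample_ok, two_sample_ok)
```

Here the second sample has only 4 successes, so the two-sample check fails even though the first sample passes.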


Large sample size: chi-square tests

  • The number of successes in each group is greater than or equal to 5

$n_{i} \times \hat{p}_{i} \ge 5$ for all values of $i$

  • The number of failures in each group is greater than or equal to 5

$n_{i} \times (1 - \hat{p}_{i}) \ge 5$ for all values of $i$

$n_{i}$: sample size for group $i$
$\hat{p}_{i}$: proportion of successes in sample group $i$
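
One way to apply this rule is to count every (group, outcome) combination and flag the cells that fall below 5. This is a sketch using only the standard library; the helper `small_cells` and the records are invented for illustration.

```python
from collections import Counter


def small_cells(records, min_count=5):
    """Return {(group, outcome): count} for every cell with count < min_count.

    `records` is a list of (group, outcome) tuples, one per observation.
    Combinations that never occur are reported with a count of 0.
    """
    counts = Counter(records)
    groups = {group for group, _ in records}
    outcomes = {outcome for _, outcome in records}
    return {
        (g, o): counts[(g, o)]
        for g in groups
        for o in outcomes
        if counts[(g, o)] < min_count
    }


# Hypothetical data: group "b" has only 3 failures.
records = (
    [("a", "success")] * 20
    + [("a", "failure")] * 8
    + [("b", "success")] * 15
    + [("b", "failure")] * 3
)

too_small = small_cells(records)
print(too_small)
```

An empty result means every group has at least 5 successes and 5 failures.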


Sanity check

If the bootstrap distribution doesn't look normal, assumptions likely aren't valid

  • Revisit data collection to check for randomness, independence, and sample size
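
A rough sketch of this sanity check, assuming the statistic of interest is the sample mean: resample with replacement, compute the mean of each resample, and look at the shape of the resulting distribution. The sample here is simulated, the function name `bootstrap_means` is made up, and the text histogram is only a stand-in for a proper plot.

```python
import random
import statistics


def bootstrap_means(sample, n_resamples=1000, seed=0):
    """Return the means of n_resamples resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(sample)
    return [statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_resamples)]


# Simulated, skewed sample (an assumption for illustration only).
data_rng = random.Random(42)
sample = [data_rng.expovariate(1.0) for _ in range(200)]

boot = bootstrap_means(sample)

# Crude text histogram with 10 equal-width bins.
lo, hi = min(boot), max(boot)
n_bins = 10
width = (hi - lo) / n_bins
bins = [0] * n_bins
for m in boot:
    idx = min(int((m - lo) / width), n_bins - 1)  # clamp the maximum into the last bin
    bins[idx] += 1

for i, count in enumerate(bins):
    print(f"{lo + i * width:6.3f} | {'#' * (count // 10)}")
```

If the bars are roughly symmetric and bell-shaped, the check passes; a strongly lopsided or multi-peaked shape is a cue to revisit the assumptions.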

Let's practice!

