Assumptions in hypothesis testing

Hypothesis Testing in Python

James Chapman

Curriculum Manager, DataCamp

Randomness

Assumption

The samples are random subsets of larger populations

Consequence
  • Sample is not representative of population
How to check this
  • Understand how your data was collected
  • Speak to the data collector/domain expert

A logo with the phrase 'Responsibly Sourced Ingredients'.

1 Sampling techniques are discussed in "Sampling in Python".

Independence of observations

Assumption

Each observation (row) in the dataset is independent

Consequence
  • Increased chance of false negative/positive errors
How to check this
  • Understand how your data was collected

Large sample size

Assumption

The sample is big enough to mitigate uncertainty, so that the Central Limit Theorem applies

Consequence
  • Wider confidence intervals
  • Increased chance of false negative/positive errors
How to check this
  • It depends on the test

Large sample size: t-test

One sample
  • At least 30 observations in the sample

$n \ge 30$

$n$: sample size

Two samples
  • At least 30 observations in each sample

$n_{1} \ge 30, n_{2} \ge 30$

$n_{i}$: sample size for group $i$

Paired samples
  • At least 30 pairs of observations across the samples

Number of rows in our data $\ge 30$

ANOVA
  • At least 30 observations in each sample

$n_{i} \ge 30$ for all values of $i$
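The t-test and ANOVA thresholds above can be checked by counting observations. A minimal sketch using only the standard library; the group labels and sizes are hypothetical, made up for illustration:

```python
from collections import Counter

MIN_OBS = 30  # rule-of-thumb threshold from the slide

# Hypothetical group labels, one per observation (row)
groups = ["a"] * 35 + ["b"] * 40 + ["c"] * 12

# One sample or paired samples: count the rows (or pairs)
n = len(groups)
one_sample_ok = n >= MIN_OBS

# Two samples or ANOVA: every group needs at least MIN_OBS observations
group_sizes = Counter(groups)
too_small = sorted(g for g, size in group_sizes.items() if size < MIN_OBS)
all_groups_ok = not too_small

print(n, one_sample_ok)
print(dict(group_sizes))
print(all_groups_ok, too_small)
```

Here group "c" has only 12 observations, so the per-group check fails even though the overall row count passes. With a pandas DataFrame, the per-group sizes could instead come from calling `value_counts()` on the grouping column.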


Large sample size: proportion tests

One sample
  • Number of successes in sample is greater than or equal to 10

$n \times \hat{p} \ge 10$

  • Number of failures in sample is greater than or equal to 10

$n \times (1 - \hat{p}) \ge 10$

$n$: sample size
$\hat{p}$: proportion of successes in sample

Two samples
  • Number of successes in each sample is greater than or equal to 10

$n_{1} \times \hat{p}_{1} \ge 10$

$n_{2} \times \hat{p}_{2} \ge 10$

  • Number of failures in each sample is greater than or equal to 10

$n_{1} \times (1 - \hat{p}_{1}) \ge 10$

$n_{2} \times (1 - \hat{p}_{2}) \ge 10$
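Since $n \times \hat{p}$ is the number of successes and $n \times (1 - \hat{p})$ is the number of failures, the conditions above can be checked directly on counts. A minimal sketch; the function name `meets_count_rule` and the sample counts are hypothetical:

```python
def meets_count_rule(successes, n, min_count=10):
    """Return True if both successes and failures reach min_count."""
    failures = n - successes
    return successes >= min_count and failures >= min_count

# Hypothetical samples: (number of successes, sample size)
sample_1 = (24, 200)  # 24 successes, 176 failures
sample_2 = (5, 50)    # 5 successes, 45 failures

# One-sample test: check the single sample
sample_1_ok = meets_count_rule(*sample_1)

# Two-sample test: both samples must pass
sample_2_ok = meets_count_rule(*sample_2)
two_sample_ok = sample_1_ok and sample_2_ok

print(sample_1_ok, sample_2_ok, two_sample_ok)
```

The second sample has only 5 successes, so a two-sample proportion test on these data would not meet the large-sample condition.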


Large sample size: chi-square tests

  • The number of successes in each group is greater than or equal to 5

$n_{i} \times \hat{p}_{i} \ge 5$ for all values of $i$

  • The number of failures in each group is greater than or equal to 5

$n_{i} \times (1 - \hat{p}_{i}) \ge 5$ for all values of $i$

$n_{i}$: sample size for group $i$
$\hat{p}_{i}$: proportion of successes in sample group $i$
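The chi-square condition is the same kind of count check with a threshold of 5, applied to every group. A minimal sketch, assuming the success and failure counts per group have already been tallied; the group names and counts are hypothetical:

```python
MIN_COUNT = 5  # threshold from the slide

# Hypothetical counts of successes and failures per group
counts = {
    "group_1": {"success": 18, "failure": 42},
    "group_2": {"success": 7, "failure": 53},
    "group_3": {"success": 3, "failure": 57},
}

# Collect every group where either count falls below the threshold
failing = sorted(
    group
    for group, c in counts.items()
    if c["success"] < MIN_COUNT or c["failure"] < MIN_COUNT
)
chi_square_ok = not failing

print(chi_square_ok, failing)
```

Only `group_3` falls short here (3 successes), which is enough for the condition to fail overall.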


Sanity check

If the bootstrap distribution doesn't look normal, assumptions likely aren't valid

  • Revisit data collection to check for randomness, independence, and sample size
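One way to run this sanity check is to build a bootstrap distribution of the sample mean and look at its shape. A rough sketch using only the standard library; the sample is simulated (hypothetical), and the text histogram stands in for a proper plot:

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical sample: 200 skewed values with a mean of roughly 50
sample = [random.expovariate(1 / 50) for _ in range(200)]

# Bootstrap distribution: resample with replacement, record each mean
N_RESAMPLES = 1000
boot_means = []
for _ in range(N_RESAMPLES):
    resample = random.choices(sample, k=len(sample))
    boot_means.append(statistics.mean(resample))

# Crude text histogram with 10 equal-width bins
lo, hi = min(boot_means), max(boot_means)
n_bins = 10
width = (hi - lo) / n_bins
bins = [0] * n_bins
for m in boot_means:
    idx = min(int((m - lo) / width), n_bins - 1)  # clamp the maximum into the last bin
    bins[idx] += 1

for i, count in enumerate(bins):
    print(f"{lo + i * width:7.2f} | {'#' * (count // 10)}")
```

A roughly symmetric, bell-shaped histogram is what the large-sample assumption predicts. A clearly skewed or multi-peaked one is a prompt to revisit how the data was collected.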

Let's practice!
