Hypothesis tests

Case Studies in Statistical Thinking

Justin Bois

Lecturer, Caltech

Effects of mutation on activity

Data courtesy of Avni Gandhi, Grigorios Oikonomou, and David Prober, Caltech

Genotype definitions

  • Wild type: No mutations

  • Heterozygote: Mutation on one of two chromosomes

  • Mutant: Mutation on both chromosomes

Hypothesis test

Assessment of how reasonable the observed data are assuming a hypothesis is true


p-value

The probability of obtaining a value of your test statistic that is at least as extreme as what was observed, under the assumption the null hypothesis is true


Test statistic

  • A single number that can be computed from observed data and from data you simulate under the null hypothesis

  • Serves as a basis of comparison

p-value

The probability of obtaining a value of your test statistic that is at least as extreme as what was observed, under the assumption the null hypothesis is true

Requires clear specification of:

  • Null hypothesis that can be simulated
  • Test statistic that can be calculated from observed and simulated data
  • Definition of at least as extreme as
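
As a small illustration of the third requirement, the sketch below computes a p-value from an array of simulated test statistics under two different readings of "at least as extreme as". The replicate values and the observed value are made up for illustration; they are not from the fish data.

```python
import numpy as np

# Hypothetical test statistics from ten simulated datasets (made-up values)
reps = np.array([-1.2, -0.4, 0.1, 0.3, 0.8, 1.5, 2.1, -0.9, 0.0, 1.1])

# Hypothetical observed test statistic (made-up value)
obs = 1.1

# Reading 1: "at least as extreme" means greater than or equal to observed
p_greater = np.mean(reps >= obs)

# Reading 2: "at least as extreme" means less than or equal to observed
p_less = np.mean(reps <= obs)
```

The same replicates give different p-values depending on the definition, which is why it has to be stated before the test is run.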

Pipeline for hypothesis testing

  • Clearly state the null hypothesis
  • Define your test statistic
  • Generate many sets of simulated data assuming the null hypothesis is true
  • Compute the test statistic for each simulated dataset
  • The p-value is the fraction of your simulated datasets for which the test statistic is at least as extreme as for the real data
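
The steps above can be sketched as a generic function. This is a minimal sketch, not code from the course: `simulated_p_value`, `flip_fair_coin`, and the coin-flip example (60 heads observed in 100 flips) are hypothetical names and numbers chosen only to make the pipeline concrete, and "at least as extreme" is assumed to mean greater than or equal to.

```python
import numpy as np

rng = np.random.default_rng(seed=0)


def simulated_p_value(simulate_null, test_stat, obs_stat, n_reps=10000):
    """Fraction of simulated datasets whose test statistic is >= obs_stat."""
    reps = np.empty(n_reps)
    for i in range(n_reps):
        # Generate one dataset assuming the null hypothesis is true
        sim_data = simulate_null()
        # Compute the test statistic for that simulated dataset
        reps[i] = test_stat(sim_data)
    # p-value: fraction of replicates at least as extreme as observed
    return np.sum(reps >= obs_stat) / n_reps


# Hypothetical null hypothesis: the coin is fair (100 flips, 1 = heads)
def flip_fair_coin():
    return rng.integers(0, 2, size=100)


# Test statistic: number of heads; hypothetical observation: 60 heads
p_val = simulated_p_value(flip_fair_coin, np.sum, obs_stat=60)
```

Any null hypothesis that can be simulated and any test statistic that is a single number fit the same two function arguments.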

Specifying the test

Null hypothesis: the active bout lengths of wild type and heterozygotic fish are identically distributed

Test statistic: Difference in mean active bout length between heterozygotes and wild type

At least as extreme as: Test statistic is greater than or equal to what was observed


Permutation test

For each replicate:

  • Scramble labels of data points
  • Compute test statistic
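
import numpy as np
import dc_stat_think as dcst

# Draw 10,000 permutation replicates of the difference of means
perm_reps = dcst.draw_perm_reps(
    data_a, data_b, dcst.diff_of_means, size=10000
)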

p-value is the fraction of replicates at least as extreme as what was observed

p_val = np.sum(perm_reps >= diff_means_obs) / len(perm_reps)
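
To make the "scramble labels, compute test statistic" loop concrete, here is a minimal NumPy-only sketch of what drawing permutation replicates involves. It is not necessarily how `dcst.draw_perm_reps` is implemented: `perm_reps_sketch` is a hypothetical helper, the local `diff_of_means` is assumed to return the mean of the first array minus the mean of the second, and the two data arrays are randomly generated stand-ins rather than the fish bout lengths.

```python
import numpy as np

rng = np.random.default_rng(seed=42)


def diff_of_means(a, b):
    """Difference of means of two arrays (first minus second)."""
    return np.mean(a) - np.mean(b)


def perm_reps_sketch(a, b, func, size=1):
    """Draw `size` permutation replicates of func(a, b)."""
    both = np.concatenate((a, b))
    reps = np.empty(size)
    for i in range(size):
        # Scramble labels: shuffle the pooled data, then split it
        # back into two groups with the original sizes
        scrambled = rng.permutation(both)
        reps[i] = func(scrambled[:len(a)], scrambled[len(a):])
    return reps


# Made-up stand-in data (not the real measurements)
data_a = rng.exponential(scale=1.2, size=50)
data_b = rng.exponential(scale=1.0, size=50)

# Observed test statistic, computed with the same function as the replicates
diff_means_obs = diff_of_means(data_a, data_b)

perm_reps = perm_reps_sketch(data_a, data_b, diff_of_means, size=10000)
p_val = np.sum(perm_reps >= diff_means_obs) / len(perm_reps)
```

Pooling and reshuffling simulates the null hypothesis directly: if the two groups are identically distributed, the labels carry no information, so any relabeling is as plausible as the observed one.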

Let's practice!
