Hypothesis tests

Case Studies in Statistical Thinking

Justin Bois

Lecturer, Caltech

Effects of mutation on activity

Data courtesy of Avni Gandhi, Grigorios Oikonomou, and David Prober, Caltech

Genotype definitions

  • Wild type: No mutations

  • Heterozygote: Mutation on one of two chromosomes

  • Mutant: Mutation on both chromosomes

Hypothesis test

Assessment of how reasonable the observed data are assuming a hypothesis is true

p-value

The probability of obtaining a value of your test statistic that is at least as extreme as what was observed, under the assumption the null hypothesis is true

Test statistic

  • A single number that can be computed from observed data and from data you simulate under the null hypothesis

  • Serves as the basis of comparison between the observed and the simulated data
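
A minimal sketch of that idea, using the difference of means adopted later in this section as the test statistic: the same function is applied once to the observed samples and once to a dataset simulated under the null hypothesis. The samples are invented numbers, the simulation (shuffling the pooled values) assumes a null of identical distributions, and this `diff_of_means` is our own helper rather than the `dc_stat_think` function of the same name.

```python
import numpy as np


def diff_of_means(data_1, data_2):
    """Test statistic: mean of data_1 minus mean of data_2."""
    return np.mean(data_1) - np.mean(data_2)


# Made-up samples (illustrative only, not the study data)
sample_a = np.array([2.0, 3.0, 4.0])
sample_b = np.array([1.0, 1.0, 1.0])

# Test statistic computed from the "observed" data
obs_stat = diff_of_means(sample_a, sample_b)

# Same statistic computed from one dataset simulated under the null:
# shuffle the pooled values, then split back into groups of the original sizes
rng = np.random.default_rng(seed=42)
pooled = np.concatenate((sample_a, sample_b))
shuffled = rng.permutation(pooled)
sim_stat = diff_of_means(shuffled[:len(sample_a)], shuffled[len(sample_a):])

print(obs_stat)  # 2.0
```

Because one function produces both numbers, the observed value and the simulated values are directly comparable.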

p-value

Requires clear specification of:

  • Null hypothesis that can be simulated
  • Test statistic that can be calculated from observed and simulated data
  • Definition of "at least as extreme as"
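
Which definition of "at least as extreme as" you choose changes the p-value. A minimal sketch, assuming the simulated test statistics are already in an array: the hypothetical helper `p_value` below (our own, not a library function) implements three common choices. The two-sided rule compares absolute values, which assumes the null distribution of the statistic is centered on zero.

```python
import numpy as np


def p_value(reps, obs, alternative="greater"):
    """Fraction of simulated test statistics at least as extreme as obs.

    alternative="greater":   extreme means reps >= obs
    alternative="less":      extreme means reps <= obs
    alternative="two-sided": extreme means |reps| >= |obs|
        (assumes the null distribution is centered on zero)
    """
    reps = np.asarray(reps)
    if alternative == "greater":
        extreme = reps >= obs
    elif alternative == "less":
        extreme = reps <= obs
    elif alternative == "two-sided":
        extreme = np.abs(reps) >= np.abs(obs)
    else:
        raise ValueError(f"unknown alternative: {alternative!r}")
    # Mean of a boolean array is the fraction of True entries
    return np.mean(extreme)


reps = np.array([-1.5, -0.5, 0.2, 0.9, 1.2])
print(p_value(reps, 0.9))               # 0.4
print(p_value(reps, 0.9, "less"))       # 0.8
print(p_value(reps, 0.9, "two-sided"))  # 0.6
```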

Pipeline for hypothesis testing

  • Clearly state the null hypothesis
  • Define your test statistic
  • Generate many sets of simulated data assuming the null hypothesis is true
  • Compute the test statistic for each simulated dataset
  • The p-value is the fraction of your simulated datasets for which the test statistic is at least as extreme as for the real data
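
The five steps map onto a small generic function. This is a sketch under our own assumptions: the null hypothesis is supplied as a callable that returns one simulated dataset, "at least as extreme as" is fixed to "greater than or equal to", and the name `simulation_p_value` is hypothetical, not part of `dc_stat_think`. The coin-flip usage is a toy illustration only.

```python
import numpy as np


def simulation_p_value(observed_data, simulate_null, test_stat, size=10000, seed=None):
    """Simulation-based one-sided p-value ("at least as extreme" means >=).

    observed_data: the real data, passed unchanged to test_stat
    simulate_null: callable(rng) -> one dataset simulated under the null
    test_stat:     callable(dataset) -> a single number
    """
    rng = np.random.default_rng(seed)
    # Test statistic for the real data
    obs = test_stat(observed_data)
    # Many simulated datasets under the null, one test statistic each
    reps = np.array([test_stat(simulate_null(rng)) for _ in range(size)])
    # Fraction of simulated statistics at least as extreme as observed
    return np.mean(reps >= obs)


# Toy example: is 60 heads in 100 flips surprising for a fair coin?
observed = np.array([1] * 60 + [0] * 40)
p = simulation_p_value(
    observed,
    simulate_null=lambda rng: rng.integers(0, 2, size=100),
    test_stat=np.sum,
    size=10000,
    seed=1,
)
print(p)  # a small value; the exact binomial tail P(X >= 60) is about 0.028
```

Passing the null simulation and the test statistic as functions keeps the pipeline itself unchanged when either choice changes.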

Specifying the test

Null hypothesis: the active bout lengths of wild type and heterozygous fish are identically distributed

Test statistic: Difference in mean active bout length between heterozygotes and wild type

At least as extreme as: Test statistic is greater than or equal to what was observed

Permutation test

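For each replicate:

  • Scramble (permute) the labels of the data points
  • Compute the test statistic

import numpy as np
import dc_stat_think as dcst

diff_means_obs = dcst.diff_of_means(data_a, data_b)  # data_a: heterozygotes, data_b: wild type
# 10,000 permutation replicates of the difference in means
perm_reps = dcst.draw_perm_reps(
    data_a, data_b, dcst.diff_of_means, size=10000
)

The p-value is the fraction of replicates at least as extreme as what was observed:

p_val = np.sum(perm_reps >= diff_means_obs) / len(perm_reps)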
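
For readers without `dc_stat_think` installed, here is a plain-NumPy sketch of the same procedure: pool the two samples, permute, split back into groups of the original sizes, and compute the statistic. We assume this mirrors what `dcst.draw_perm_reps` does, but it is our own reimplementation, not the library's code. The names `draw_perm_reps_np` and `diff_of_means` are hypothetical, and the bout lengths are invented numbers, not the study data.

```python
import numpy as np


def diff_of_means(data_1, data_2):
    """Difference in means: data_1 minus data_2."""
    return np.mean(data_1) - np.mean(data_2)


def draw_perm_reps_np(data_1, data_2, func, size=1, seed=None):
    """Permutation replicates of func under the null that data_1 and
    data_2 are identically distributed (so labels are exchangeable)."""
    rng = np.random.default_rng(seed)
    data_1 = np.asarray(data_1)
    data_2 = np.asarray(data_2)
    pooled = np.concatenate((data_1, data_2))
    n_1 = len(data_1)
    reps = np.empty(size)
    for i in range(size):
        # Scramble the labels by permuting the pooled data
        perm = rng.permutation(pooled)
        # Re-split into groups of the original sizes; compute the statistic
        reps[i] = func(perm[:n_1], perm[n_1:])
    return reps


# Invented active bout lengths, for illustration only
het = np.array([3.1, 2.8, 4.0, 3.5, 2.9, 3.8])
wt = np.array([2.6, 3.0, 2.4, 2.9, 2.7, 3.1])

diff_means_obs = diff_of_means(het, wt)
perm_reps = draw_perm_reps_np(het, wt, diff_of_means, size=10000, seed=42)

# One-sided p-value: fraction of replicates >= the observed difference
p_val = np.sum(perm_reps >= diff_means_obs) / len(perm_reps)
print(p_val)
```

Fixing `seed` makes the p-value reproducible; without it, repeated runs give slightly different values because the replicates are random.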

Let's practice!
