Activity of zebrafish and melatonin

Case Studies in Statistical Thinking

Justin Bois

Lecturer, Caltech

Case studies in statistical thinking

Hone and extend your statistical thinking skills
Work with real datasets
Review Statistical Thinking in Python (Parts 1 and 2)

Warming up with zebrafish

¹ Movie courtesy of David Prober, Caltech

Nomenclature

Mutant: Has the mutation on both chromosomes

Wild type: Does not have the mutation

Activity of fish, day and night

¹ Data courtesy of Avni Gandhi, Grigorios Oikonomou, and David Prober, Caltech

Active bouts: a metric for wakefulness

Active bout: A period of time where a fish is consistently active

Active bout length: Number of consecutive minutes with activity
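Here is a minimal sketch of how active bout lengths could be computed. It assumes the activity record is a NumPy array with one entry per minute, where any nonzero value counts as an active minute. The function name `active_bout_lengths` and the ten-minute trace are hypothetical illustrations, not the course's data or code.

```python
import numpy as np


def active_bout_lengths(activity):
    """Return the lengths (in minutes) of runs of consecutive active minutes.

    `activity` is a 1D array with one entry per minute; any nonzero
    entry counts as an active minute (hypothetical convention).
    """
    # Convert to 0/1 and pad with an inactive minute on each end so every
    # bout has a well-defined start and end.
    active = np.concatenate(([0], (np.asarray(activity) != 0).astype(int), [0]))

    # +1 marks the first minute of a bout, -1 marks the minute just after it ends.
    changes = np.diff(active)
    starts = np.flatnonzero(changes == 1)
    ends = np.flatnonzero(changes == -1)

    return ends - starts


# Hypothetical trace: activity counts for ten consecutive minutes
trace = np.array([0, 3, 5, 0, 0, 2, 0, 1, 4, 6])
print(active_bout_lengths(trace))  # [2 1 3]
```

Padding with inactive minutes avoids special-casing bouts that start at the first minute or run to the last one.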

Probability distributions and stories

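Probability distribution: A mathematical description of the possible outcomes and how likely each one is

A probability distribution has a story: a description of the process that produces outcomes with that distribution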

Distributions from Statistical Thinking in Python (Part 1)

Uniform
Binomial
Poisson
Normal
Exponential
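As a quick refresher, each of these distributions can be sampled with NumPy's random number generator. The comments briefly restate each story. The parameter values are arbitrary choices for illustration.

```python
import numpy as np

# Seeded generator so the draws are reproducible
rng = np.random.default_rng(seed=42)
n = 10_000

samples = {
    # Every value in [0, 1) is equally likely
    "uniform": rng.uniform(0.0, 1.0, size=n),
    # Number of successes in 20 Bernoulli trials, each with success probability 0.3
    "binomial": rng.binomial(20, 0.3, size=n),
    # Number of arrivals of a Poisson process in an interval, with mean 4
    "poisson": rng.poisson(4.0, size=n),
    # Normal with mean 0 and standard deviation 1
    "normal": rng.normal(0.0, 1.0, size=n),
    # Waiting time between Poisson arrivals, with mean 2.5 (NumPy's `scale` is the mean)
    "exponential": rng.exponential(2.5, size=n),
}

for name, draws in samples.items():
    print(f"{name:>11}: mean = {draws.mean():.3f}")
```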

The Exponential distribution

Poisson process: The timing of the next event is completely independent of when the previous event happened

Story of the Exponential distribution: The waiting time between arrivals of a Poisson process is Exponentially distributed
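The story can be checked by simulation. The sketch below approximates a Poisson process by dividing time into many short steps and letting an event occur in each step independently with a small probability. As a result, the process has no memory of when the last event happened. The rate, step size, and number of steps are arbitrary illustrative choices. If the story holds, the waiting times between events should have a mean close to 1/rate, and the fraction of waits longer than t should be close to exp(-rate * t).

```python
import numpy as np

rng = np.random.default_rng(seed=1)

rate = 0.5          # mean number of events per unit time (illustrative)
dt = 0.001          # length of each short time step
n_steps = 2_000_000

# In each step an event happens with probability rate * dt,
# independently of every other step (no memory of the last event).
events = rng.random(n_steps) < rate * dt
event_times = np.flatnonzero(events) * dt

# Waiting times between consecutive events
waits = np.diff(event_times)

print(f"mean wait: {waits.mean():.3f} (theory: {1 / rate:.3f})")

# Fraction of waits longer than t vs. the Exponential survival function
t = 3.0
print(f"P(wait > {t}): {np.mean(waits > t):.3f} (theory: {np.exp(-rate * t):.3f})")
```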

The Exponential CDF

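import matplotlib.pyplot as plt

# ecdf() returns sorted data (x) and the fraction of points <= each x (y)
x, y = ecdf(nuclear_incident_times)
# Plot the ECDF as dots with no connecting lines
_ = plt.plot(x, y, marker='.', linestyle='none')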

¹ Data source: Wheatley, Sovacool, and Sornette, Nuclear Events Database
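The `ecdf()` function used for this plot is not defined on the slide. The sketch below assumes it follows the usual definition: sorted data on the x-axis and the fraction of points less than or equal to each value on the y-axis. It also overlays the theoretical Exponential CDF, 1 - exp(-x / τ), with τ set to the sample mean. The nuclear incident data are not included here, so the sketch uses hypothetical synthetic waiting times in place of `nuclear_incident_times`.

```python
import numpy as np
import matplotlib.pyplot as plt


def ecdf(data):
    """Return x (sorted data) and y (fraction of data <= x) for an ECDF."""
    x = np.sort(data)
    y = np.arange(1, len(x) + 1) / len(x)
    return x, y


# Hypothetical stand-in for nuclear_incident_times (synthetic, not real data)
rng = np.random.default_rng(seed=3)
waiting_times = rng.exponential(scale=300.0, size=60)

x, y = ecdf(waiting_times)

# Theoretical Exponential CDF with its mean set to the sample mean
tau = waiting_times.mean()
x_theor = np.linspace(0, x.max(), 200)
y_theor = 1 - np.exp(-x_theor / tau)

_ = plt.plot(x, y, marker='.', linestyle='none', label='data (ECDF)')
_ = plt.plot(x_theor, y_theor, label='Exponential CDF')
_ = plt.xlabel('time between incidents')
_ = plt.ylabel('CDF')
_ = plt.legend()
```

If the data are Exponentially distributed, the dots should fall close to the smooth curve.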

Using the dc_stat_think module

% pip install dc_stat_think

import dc_stat_think as dcst

dcst.pearson_r?

Signature: dcst.pearson_r(data_1, data_2)
Docstring: Compute the Pearson correlation coefficient between two samples.
Parameters
----------
data_1 : array_like
    One-dimensional array of data.
data_2 : array_like
    One-dimensional array of data.
Returns
-------
output : float
    The Pearson correlation coefficient between `data_1`
    and `data_2`.
File:      usr/local/lib/python3.5/site-packages/
           dc_stat_think-0.1.4-py3.6.egg/dc_stat_think/dc_stat_think.py
Type:      function

x, y = dcst.ecdf(nuclear_incident_times)
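For reference, the quantity described in the `dcst.pearson_r` docstring can be computed directly from its definition: the covariance of the two samples divided by the product of their standard deviations. The sketch below is an illustrative equivalent, not the `dc_stat_think` source. The function name `pearson_r_sketch` and the example arrays are hypothetical.

```python
import numpy as np


def pearson_r_sketch(data_1, data_2):
    """Pearson correlation coefficient between two 1D samples."""
    data_1 = np.asarray(data_1, dtype=float)
    data_2 = np.asarray(data_2, dtype=float)

    # Covariance divided by the product of the standard deviations
    # (population versions, so the factors of n cancel)
    cov = np.mean((data_1 - data_1.mean()) * (data_2 - data_2.mean()))
    return cov / (data_1.std() * data_2.std())


x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Cross-check against the off-diagonal entry of NumPy's correlation matrix
print(pearson_r_sketch(x, y), np.corrcoef(x, y)[0, 1])
```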

Let's practice!
