Random number generators and hacker statistics

Statistical Thinking in Python (Part 1)

Justin Bois

Teaching Professor at the California Institute of Technology

Hacker statistics

Uses simulated repeated measurements to compute probabilities.

Simulating coin flips

Bernoulli trials

An experiment that has two options, "success" (True) and "failure" (False).
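A minimal sketch of the definition above, assuming a success probability p and NumPy's default_rng generator. The helper name perform_bernoulli_trials and its parameters are illustrative, not part of the slides.

```python
import numpy as np


def perform_bernoulli_trials(n, p, rng):
    """Run n Bernoulli trials with success probability p.

    Hypothetical helper for illustration: returns the number of successes.
    """
    # rng.random draws uniform numbers in [0, 1); a draw below p is a success (True)
    successes = rng.random(size=n) < p
    return int(np.sum(successes))


rng = np.random.default_rng(42)  # seeded only so the sketch is repeatable
n_success = perform_bernoulli_trials(100, 0.5, rng)
print(n_success)
```

A fair coin flip is the special case p = 0.5, with "heads" playing the role of success.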

The np.random module

import numpy as np
rng = np.random.default_rng()

rng

Generator(PCG64) at 0x7F9433D38120

Random number seed

Integer fed into the random number generating algorithm
Seed the random number generator manually only if you need reproducibility
Specified using rng = np.random.default_rng(seed)
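To illustrate what reproducibility means here, the sketch below builds two generators from the same seed and checks that they produce identical draws. The variable names are illustrative only.

```python
import numpy as np

# Two generators built from the same seed produce the same sequence
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

draws_a = rng_a.random(size=3)
draws_b = rng_b.random(size=3)
same = np.array_equal(draws_a, draws_b)
print(same)

# Without a seed, NumPy pulls fresh entropy from the operating system,
# so the draws will generally differ from run to run
rng_unseeded = np.random.default_rng()
print(rng_unseeded.random(size=3))
```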

Simulating 4 coin flips

rng = np.random.default_rng(42)

random_numbers = rng.random(size=4)
random_numbers

array([0.77395605, 0.43887844, 0.85859792, 0.69736803])

heads = random_numbers < 0.5
heads

array([False,  True, False, False])

np.sum(heads)

Simulating 4 coin flips

n_all_heads = 0  # Initialize number of 4-heads trials
for _ in range(10000):
    # Flip 4 coins using the rng Generator created above
    heads = rng.random(size=4) < 0.5
    n_heads = np.sum(heads)
    if n_heads == 4:
        n_all_heads += 1

n_all_heads / 10000

0.0607
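As a sanity check on the simulated value, the exact probability of four heads with a fair coin is 0.5 ** 4 = 0.0625. The sketch below is one possible vectorized variant of the loop, not the version from the slides: it draws all flips at once as a 10000 x 4 array and compares the estimate with the exact value.

```python
import numpy as np

rng = np.random.default_rng(42)
n_trials = 10000

# Each row is one trial of 4 coin flips; True means heads
flips = rng.random(size=(n_trials, 4)) < 0.5

# A trial counts only if all 4 flips in the row are heads
all_heads = np.all(flips, axis=1)

estimate = np.mean(all_heads)  # fraction of trials with 4 heads
exact = 0.5 ** 4               # 0.0625 for a fair coin

print(estimate, exact)
```

The estimate will differ slightly from 0.0625 and gets closer, on average, as the number of trials grows.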

Hacker stats probabilities

Determine how to simulate data
Simulate many, many times
Probability is approximately the fraction of trials with the outcome of interest
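The three steps above can be sketched as a small reusable function. The names estimate_probability, simulate, and is_outcome are hypothetical, chosen for illustration. The example estimates the probability of at least 3 heads in 4 flips, whose exact value is 5/16 = 0.3125.

```python
import numpy as np


def estimate_probability(simulate, is_outcome, n_trials, rng):
    """Hypothetical helper: fraction of simulated trials showing the outcome."""
    n_hits = 0
    for _ in range(n_trials):
        data = simulate(rng)      # step 1: how to simulate one trial
        if is_outcome(data):      # check for the outcome of interest
            n_hits += 1
    return n_hits / n_trials      # step 3: fraction of trials


def four_flips(rng):
    # One trial: 4 fair coin flips, True means heads
    return rng.random(size=4) < 0.5


def at_least_three_heads(heads):
    return np.sum(heads) >= 3


rng = np.random.default_rng(42)
p_est = estimate_probability(four_flips, at_least_three_heads, 10000, rng)
print(p_est)  # should land near the exact value 5/16 = 0.3125
```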

Let's practice!
