What are the chances?

Introduction to Statistics in Python

Maggie Matsui

Content Developer, DataCamp

Measuring chance

What's the probability of an event?

$$ P(\text{event}) = \frac{\text{\# ways event can happen}}{\text{total \# of possible outcomes}} $$

Example: a coin flip

$$ P(\text{heads}) = \frac{\text{1 way to get heads}}{\text{2 possible outcomes}} = \frac{1}{2} = 50\%$$

Number line of probability. 0 percent = impossible, 100 percent = will certainly happen
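
The counting formula above can be sketched in a few lines. This is a minimal illustration that assumes every outcome is equally likely; the helper name `probability` is hypothetical and not part of the course material.

```python
from fractions import Fraction

def probability(event_outcomes, all_outcomes):
    """Probability of an event when every outcome is equally likely."""
    # Count the outcomes that belong to the event.
    favorable = [o for o in all_outcomes if o in event_outcomes]
    return Fraction(len(favorable), len(all_outcomes))

coin = ["heads", "tails"]
p_heads = probability({"heads"}, coin)
print(p_heads, f"= {float(p_heads):.0%}")  # 1/2 = 50%
```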

Assigning salespeople

Box with Amir, Brian, Claire, and Damian's names

Pulling out Brian's name

$$P(\text{Brian}) = \frac{1}{4} = 25\%$$
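
One way to sanity-check the 25% figure is to simulate many draws from the box and look at how often Brian comes up. This is a rough sketch using NumPy's random generator; the seed and the number of draws are arbitrary choices made for illustration.

```python
import numpy as np

names = np.array(["Amir", "Brian", "Claire", "Damian"])

# Seeded generator so the simulation is repeatable (seed chosen arbitrarily).
rng = np.random.default_rng(42)

# Each draw picks one name, with all four names equally likely.
draws = rng.choice(names, size=100_000)

# Share of draws that came up "Brian"; this should land close to 0.25.
brian_share = np.mean(draws == "Brian")
print(round(float(brian_share), 3))
```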

Sampling from a DataFrame

print(sales_counts)
     name  n_sales
0    Amir      178
1   Brian      128
2  Claire       75
3  Damian       69
sales_counts.sample()
    name  n_sales
1  Brian      128
sales_counts.sample()
     name  n_sales
2  Claire       75
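
The slide assumes `sales_counts` already exists. A minimal sketch of how it could be built from the values printed above, followed by a single random pick:

```python
import pandas as pd

# Rebuild the table shown on the slide.
sales_counts = pd.DataFrame(
    {
        "name": ["Amir", "Brian", "Claire", "Damian"],
        "n_sales": [178, 128, 75, 69],
    }
)

# .sample() with no arguments returns one row chosen at random,
# so repeated calls can return different salespeople.
picked = sales_counts.sample()
print(picked)
```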

Setting a random seed

import numpy as np
np.random.seed(10)

sales_counts.sample()
    name  n_sales
1  Brian      128
np.random.seed(10)
sales_counts.sample()
    name  n_sales
1  Brian      128
np.random.seed(10)
sales_counts.sample()
    name  n_sales
1  Brian      128
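
The reproducibility shown above can be checked directly. The sketch below rebuilds `sales_counts` from the earlier slide and compares two seeded samples; it also shows the `random_state` argument of `.sample()`, an alternative not used on the slides that avoids touching NumPy's global seed.

```python
import numpy as np
import pandas as pd

sales_counts = pd.DataFrame(
    {
        "name": ["Amir", "Brian", "Claire", "Damian"],
        "n_sales": [178, 128, 75, 69],
    }
)

# Resetting the global seed before each call reproduces the same pick.
np.random.seed(10)
first = sales_counts.sample()
np.random.seed(10)
second = sales_counts.sample()
print(first.equals(second))  # True

# Alternative: pass the seed to .sample() itself.
third = sales_counts.sample(random_state=10)
fourth = sales_counts.sample(random_state=10)
print(third.equals(fourth))  # True
```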

A second meeting

Sampling without replacement

Box with Amir, Claire, Damian

Claire's name pulled out

$$P(\text{Claire}) = \frac{1}{3} \approx 33\%$$

Sampling twice in Python

sales_counts.sample(2)
     name  n_sales
1   Brian      128
2  Claire       75
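
By default `.sample()` draws without replacement, so the same row cannot appear twice. A small sketch of what that implies, again rebuilding `sales_counts` from the earlier slide: two picks are always distinct, and asking for more rows than the table holds raises an error.

```python
import pandas as pd

sales_counts = pd.DataFrame(
    {
        "name": ["Amir", "Brian", "Claire", "Damian"],
        "n_sales": [178, 128, 75, 69],
    }
)

# Two picks without replacement: the names are guaranteed to differ.
pair = sales_counts.sample(2)
print(pair["name"].is_unique)  # True

# Five picks from four rows is impossible without replacement.
try:
    sales_counts.sample(5)
    oversample_error = None
except ValueError as err:
    oversample_error = err
print(type(oversample_error).__name__)  # ValueError
```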

Sampling with replacement

GIF of hand reaching into box, pulling out Brian's name, then dropping it back in

$$P(\text{Claire}) = \frac{1}{4} = 25\%$$

Sampling with/without replacement in Python

sales_counts.sample(5, replace=True)
     name  n_sales
1   Brian      128
2  Claire       75
1   Brian      128
3  Damian       69
0    Amir      178
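
With `replace=True`, each name goes back in the box after it is drawn, so repeats are allowed. A brief sketch, assuming the same `sales_counts` table as before: five draws from four rows must contain at least one repeated row, which shows up as a duplicated index label.

```python
import pandas as pd

sales_counts = pd.DataFrame(
    {
        "name": ["Amir", "Brian", "Claire", "Damian"],
        "n_sales": [178, 128, 75, 69],
    }
)

# Five draws with replacement from only four rows.
five = sales_counts.sample(5, replace=True)
print(five)

# At least one row has to repeat, so some index label is duplicated.
has_repeat = bool(five.index.duplicated().any())
print(has_repeat)  # True
```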

Independent events

Two events are independent if the probability of the second event isn't affected by the outcome of the first event.

Two columns: First pick column containing Amir, Brian, Claire, Damian. Second pick column is empty

Sampling with replacement = each pick is independent

Arrows from each name in first pick column point to Claire in second pick column, with probability 25%
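
Independence can be verified by listing every possible pair of picks. The sketch below enumerates all first and second picks with replacement and computes the chance that the second pick is Claire for each possible first pick; the helper name `p_second_given_first` is hypothetical.

```python
from fractions import Fraction
from itertools import product

names = ["Amir", "Brian", "Claire", "Damian"]

# With replacement, the second pick can be any name, including the first one.
pairs = list(product(names, repeat=2))

def p_second_given_first(pairs, first, second):
    """Share of pairs starting with `first` whose second pick is `second`."""
    matching_first = [p for p in pairs if p[0] == first]
    hits = [p for p in matching_first if p[1] == second]
    return Fraction(len(hits), len(matching_first))

# The probability is 1/4 no matter who was picked first.
independent_probs = {
    first: p_second_given_first(pairs, first, "Claire") for first in names
}
for first, prob in independent_probs.items():
    print(first, prob)
```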

Dependent events

Two events are dependent if the probability of the second event is affected by the outcome of the first event.

Two columns: First pick column containing Amir, Brian, Claire, Damian. Second pick column is empty

Claire in first column points to Claire in second column with probability 0%

Sampling without replacement → picks become dependent

Amir, Brian, and Damian in first column point to Claire in second column with probability 33%
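
The same enumeration idea shows the dependence when names are not put back. This sketch lists every ordered pair of distinct names and computes the chance that the second pick is Claire for each first pick; the helper name `p_second_given_first` is hypothetical.

```python
from fractions import Fraction
from itertools import permutations

names = ["Amir", "Brian", "Claire", "Damian"]

# Without replacement, the second pick cannot repeat the first one.
pairs = list(permutations(names, 2))

def p_second_given_first(pairs, first, second):
    """Share of pairs starting with `first` whose second pick is `second`."""
    matching_first = [p for p in pairs if p[0] == first]
    hits = [p for p in matching_first if p[1] == second]
    return Fraction(len(hits), len(matching_first))

# 0 if Claire was already picked first, 1/3 otherwise.
dependent_probs = {
    first: p_second_given_first(pairs, first, "Claire") for first in names
}
for first, prob in dependent_probs.items():
    print(first, prob)
```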

Let's practice!
