What are the chances?

Introduction to Statistics in R

Maggie Matsui

Content Developer, DataCamp

Measuring chance

What's the probability of an event?

$$ P(\text{event}) = \frac{\text{\# ways event can happen}}{\text{total \# of possible outcomes}} $$

Example: a coin flip

$$ P(\text{heads}) = \frac{\text{1 way to get heads}}{\text{2 possible outcomes}} = \frac{1}{2} = 50\%$$

Number line of probability. 0 percent = impossible, 100 percent = will certainly happen
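
As a quick check of the formula, the base R sketch below counts favorable and total outcomes for the coin flip, then compares the result with the share of heads in simulated flips. The names `outcomes`, `p_heads`, and `flips`, the seed, and the choice of 10,000 flips are illustrative assumptions, not part of the course material.

```r
# Possible outcomes of one fair coin flip
outcomes <- c("heads", "tails")

# P(heads) = ways to get heads / total possible outcomes
p_heads <- sum(outcomes == "heads") / length(outcomes)
p_heads

# Simulate many flips; the share of heads should land near p_heads
set.seed(1)  # arbitrary seed, only so the run is repeatable
flips <- sample(outcomes, size = 10000, replace = TRUE)
mean(flips == "heads")
```

The simulated share will not equal 50% exactly, but it should move closer to it as the number of flips grows.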

Assigning salespeople

Box with Amir, Brian, Claire, and Damian's names

Pulling out Brian's name

$$P(\text{Brian}) = \frac{1}{4} = 25\%$$

Sampling from a data frame

sales_counts
   name  n_sales
 1 Amir      178
 2 Brian     126
 3 Claire     75
 4 Damian     69
sales_counts %>%
  sample_n(1)
   name  n_sales
 1 Brian     126
sales_counts %>%
  sample_n(1)
   name  n_sales
 1 Claire     75
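
The slides do not show how `sales_counts` was created, so the sketch below rebuilds it from the printed values to make the example runnable. `sample_n()` comes from dplyr; to avoid assuming any installed packages, this version picks a random row with base R instead. The name `picked_row` is a hypothetical helper.

```r
# Rebuild the table from the values printed above
sales_counts <- data.frame(
  name    = c("Amir", "Brian", "Claire", "Damian"),
  n_sales = c(178, 126, 75, 69)
)

# Pick one row index uniformly at random, then pull that row
picked_row <- sample(nrow(sales_counts), size = 1)
sales_counts[picked_row, ]

# Each of the four rows is equally likely to be picked
1 / nrow(sales_counts)
```

With dplyr loaded, `sales_counts %>% sample_n(1)` should behave the same way on this data frame. Recent dplyr releases mark `sample_n()` as superseded by `slice_sample(n = 1)`.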

Setting a random seed

set.seed(5)

sales_counts %>% sample_n(1)
   name  n_sales
 1 Brian     126
set.seed(5)

sales_counts %>% sample_n(1)
   name  n_sales
 1 Brian     126
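
To see why both calls above return the same row, here is a minimal base R sketch: resetting the seed restarts the random number stream, so the next draw repeats. The vector `team` and the variable names are assumptions for illustration. The particular name drawn depends on the R version and the sampling function used, so it need not be Brian.

```r
team <- c("Amir", "Brian", "Claire", "Damian")

set.seed(5)
first_run <- sample(team, size = 1)

set.seed(5)  # reset to the same starting point
second_run <- sample(team, size = 1)

# Same seed, same pick
identical(first_run, second_run)
```

Without the second `set.seed(5)`, the second draw would continue the stream and could return a different name.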

A second meeting

Sampling without replacement

Box with Amir, Claire, Damian

Claire's name pulled out

$$P(\text{Claire}) = \frac{1}{3} \approx 33\%$$

Sampling twice in R

sales_counts %>%
  sample_n(2)
   name  n_sales
 1 Brian     126
 2 Claire     75
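
The following base R sketch illustrates what "without replacement" means for the two meetings: a drawn name is not put back, so it cannot appear twice, and the second draw is made from three names. It uses a plain character vector `team` rather than the `sales_counts` data frame, and all variable names are illustrative assumptions.

```r
team <- c("Amir", "Brian", "Claire", "Damian")

# replace = FALSE is the default: a drawn name is not put back
two_meetings <- sample(team, size = 2)
two_meetings
anyDuplicated(two_meetings)  # 0 means no name appears twice

# Once Brian has been removed, each remaining name has a 1-in-3 chance
remaining <- setdiff(team, "Brian")
p_claire_second <- sum(remaining == "Claire") / length(remaining)
p_claire_second
```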

Sampling with replacement

GIF of hand reaching into box, pulling out Brian's name, then dropping it back in

$$P(\text{Claire}) = \frac{1}{4} = 25\%$$

Sampling with replacement in R

sales_counts %>%
  sample_n(2, replace = TRUE)
   name  n_sales
 1 Brian     126
 2 Claire     75

5 meetings:

sales_counts %>%
  sample_n(5, replace = TRUE)
   name  n_sales
 1 Brian     126
 2 Claire     75
 3 Brian     126
 4 Brian     126
 5 Amir      178
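
A base R sketch of the five-meeting case, assuming a plain character vector `team` in place of the data frame. With only four names, five draws are possible only when each name is put back, so at least one name must repeat. The variable names and the error-message string are hypothetical.

```r
team <- c("Amir", "Brian", "Claire", "Damian")

# Five meetings, putting the name back after every draw
five_meetings <- sample(team, size = 5, replace = TRUE)
five_meetings
table(five_meetings)  # counts above 1 are repeats

# Without replacement, five draws from four names is an error in R
too_many <- tryCatch(
  sample(team, size = 5),
  error = function(e) "error: cannot draw 5 of 4 names without replacement"
)
too_many
```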

Independent events

Two events are independent if the probability of the second event isn't affected by the outcome of the first event.

Two columns: First pick column containing Amir, Brian, Claire, Damian. Second pick column is empty

Sampling with replacement = each pick is independent

Arrows from each name in first pick column point to Claire in second pick column, with probability 25%
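
Independence can be checked by listing every possible pair of picks rather than simulating. The sketch below enumerates all equally likely (first, second) pairs when the name is put back, then computes the probability that the second pick is Claire for each first pick. The names `team`, `pairs`, and `p_with` are assumptions for illustration.

```r
team <- c("Amir", "Brian", "Claire", "Damian")

# All 16 equally likely (first, second) pairs when the name is put back
pairs <- expand.grid(first = team, second = team, stringsAsFactors = FALSE)

# P(second pick is Claire | first pick), one value per first pick
p_with <- tapply(pairs$second == "Claire", pairs$first, mean)
p_with
```

Every entry is 0.25: knowing the first pick tells you nothing about the second.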

Dependent events

Two events are dependent if the probability of the second event is affected by the outcome of the first event.

Two columns: First pick column containing Amir, Brian, Claire, Damian. Second pick column is empty

Claire in first column points to Claire in second column with probability 0%

Sampling without replacement = each pick is dependent

Amir, Brian, and Damian in first column point to Claire in second column, each with probability 33%
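
The same enumeration idea shows the dependence. In this sketch, pairs where the same name appears twice are dropped, because a name that has been drawn is not put back. The names `all_pairs`, `pairs`, and `p_without` are assumptions for illustration.

```r
team <- c("Amir", "Brian", "Claire", "Damian")

all_pairs <- expand.grid(first = team, second = team, stringsAsFactors = FALSE)

# Without replacement the same name cannot come up twice: 12 pairs remain
pairs <- all_pairs[all_pairs$first != all_pairs$second, ]

# P(second pick is Claire | first pick), one value per first pick
p_without <- tapply(pairs$second == "Claire", pairs$first, mean)
p_without
```

The value is 0 when Claire was picked first and 1/3 otherwise, so the probability of the second event changes with the outcome of the first.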

Let's practice!
