Discrete distributions

Introduction to Statistics in R

Maggie Matsui

Content Developer, DataCamp

Rolling the dice

Six-sided die


Rolling the dice

Each side of a die has 1/6 probability


Choosing salespeople

 

Four names in a box, each with a 25% probability of being chosen
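Drawing one name from the box can be sketched with base R's `sample()`. This is a minimal sketch: the four names are hypothetical, since the slide does not say who the salespeople are. With the default equal weights, each name has a 1/4 = 25% chance of being drawn.

```r
# Hypothetical names: the slide does not name the salespeople
salespeople <- c("Amir", "Brian", "Claire", "Damian")

# Draw one name; with no prob argument, each name has probability 1/4
chosen <- sample(salespeople, size = 1)
chosen
```

Running it several times gives different names, because each call is a new random draw.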


Probability distribution

Describes the probability of each possible outcome in a scenario

Each side of a die has 1/6 probability

 

Expected value: mean of a probability distribution

Expected value of a fair die roll = $(1 \times \frac{1}{6}) + (2 \times \frac{1}{6}) +(3 \times \frac{1}{6}) +(4 \times \frac{1}{6}) +(5 \times \frac{1}{6}) +(6 \times \frac{1}{6}) = 3.5$
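The expected value is each outcome multiplied by its probability, then summed. A minimal base R sketch of that calculation, where the names `outcomes`, `probs`, and `expected_value` are illustrative:

```r
# A fair die: outcomes 1 to 6, each with probability 1/6
outcomes <- 1:6
probs <- rep(1/6, 6)

# Expected value = sum of (outcome x probability)
expected_value <- sum(outcomes * probs)
expected_value
# [1] 3.5
```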


Visualizing a probability distribution

Bar plot with one bar for each number from 1 through 6, each with height 1/6.
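A plot like this can be drawn with ggplot2 by storing the distribution in a data frame and using `geom_col()`, which draws bars whose heights come straight from the data. This is a sketch, assuming ggplot2 is installed; the column names `outcome` and `prob` are illustrative, not taken from the course.

```r
library(ggplot2)

# Theoretical distribution of a fair die: one row per outcome
fair_die <- data.frame(outcome = 1:6, prob = rep(1/6, 6))

# geom_col() uses prob as the bar height; factor() gives one bar per face
fair_plot <- ggplot(fair_die, aes(x = factor(outcome), y = prob)) +
  geom_col() +
  labs(x = "Die roll", y = "Probability")
fair_plot
```

`geom_col()` is used here rather than `geom_histogram()` because the heights are known probabilities, not counts computed from a sample.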


Probability = area

$$P(\text{die roll} \le 2) = ~?$$

Bars for 1 and 2 highlighted


Probability = area

$$P(\text{die roll} \le 2) = 1/3$$

1/6 + 1/6 = 1/3
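Adding up bar areas amounts to summing the probabilities of the outcomes that satisfy the condition. A minimal base R sketch for a fair die, with illustrative variable names:

```r
outcomes <- 1:6
probs <- rep(1/6, 6)

# P(die roll <= 2): add the probabilities (bar areas) for outcomes 1 and 2
p_le_2 <- sum(probs[outcomes <= 2])
p_le_2
# [1] 0.3333333
```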


Uneven die

Six-sided die with two sides showing 3 dots and no side showing 2

Expected value of uneven die roll = $(1 \times \frac{1}{6}) +(2 \times 0) +(3 \times \frac{1}{3}) +(4 \times \frac{1}{6}) +(5 \times \frac{1}{6}) +(6 \times \frac{1}{6}) = \frac{22}{6} \approx 3.67$
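The same sum-of-products calculation works for the uneven die; only the probabilities change. A minimal base R sketch, where `uneven_probs` is an illustrative name encoding the die above (no 2, two sides with 3):

```r
outcomes <- 1:6
# Uneven die: 2 never comes up, 3 comes up with probability 2/6 = 1/3
uneven_probs <- c(1/6, 0, 1/3, 1/6, 1/6, 1/6)

# Sanity check: the probabilities of a distribution add up to 1
sum(uneven_probs)
# [1] 1

# Expected value = sum of (outcome x probability)
uneven_ev <- sum(outcomes * uneven_probs)
uneven_ev
# [1] 3.666667
```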


Visualizing uneven probabilities

Probability distribution of uneven die. Bars for 1, 4, 5, 6 are height 1/6, bar for 2 is height 0, bar for 3 is height 1/3


Adding areas

$$P(\text{uneven die roll} \le 2) = ~?$$

1/6 + 0


Adding areas

$$P(\text{uneven die roll} \le 2) = 1/6$$

1/6 + 0 = 1/6
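The area-adding step is the same subset-and-sum as for the fair die, now with the uneven probabilities. A minimal base R sketch with illustrative names:

```r
outcomes <- 1:6
uneven_probs <- c(1/6, 0, 1/3, 1/6, 1/6, 1/6)

# P(uneven die roll <= 2): the bar for 1 has area 1/6, the bar for 2 has area 0
p_uneven_le_2 <- sum(uneven_probs[outcomes <= 2])
p_uneven_le_2
# [1] 0.1666667
```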


Discrete probability distributions

Describe probabilities for discrete outcomes

Fair die

probability distribution of fair die

Discrete uniform distribution

Uneven die

probability distribution of uneven die
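Both dice are valid discrete probability distributions: every probability lies between 0 and 1, and the probabilities add up to 1. A minimal sketch of that check, where `is_valid_distribution()` is a hypothetical helper written for illustration, not a function from base R or the course:

```r
# Hypothetical helper: valid when all probabilities are in [0, 1]
# and they sum to 1 (all.equal tolerates floating-point rounding)
is_valid_distribution <- function(probs) {
  all(probs >= 0 & probs <= 1) && isTRUE(all.equal(sum(probs), 1))
}

fair_probs <- rep(1/6, 6)                       # discrete uniform
uneven_probs <- c(1/6, 0, 1/3, 1/6, 1/6, 1/6)   # uneven die

is_valid_distribution(fair_probs)
# [1] TRUE
is_valid_distribution(uneven_probs)
# [1] TRUE
```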


Sampling from discrete distributions

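library(dplyr)
# Assumed setup (not shown on the slide): one row per face of the die
die <- data.frame(n = 1:6)
die
#>    n
#> 1  1
#> 2  2
#> 3  3
#> 4  4
#> 5  5
#> 6  6
mean(die$n)
#> [1] 3.5
# Roll the die 10 times: sample 10 rows with replacement
rolls_10 <- die %>%
  sample_n(10, replace = TRUE)
rolls_10
#>    n
#> 1  1
#> 2  1
#> 3  5
#> 4  2
#> 5  1
#> 6  1
#> 7  6
#> 8  6
#> ...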
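In dplyr 1.0.0 and later, `sample_n()` is superseded by `slice_sample()`, which takes the sample size as a named `n` argument. A sketch of the same 10 rolls with the newer function, assuming the same one-column `die` data frame as above; the rolls are random, so the rows differ from run to run:

```r
library(dplyr)

# Assumed setup: one row per face of the die
die <- data.frame(n = 1:6)

# slice_sample() is the current replacement for sample_n();
# replace = TRUE lets the same face come up more than once
rolls_10 <- die %>%
  slice_sample(n = 10, replace = TRUE)
nrow(rolls_10)
# [1] 10
```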

Visualizing a sample

library(ggplot2)
ggplot(rolls_10, aes(n)) +
  geom_histogram(bins = 6)

histogram of 10 rolls


Sample distribution vs. theoretical distribution

 

Sample of 10 rolls

histogram of 10 rolls

mean(rolls_10$n) = 3.0

 

Theoretical probability distribution

probability distribution of fair die

mean(die$n) = 3.5


A bigger sample

 

Sample of 100 rolls

histogram of 100 rolls

mean(rolls_100$n) = 3.36

 

Theoretical probability distribution

probability distribution of fair die

mean(die$n) = 3.5


An even bigger sample

 

Sample of 1000 rolls

histogram of 1000 rolls

mean(rolls_1000$n) = 3.53

 

Theoretical probability distribution

probability distribution of fair die

mean(die$n) = 3.5
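The slides do not show how `rolls_100` and `rolls_1000` were created. A sketch, assuming they were drawn the same way as `rolls_10`; because the rolls are random, the means you get will generally differ from 3.36 and 3.53:

```r
library(dplyr)

# Assumed setup: one row per face of the die
die <- data.frame(n = 1:6)

# Assumption: larger samples drawn exactly like rolls_10
rolls_100 <- die %>% sample_n(100, replace = TRUE)
rolls_1000 <- die %>% sample_n(1000, replace = TRUE)

mean(rolls_100$n)
mean(rolls_1000$n)
```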


Law of large numbers

As the size of your sample increases, the sample mean gets closer to the expected value (3.5 for a fair die).

Sample size   Sample mean
10            3.00
100           3.36
1000          3.53
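The law of large numbers can be checked with a quick base R simulation that extends the table to larger samples. This is a sketch: the seed and the sample sizes are arbitrary choices, and the distance from 3.5 tends to shrink as the sample grows, though not necessarily at every step.

```r
# Arbitrary seed so the simulation is reproducible
set.seed(123)
sizes <- c(10, 100, 1000, 10000, 100000)

# For each sample size, roll a fair die that many times and take the mean
sample_means <- sapply(sizes, function(size) {
  mean(sample(1:6, size, replace = TRUE))
})

# Compare each sample mean with the expected value, 3.5
data.frame(
  sample_size = sizes,
  sample_mean = sample_means,
  distance_from_ev = abs(sample_means - 3.5)
)
```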

Let's practice!

