Bootstrapping

Foundations of Inference in R

Jo Hardin

Instructor

Hypothesis testing

  • How do samples from the null population vary?

  • Statistic: proportion of successes in the sample → $\hat{p}$

  • Parameter: proportion of successes in the population → $p$

Confidence intervals

  • No null population, unlike in hypothesis testing
  • How much does $\hat{p}$ vary around $p$?

Polling

# Original data
Source: local data frame [30 x 2]
     flip_num  flip
        <int>  <chr>
1          1       H
2          2       H
3          3       H
4          4       T
5          5       H                
6          6       H
# ... with 24 more rows

Original data

Candidate X   Total voters   Proportion X
17            30             0.5667
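
The proportion in the table can be reproduced in a few lines of base R. This is a minimal sketch: the data frame `poll` is a hypothetical stand-in built only from the counts on the slide (17 of 30 for candidate X, shown as `H`), so the row order is made up and will not match the original data.

```r
# Hypothetical stand-in for the poll: 17 "H" (votes for X) and 13 "T".
# Only the counts come from the slide; the row order is invented.
poll <- data.frame(
  flip_num = 1:30,
  flip = rep(c("H", "T"), times = c(17, 13))
)

n_x   <- sum(poll$flip == "H")   # voters for candidate X
n     <- nrow(poll)              # total voters
p_hat <- n_x / n                 # sample proportion

round(p_hat, 4)  # 0.5667
```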

Polling

# First resample
Source: local data frame [30 x 3]
   replicate flip_num  flip
       <dbl>    <int> <chr>
1          1        7     H
2          1       17     T
3          1       13     H
4          1       14     H
5          1       24     H
6          1       28     T
# ... with 24 more rows

First resample

Candidate X   Total voters   Proportion X
17            30             0.5667
14            30             0.4667

Polling

# Second resample
Source: local data frame [30 x 3]
   replicate flip_num  flip
       <dbl>    <int> <chr>
1          2       21     H
2          2       19     T
3          2       25     H
4          2       24     T
5          2       21     H
6          2       28     T
7          2       13     H
8          2       23     H
9          2       24     T
10         2       24     T
# ... with 20 more rows 

Second resample

Candidate X   Total voters   Proportion X
17            30             0.5667
14            30             0.4667
18            30             0.6

Polling

# Third resample
Source: local data frame [30 x 3]
   replicate flip_num  flip
       <dbl>    <int> <chr>
1          3        6     H
2          3       19     H
3          3        1     H
4          3       24     T
5          3       11     H
6          3       28     T
7          3       16     H
8          3       13     H
9          3       21     T
10         3       29     H
# ... with 20 more rows

Third resample

Candidate X   Total voters   Proportion X
17            30             0.5667
14            30             0.4667
18            30             0.6
12            30             0.4
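
Each resample draws 30 rows with replacement from the original 30, which is why flips 21 and 24 appear more than once in the second resample. The sketch below shows one way to do this in base R. The vector `flips` is a hypothetical stand-in with the same counts as the original poll, and the seed is arbitrary, so the proportions it produces will differ from those in the table.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Hypothetical stand-in for the original poll: 17 "H" and 13 "T"
flips <- rep(c("H", "T"), times = c(17, 13))

# One resample: same size as the original, drawn WITH replacement
one_resample <- sample(flips, size = length(flips), replace = TRUE)
mean(one_resample == "H")  # proportion for candidate X in this resample

# Three resamples, keeping one proportion per resample
boot_props <- replicate(3, mean(sample(flips, replace = TRUE) == "H"))
boot_props
```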

Standard error

  • Resampling many times gave a standard error of about 0.09

  • Describes how much the statistic varies around the parameter

  • The bootstrap approximates the standard error from a single sample
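
As a rough check on the 0.09 figure, the sketch below repeats the resampling 5000 times in base R and takes the standard deviation of the resampled proportions. The vector `flips` is again a hypothetical stand-in with 17 `H` and 13 `T`, and the number of resamples is an arbitrary choice. The comparison with the textbook formula $\sqrt{\hat{p}(1-\hat{p})/n}$ is not part of the slides and is included only as a sanity check.

```r
set.seed(2)  # arbitrary seed, only for reproducibility

# Hypothetical stand-in for the original poll: 17 "H" and 13 "T"
flips <- rep(c("H", "T"), times = c(17, 13))
n     <- length(flips)
p_hat <- mean(flips == "H")

# Bootstrap: many resamples, one proportion per resample
boot_props <- replicate(5000, mean(sample(flips, replace = TRUE) == "H"))
boot_se    <- sd(boot_props)

# Formula-based standard error of a sample proportion, for comparison
formula_se <- sqrt(p_hat * (1 - p_hat) / n)   # about 0.09

c(bootstrap = boot_se, formula = formula_se)
```

The two values should agree closely, since both estimate the same quantity from the same sample.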

Variability of p-hat from the population

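# dplyr provides %>%, group_by(), and summarize()
library(dplyr)

# Compute p-hat for each poll
# (all_polls: one row per voter, with columns poll and vote)
ex1_props <- all_polls %>% 
    group_by(poll) %>% 
    summarize(prop_yes = 
                mean(vote == "yes"))
# Variability of p-hat
ex1_props %>% 
    summarize(sd(prop_yes))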
# A tibble: 1 × 1
  `sd(prop_yes)`
           <dbl>
1     0.08523512

Variability of p-hat from the sample (bootstrapping)

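# dplyr provides %>%, filter(), select(), summarize();
# infer provides specify(), generate(), calculate()
library(dplyr)
library(infer)

# Select one poll from which to resample
one_poll <- all_polls %>%
    filter(poll == 1) %>%
    select(vote)

# Compute p-hat for each resampled poll
ex2_props <- one_poll %>%
    specify(response = vote,
            success = "yes") %>%
    generate(reps = 1000,
            type = "bootstrap") %>%
    # Without this step there is no stat column to summarize
    calculate(stat = "prop")
# Variability of p-hat
ex2_props %>% 
    summarize(sd(stat))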
# A tibble: 1 × 1
  `sd(stat)`
           <dbl>
1     0.08691885
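
The two approaches can be compared side by side without the course data. The sketch below uses base R only and a simulated population. The true proportion `p_true = 0.6`, the poll size of 30, and the 1000 repetitions are all assumptions chosen for illustration, so the numbers will not match the output above.

```r
set.seed(3)     # arbitrary seed, only for reproducibility

p_true <- 0.6   # hypothetical population proportion (assumption)
n      <- 30    # voters per poll
reps   <- 1000  # number of polls / resamples

# Approach 1: many polls drawn from the population
pop_props <- rbinom(reps, size = n, prob = p_true) / n
sd_pop    <- sd(pop_props)

# Approach 2: one poll, resampled with replacement (bootstrap)
one_poll   <- sample(c("yes", "no"), size = n, replace = TRUE,
                     prob = c(p_true, 1 - p_true))
boot_props <- replicate(reps, mean(sample(one_poll, replace = TRUE) == "yes"))
sd_boot    <- sd(boot_props)

c(population = sd_pop, bootstrap = sd_boot)
```

In practice only the second approach is available, because a study usually has a single sample rather than repeated access to the population.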

Let's practice!
