Welcome to the course!

Inference for Numerical Data in R

Mine Cetinkaya-Rundel

Associate Professor of the Practice, Duke University

Rent in Manhattan

On a given day, twenty 1 BR apartments were randomly selected on Craigslist Manhattan from apartments listed as "by owner" (as opposed to by a rental agency).

Is the mean or the median a better measure of typical rent in Manhattan?
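
One way to explore this question is to compare the two statistics on a small right-skewed sample. The sketch below uses made-up rents (hypothetical values, not the Craigslist data from the slide) to show how a single expensive listing can pull the mean upward while the median barely moves.

```r
# Hypothetical rents in dollars (made up for illustration, not the course data)
rents <- c(1800, 1950, 2100, 2250, 2350, 2400, 2600, 2900, 3400, 7500)

mean_rent   <- mean(rents)    # pulled upward by the single 7500 listing
median_rent <- median(rents)  # middle of the sorted values, less affected by it

mean_rent
median_rent
```

If the real rents are skewed in a similar way, the median would likely be the more representative summary of a typical rent.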

[Figure: Manhattan rents]

Bootstrapping techniques

  • Assume the data is representative
  • Pulling oneself up by one's bootstraps

Observed sample

sample median = $2,350

[Figure: bootstrap sample and bootstrap population]

Bootstrap population

[Figure: bootstrap sample and bootstrap population]

Bootstrapping scheme

  1. Take a bootstrap sample - a random sample taken with replacement from the original sample, of the same size as the original sample.

  2. Calculate the bootstrap statistic - a statistic such as the mean, median, or proportion, computed on the bootstrap sample.

  3. Repeat steps (1) and (2) many times to create a bootstrap distribution - a distribution of bootstrap statistics.
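
The three steps above can be sketched in base R without any packages. This is a minimal illustration under stated assumptions: the `rents` vector is made up, the number of repetitions (1000) is an arbitrary choice, and the median is used as the statistic only because it matches the running example.

```r
set.seed(123)  # make the resampling reproducible

# Hypothetical rents in dollars (made up for illustration, not the course data)
rents <- c(1800, 1950, 2100, 2250, 2350, 2400, 2600, 2900, 3400, 7500)

n_reps <- 1000  # assumed number of bootstrap repetitions

boot_medians <- replicate(n_reps, {
  # Step 1: resample with replacement, same size as the original sample
  boot_sample <- sample(rents, size = length(rents), replace = TRUE)
  # Step 2: compute the bootstrap statistic on that resample
  median(boot_sample)
})

# Step 3: boot_medians now holds the bootstrap distribution
summary(boot_medians)
```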

Bootstrapping scheme, in R

library(infer)

___ %>%                               # start with data frame
  specify(response = ___) %>%         # specify the variable of interest

Bootstrapping scheme, in R

library(infer)

___ %>%                               # start with data frame
  specify(response = ___) %>%         # specify the variable of interest
  generate(reps = ___, type = "bootstrap") %>%  # generate bootstrap samples

Bootstrapping scheme, in R

library(infer)

___ %>%                               # start with data frame
  specify(response = ___) %>%         # specify the variable of interest
  generate(reps = ___, type = "bootstrap") %>%  # generate bootstrap samples
  calculate(stat = "___")             # calculate bootstrap statistic
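
As a sketch of how the blanks might be filled in, the example below runs the same pipeline on a small made-up data frame. The names `manhattan`, `rent`, and `rent_med_boot`, the rent values, and the choice of 1000 repetitions are all assumptions for illustration, not taken from the slides.

```r
library(infer)

# Hypothetical data frame; the name `manhattan`, the column `rent`,
# and the values are made up for illustration
manhattan <- data.frame(
  rent = c(1800, 1950, 2100, 2250, 2350, 2400, 2600, 2900, 3400, 7500)
)

set.seed(123)  # make the resampling reproducible

rent_med_boot <- manhattan %>%                 # start with data frame
  specify(response = rent) %>%                 # specify the variable of interest
  generate(reps = 1000, type = "bootstrap") %>%  # generate bootstrap samples
  calculate(stat = "median")                   # calculate bootstrap statistic

head(rent_med_boot)
```

The result should have one row per bootstrap sample, with the bootstrap statistic stored in a column named `stat`.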

Constructing the bootstrap interval

library(infer)

___ %>%                               # start with data frame
  specify(response = ___) %>%         # specify the variable of interest
  generate(reps = ___, type = "bootstrap") %>%  # generate bootstrap samples
  calculate(stat = "___")             # calculate bootstrap statistic

[Figure: bootstrap distribution, axes omitted]

[Figure: bootstrap distribution, axes omitted, with parentheses]
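
One common way to turn a bootstrap distribution into an interval is the percentile method: take the middle 95% of the bootstrap statistics. The slides do not say which method is intended, so treat this as one possible approach. The base R sketch below again uses made-up rents and an assumed 95% level.

```r
set.seed(123)  # make the resampling reproducible

# Hypothetical rents in dollars (made up for illustration, not the course data)
rents <- c(1800, 1950, 2100, 2250, 2350, 2400, 2600, 2900, 3400, 7500)

# Bootstrap distribution of the median (1000 repetitions is an assumed choice)
boot_medians <- replicate(1000, median(sample(rents, replace = TRUE)))

# Percentile interval: the 2.5th and 97.5th percentiles bound the middle 95%
ci_95 <- quantile(boot_medians, probs = c(0.025, 0.975))
ci_95
```

When working with the infer pipeline instead, `get_confidence_interval()` can compute a similar interval from the output of `calculate()`.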

Let's practice!
