Cluster sampling

Sampling in R

Richie Cotton

Data Evangelist at DataCamp

Stratified sampling vs. cluster sampling

Stratified sampling

  • Split the population into subgroups
  • Use simple random sampling on every subgroup

Cluster sampling

  • Use simple random sampling to pick some subgroups
  • Use simple random sampling on only those subgroups
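
The difference between the two schemes can be sketched on a small made-up population. This is a minimal illustration, assuming dplyr is installed; `toy_pop`, `subgroup`, and `value` are hypothetical names that do not come from the coffee dataset.

```r
library(dplyr)

set.seed(42)  # arbitrary seed so the sketch is repeatable

# Hypothetical toy population: 6 subgroups with 10 rows each
toy_pop <- tibble(
  subgroup = rep(letters[1:6], each = 10),
  value = rnorm(60)
)

# Stratified sampling: simple random sample from every subgroup
stratified <- toy_pop %>%
  group_by(subgroup) %>%
  slice_sample(n = 2) %>%
  ungroup()

# Cluster sampling: pick some subgroups first, then sample only within those
chosen <- sample(unique(toy_pop$subgroup), size = 2)
clustered <- toy_pop %>%
  filter(subgroup %in% chosen) %>%
  group_by(subgroup) %>%
  slice_sample(n = 2) %>%
  ungroup()

n_distinct(stratified$subgroup)  # 6: every subgroup is represented
n_distinct(clustered$subgroup)   # 2: only the chosen subgroups appear
```

The stratified sample touches every subgroup, while the cluster sample only needs data from the subgroups picked in the first step.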

Varieties of coffee

Coffee beans arranged in rows and columns.

varieties_pop <- unique(
  coffee_ratings$variety
)
 [1] "Bourbon"              
 [2] "Catimor"              
 [3] "Ethiopian Yirgacheffe"
 [4] "Caturra"              
 [5] "SL14"  
...
[27] "Marigojipe"           
[28] "Pache Comun"

Stage 1: sampling for subgroups

Coffee beans arranged in rows and columns, all of which are grayed out save for three.

varieties_samp <- sample(
  varieties_pop, 
  size = 3
)
[1] "Sumatra"       "Blue Mountain" "SL28"
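
Because `sample()` draws at random, the chosen varieties change from run to run. One way to make stage 1 repeatable is to set a seed first. The sketch below uses a hypothetical vector `varieties_demo` as a stand-in for `varieties_pop`, and the seed value is arbitrary.

```r
# Hypothetical stand-in for varieties_pop
varieties_demo <- c("Bourbon", "Catimor", "Caturra", "SL14", "SL28", "Sumatra")

set.seed(2024)  # arbitrary seed
first_draw <- sample(varieties_demo, size = 3)

set.seed(2024)  # same seed again
second_draw <- sample(varieties_demo, size = 3)

identical(first_draw, second_draw)  # TRUE: same seed, same subgroups
anyDuplicated(first_draw)           # 0: sample() draws without replacement by default
```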

Stage 2: sampling each group

coffee_ratings %>% 
  # Keep only the varieties chosen in stage 1
  filter(variety %in% varieties_samp) %>% 
  # Sample up to five rows within each chosen variety
  group_by(variety) %>% 
  slice_sample(n = 5) %>% 
  ungroup()

Stage 2 output

# A tibble: 10 x 8
   total_cup_points variety       country_of_origin aroma flavor aftertaste  body balance
              <dbl> <chr>         <chr>             <dbl>  <dbl>      <dbl> <dbl>   <dbl>
 1             81.5 Blue Mountain Haiti              7.42   7.33       7.25  7.42    7.33
 2             82.7 Blue Mountain Mexico             7.75   7.58       7.25  7.67    7.58
 3             84.5 SL28          Kenya              7.92   7.83       7.67  7.67    7.75
 4             81.9 SL28          Zambia             7.67   7.08       7.42  7.75    7.42
 5             84.7 SL28          Kenya              7.75   7.92       7.83  7.58    7.75
 6             85.5 SL28          Kenya              7.92   7.92       7.83  7.83    7.92
 7             83.8 SL28          Kenya              7.75   7.58       7.5   7.75    7.75
 8             86.6 Sumatra       Taiwan             8      8          8     8       8.17
 9             81.7 Sumatra       Indonesia          7.17   7.42       7.33  7.33    7.42
10             83.5 Sumatra       Indonesia          7.25   7.67       7.58  7.83    7.58
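
The output has 10 rows rather than 3 × 5 = 15. This is consistent with how `slice_sample()` behaves when sampling without replacement: if a group has fewer rows than `n`, it returns all of that group's rows. Presumably Blue Mountain and Sumatra have fewer than five rows in the dataset. A minimal sketch on hypothetical toy data (`toy`, `grp`, and `x` are made-up names):

```r
library(dplyr)

# Hypothetical toy data: group "a" has 2 rows, group "b" has 8 rows
toy <- tibble(
  grp = c(rep("a", 2), rep("b", 8)),
  x = 1:10
)

sampled <- toy %>%
  group_by(grp) %>%
  slice_sample(n = 5) %>%
  ungroup()

# Group "a" contributes its 2 rows, group "b" contributes 5
count(sampled, grp)
```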

Multistage sampling

  • Cluster sampling is a type of multistage sampling.
  • You can have more than two stages.
  • For example, a countrywide survey may sample states, then counties, then cities, then neighborhoods.
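
A three-stage version of the same idea can be sketched as follows. The population here is entirely hypothetical (4 states, 5 counties per state, 20 households per county), and the stage sizes are arbitrary; it only illustrates the pattern of sampling units at one level and then sampling within them at the next.

```r
library(dplyr)

set.seed(1)  # arbitrary seed

# Hypothetical population: 4 states x 5 counties x 20 households
households <- expand.grid(
  household = 1:20,
  county = paste0("county_", 1:5),
  state = paste0("state_", 1:4),
  stringsAsFactors = FALSE
) %>%
  # Make county labels unique across states
  mutate(county = paste(state, county, sep = "/"))

# Stage 1: sample states
states_samp <- sample(unique(households$state), size = 2)

# Stage 2: sample counties within each sampled state
counties_samp <- households %>%
  filter(state %in% states_samp) %>%
  distinct(state, county) %>%
  group_by(state) %>%
  slice_sample(n = 2) %>%
  ungroup()

# Stage 3: sample households within each sampled county
survey_samp <- households %>%
  semi_join(counties_samp, by = c("state", "county")) %>%
  group_by(state, county) %>%
  slice_sample(n = 3) %>%
  ungroup()

nrow(survey_samp)  # 2 states * 2 counties * 3 households = 12
```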

Let's practice!
