Cluster sampling

Sampling in R

Richie Cotton

Data Evangelist at DataCamp

Stratified sampling vs. cluster sampling

Stratified sampling

Split the population into subgroups
Use simple random sampling on every subgroup

Cluster sampling

Use simple random sampling to pick some subgroups
Use simple random sampling on only those subgroups
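The two procedures differ only in which subgroups get sampled. The following minimal base-R sketch on a made-up data frame illustrates the difference. The names `pop`, `group`, `value`, `clusters`, and the group sizes are hypothetical, and base R is used here instead of dplyr only so that the example has no package dependencies.

```r
# Hypothetical toy population: 4 subgroups (A-D) with 6 rows each
set.seed(42)
pop <- data.frame(
  group = rep(c("A", "B", "C", "D"), each = 6),
  value = round(runif(24, 0, 100))
)

# Stratified sampling: simple random sample of 2 rows from EVERY subgroup
stratified <- do.call(rbind, lapply(split(pop, pop$group), function(g) {
  g[sample(nrow(g), 2), ]
}))

# Cluster sampling, stage 1: simple random sample of 2 subgroups
clusters <- sample(unique(pop$group), size = 2)

# Cluster sampling, stage 2: simple random sample of 2 rows from ONLY those subgroups
in_clusters <- pop[pop$group %in% clusters, ]
cluster_samp <- do.call(rbind, lapply(split(in_clusters, in_clusters$group), function(g) {
  g[sample(nrow(g), 2), ]
}))

table(stratified$group)    # all four subgroups appear
table(cluster_samp$group)  # only the two sampled subgroups appear
```

Both samples use simple random sampling within subgroups. What differs is whether every subgroup is visited (stratified) or only a random selection of them (cluster).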

Varieties of coffee

Coffee beans arranged in rows and columns.

varieties_pop <- unique(
  coffee_ratings$variety
)

 [1] "Bourbon"              
 [2] "Catimor"              
 [3] "Ethiopian Yirgacheffe"
 [4] "Caturra"              
 [5] "SL14"  
...
[27] "Marigojipe"           
[28] "Pache Comun"

Stage 1: sampling for subgroups

Coffee beans arranged in rows and columns, all of which are grayed out save for three.

varieties_samp <- sample(
  varieties_pop, 
  size = 3
)

[1] "Sumatra"       "Blue Mountain" "SL28"

Stage 2: sampling each group

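library(dplyr)

coffee_ratings %>% 
  # Keep only rows from the varieties picked in stage 1
  filter(variety %in% varieties_samp) %>% 
  # Draw up to 5 rows at random from each of those varieties
  group_by(variety) %>% 
  slice_sample(n = 5) %>% 
  ungroup()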

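Stage 2 output

Blue Mountain and Sumatra each have fewer than five rows, so slice_sample() returns all of their rows. That is why the result has 10 rows rather than 15.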

# A tibble: 10 x 8
   total_cup_points variety       country_of_origin aroma flavor aftertaste  body balance
              <dbl> <chr>         <chr>             <dbl>  <dbl>      <dbl> <dbl>   <dbl>
 1             81.5 Blue Mountain Haiti              7.42   7.33       7.25  7.42    7.33
 2             82.7 Blue Mountain Mexico             7.75   7.58       7.25  7.67    7.58
 3             84.5 SL28          Kenya              7.92   7.83       7.67  7.67    7.75
 4             81.9 SL28          Zambia             7.67   7.08       7.42  7.75    7.42
 5             84.7 SL28          Kenya              7.75   7.92       7.83  7.58    7.75
 6             85.5 SL28          Kenya              7.92   7.92       7.83  7.83    7.92
 7             83.8 SL28          Kenya              7.75   7.58       7.5   7.75    7.75
 8             86.6 Sumatra       Taiwan             8      8          8     8       8.17
 9             81.7 Sumatra       Indonesia          7.17   7.42       7.33  7.33    7.42
10             83.5 Sumatra       Indonesia          7.25   7.67       7.58  7.83    7.58
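Both stages can be wrapped in a single function. The sketch below is a hypothetical base-R helper, `cluster_sample()`, that is not part of dplyr or any other package. It mirrors the `slice_sample(n = ...)` behavior of returning every row when a group is smaller than `n`. The toy data frame `toy` and its varieties `V1` to `V4` are made up for illustration.

```r
# Hypothetical helper (not from any package): two-stage cluster sample
cluster_sample <- function(df, cluster_col, n_clusters, n_per_cluster) {
  # Stage 1: simple random sample of distinct cluster labels
  chosen <- sample(unique(df[[cluster_col]]), size = n_clusters)
  # Stage 2: simple random sample of rows within each chosen cluster
  pieces <- lapply(chosen, function(cl) {
    rows <- which(df[[cluster_col]] == cl)
    # Take every row when the cluster has fewer than n_per_cluster rows
    k <- min(n_per_cluster, length(rows))
    df[rows[sample.int(length(rows), k)], , drop = FALSE]
  })
  do.call(rbind, pieces)
}

# Made-up data: four varieties with 2, 5, 3, and 8 rows
toy <- data.frame(
  variety = rep(c("V1", "V2", "V3", "V4"), times = c(2, 5, 3, 8)),
  score = seq_len(18)
)

set.seed(1)
samp <- cluster_sample(toy, "variety", n_clusters = 3, n_per_cluster = 5)
table(samp$variety)
```

If `V1` or `V3` is among the chosen clusters, all of its rows appear in the sample, in the same way that only two Blue Mountain rows come back above. Indexing with `rows[sample.int(length(rows), k)]` rather than `sample(rows, k)` avoids R's surprising behavior when `rows` has length 1.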

Multistage sampling

Cluster sampling is a type of multistage sampling.
You can have more than two stages.
For example, a countrywide survey might sample states, then counties within those states, then cities, then neighborhoods.
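The first three stages of that survey example can be sketched in base R. The sampling frame below (5 states, 4 counties per state, 10 cities per county), the sample sizes, and every name in it are hypothetical. Each stage is a simple random sample taken only within the units selected at the previous stage.

```r
# Hypothetical frame: 5 states, 4 counties per state, 10 cities per county
frame <- expand.grid(
  city = 1:10,
  county = paste0("county_", 1:4),
  state = paste0("state_", 1:5),
  stringsAsFactors = FALSE
)
# County names repeat across states, so build a unique county ID
frame$county_id <- paste(frame$state, frame$county)

set.seed(2024)
# Stage 1: sample 2 states
states <- sample(unique(frame$state), size = 2)
stage1 <- frame[frame$state %in% states, ]

# Stage 2: within each sampled state, sample 2 counties
counties <- unlist(lapply(states, function(s) {
  sample(unique(stage1$county_id[stage1$state == s]), size = 2)
}))
stage2 <- stage1[stage1$county_id %in% counties, ]

# Stage 3: within each sampled county, sample 3 cities
stage3 <- do.call(rbind, lapply(counties, function(cid) {
  rows <- which(stage2$county_id == cid)
  stage2[rows[sample.int(length(rows), 3)], ]
}))

nrow(stage3)  # 12: 2 states x 2 counties x 3 cities
```

A fourth stage for neighborhoods would repeat the same pattern: filter to the units chosen so far, then sample within each one.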

Let's practice!
