Simple random and systematic sampling

Sampling in Python

James Chapman

Curriculum Manager, DataCamp

Simple random sampling

A hand picking a folded piece of paper out of a raffle jar.

Lottery balls rolling.

Simple random sampling of coffees

Coffee beans arranged in rows and columns.

Coffee beans arranged in rows and columns, some of which are grayed out.

Simple random sampling with pandas

coffee_ratings.sample(n=5, random_state=19000113)

     total_cup_points         variety country_of_origin  aroma  flavor  \
437             83.25            None          Colombia   7.92    7.75   
285             83.83  Yellow Bourbon            Brazil   7.92    7.50   
784             82.08            None          Colombia   7.50    7.42   
648             82.58         Caturra          Colombia   7.58    7.50   
155             84.58         Caturra          Colombia   7.42    7.67  

     aftertaste  body  balance  
437        7.25  7.83     7.58  
285        7.33  8.17     7.50  
784        7.42  7.67     7.42  
648        7.42  7.67     7.42  
155        7.75  8.08     7.83
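To see what random_state contributes, here is a minimal sketch on a made-up table. The `toy` DataFrame, its values, and the seed 42 are assumptions for illustration and are not taken from coffee_ratings.

```python
import pandas as pd

# Hypothetical stand-in for coffee_ratings; the values are invented.
toy = pd.DataFrame({"aroma": [7.5, 7.9, 8.1, 7.2, 7.7, 8.3, 7.4, 7.8]})

# The same seed selects the same rows on every call.
first = toy.sample(n=3, random_state=42)
second = toy.sample(n=3, random_state=42)

print(first.equals(second))   # True
print(first.index.is_unique)  # True: sample() defaults to replace=False
```

Omitting random_state gives a different sample on each run, which is usually fine for exploration but makes results harder to reproduce.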

Systematic sampling

Coffee beans arranged in rows and columns.

Coffee beans arranged in rows and columns, most of which are grayed out save for those on a diagonal line.

Systematic sampling - defining the interval

sample_size = 5

pop_size = len(coffee_ratings)

print(pop_size)

# Floor division: the gap between consecutive sampled rows
interval = pop_size // sample_size

print(interval)
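The interval arithmetic can be checked on a small made-up population. The 23-row `toy` table below is an assumption chosen so that the population size is not an exact multiple of the sample size.

```python
import pandas as pd

# Hypothetical population of 23 rows (invented for illustration).
toy = pd.DataFrame({"score": range(23)})

sample_size = 5
pop_size = len(toy)

# Floor division drops the remainder, so the interval is a whole number of rows.
interval = pop_size // sample_size
leftover = pop_size % sample_size

print(pop_size)  # 23
print(interval)  # 4
print(leftover)  # 3
```

The leftover rows sit at the end of the table, beyond the last full interval.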

Systematic sampling - selecting the rows

coffee_ratings.iloc[::interval]

      total_cup_points  variety country_of_origin  aroma  flavor  aftertaste  \
0                90.58     None          Ethiopia   8.67    8.83        8.67   
267              83.92     None          Colombia   7.83    7.75        7.58   
534              82.92  Bourbon       El Salvador   7.50    7.50        7.75   
801              82.00   Typica            Taiwan   7.33    7.50        7.17   
1068             80.50    Other            Taiwan   7.17    7.17        7.17   

      body  balance  
0     8.50     8.42  
267   7.75     7.75  
534   7.92     7.83  
801   7.50     7.33  
1068  7.17     7.25
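Slicing with a step keeps every row whose position is a multiple of the interval, so it can return one more row than sample_size when the population size is not an exact multiple. A minimal sketch on a made-up 23-row table; trimming with `.head()` is one possible way to cap the count, not something the slides prescribe.

```python
import pandas as pd

# Hypothetical population of 23 rows (invented for illustration).
toy = pd.DataFrame({"score": range(23)})

sample_size = 5
interval = len(toy) // sample_size  # 23 // 5 = 4

# Every 4th row, starting at position 0.
every_nth = toy.iloc[::interval]
print(list(every_nth.index))  # [0, 4, 8, 12, 16, 20]

# Keep only the first sample_size rows if an exact count is needed.
trimmed = every_nth.head(sample_size)
print(list(trimmed.index))    # [0, 4, 8, 12, 16]
```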

The trouble with systematic sampling

import matplotlib.pyplot as plt

# Turn the row index into a column so it can go on the x-axis
coffee_ratings_with_id = coffee_ratings.reset_index()
coffee_ratings_with_id.plot(x="index", y="aftertaste", kind="scatter")
plt.show()

Scatterplot of aftertaste scores versus indices.

Systematic sampling is only safe if this scatter plot shows no pattern, that is, if row order is unrelated to the values being measured.
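A deliberately extreme sketch of how a pattern in row order can bias a systematic sample. The data are made up: every fifth row scores 10 and the rest score 0, and the interval happens to be a multiple of that period.

```python
import pandas as pd

# Hypothetical data with a repeating pattern: rows 0, 5, 10, ... score 10.
toy = pd.DataFrame({"score": [10 if i % 5 == 0 else 0 for i in range(100)]})

sample_size = 5
interval = len(toy) // sample_size  # 20, a multiple of the pattern's period

# Rows 0, 20, 40, 60, 80 all fall on the high-scoring positions.
systematic = toy.iloc[::interval]

print(toy["score"].mean())         # 2.0
print(systematic["score"].mean())  # 10.0
```

The sample mean is five times the population mean because the sampled positions line up with the pattern.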

Making systematic sampling safe

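# frac=1 returns all rows, in random order
shuffled = coffee_ratings.sample(frac=1)

# Discard the old index, then expose the new row positions as an "index" column
shuffled = shuffled.reset_index(drop=True).reset_index()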

shuffled.plot(x="index", y="aftertaste", kind="scatter")
plt.show()

Scatterplot of aftertaste scores versus indices after shuffling the dataset.

Shuffling the rows and then sampling systematically is equivalent to simple random sampling.
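The two steps can be sketched together on a made-up table with a strong pattern in its row order. The `toy` data and the seed 0 are assumptions for illustration; the seed is there only to make the sketch reproducible.

```python
import pandas as pd

# Hypothetical data with a repeating pattern in row order.
toy = pd.DataFrame({"score": [10 if i % 5 == 0 else 0 for i in range(100)]})

sample_size = 5
interval = len(toy) // sample_size

# Step 1: shuffle all rows and renumber them 0..n-1.
shuffled = toy.sample(frac=1, random_state=0).reset_index(drop=True)

# Step 2: take every interval-th row of the shuffled table.
sample = shuffled.iloc[::interval]

print(len(sample))  # 5
```

After shuffling, a row's position no longer carries information about its score, so the fixed step picks rows as if at random.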

Let's practice!
