Simple random and systematic sampling

Sampling in Python

James Chapman

Curriculum Manager, DataCamp

Simple random sampling

A hand picking a folded piece of paper out of a raffle jar.

Lottery balls rolling.

Simple random sampling of coffees

Coffee beans arranged in rows and columns.

Coffee beans arranged in rows and columns, some of which are grayed out.

Simple random sampling with pandas

coffee_ratings.sample(n=5, random_state=19000113)
     total_cup_points         variety country_of_origin  aroma  flavor  \
437             83.25            None          Colombia   7.92    7.75   
285             83.83  Yellow Bourbon            Brazil   7.92    7.50   
784             82.08            None          Colombia   7.50    7.42   
648             82.58         Caturra          Colombia   7.58    7.50   
155             84.58         Caturra          Colombia   7.42    7.67  

     aftertaste  body  balance  
437        7.25  7.83     7.58  
285        7.33  8.17     7.50  
784        7.42  7.67     7.42  
648        7.42  7.67     7.42  
155        7.75  8.08     7.83 
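
The role of `random_state` can be sketched on a small made-up table. `toy_ratings` and its values are hypothetical stand-ins for `coffee_ratings`, and the seed 42 is arbitrary; the point is only that repeating the call with the same seed should return the same rows.

```python
import pandas as pd

# Hypothetical stand-in for coffee_ratings: 20 rows of made-up scores.
toy_ratings = pd.DataFrame({
    "coffee_id": range(20),
    "aroma": [7.0 + 0.05 * i for i in range(20)],
})

# Draw 5 rows without replacement (the default), twice, with the same seed.
first_draw = toy_ratings.sample(n=5, random_state=42)
second_draw = toy_ratings.sample(n=5, random_state=42)

# Same seed, same rows: the sample is reproducible.
print(first_draw.equals(second_draw))  # True
```

Omitting `random_state` would give a different sample on each run, which is usually fine for exploration but makes results harder to reproduce.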

Systematic sampling

Coffee beans arranged in rows and columns.

Coffee beans arranged in rows and columns, most of which are grayed out save for those on a diagonal line.

Systematic sampling - defining the interval

sample_size = 5

pop_size = len(coffee_ratings)
print(pop_size)
1338
interval = pop_size // sample_size

print(interval)
267
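
The interval arithmetic can be checked in isolation. This sketch hard-codes the population size printed above rather than loading the dataset; `leftover` is a name introduced here for illustration.

```python
# Values taken from the slide: 1338 rows, sample of 5.
pop_size = 1338
sample_size = 5

# Integer division rounds down, so some rows at the end are left over.
interval = pop_size // sample_size
leftover = pop_size - interval * sample_size

print(interval, leftover)  # 267 3
```

Because the division is not exact, the last 3 rows lie beyond the fifth interval, which matters when slicing with a step.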

Systematic sampling - selecting the rows

coffee_ratings.iloc[:sample_size * interval:interval]
      total_cup_points  variety country_of_origin  aroma  flavor  aftertaste  \
0                90.58     None          Ethiopia   8.67    8.83        8.67   
267              83.92     None          Colombia   7.83    7.75        7.58   
534              82.92  Bourbon       El Salvador   7.50    7.50        7.75   
801              82.00   Typica            Taiwan   7.33    7.50        7.17   
1068             80.50    Other            Taiwan   7.17    7.17        7.17   

      body  balance  
0     8.50     8.42  
267   7.75     7.75  
534   7.92     7.83  
801   7.50     7.33  
1068  7.17     7.25  
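
When the population size is not an exact multiple of the sample size, a bare `[::interval]` slice can return one row more than requested. A minimal sketch of capping the slice, using a hypothetical helper `systematic_sample` and a made-up 1338-row DataFrame (neither comes from the course material):

```python
import pandas as pd


def systematic_sample(df, sample_size):
    """Return every interval-th row, capped at sample_size rows.

    Assumes 1 <= sample_size <= len(df); otherwise the step would be 0.
    """
    interval = len(df) // sample_size
    # Stop at sample_size * interval so leftover rows at the end
    # cannot contribute an extra row.
    return df.iloc[: sample_size * interval : interval]


# Made-up population with the same number of rows as on the slide.
toy = pd.DataFrame({"value": range(1338)})

print(len(toy.iloc[::267]))              # 6
print(len(systematic_sample(toy, 5)))    # 5
print(systematic_sample(toy, 5).index.tolist())
# [0, 267, 534, 801, 1068]
```

The sample always starts at row 0 here; starting at a random offset below `interval` is a common variation that this sketch leaves out.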

The trouble with systematic sampling

import matplotlib.pyplot as plt

coffee_ratings_with_id = coffee_ratings.reset_index()
coffee_ratings_with_id.plot(x="index", y="aftertaste", kind="scatter")
plt.show()

Scatterplot of aftertaste scores versus indices.

Systematic sampling is only safe if this scatter plot shows no pattern, that is, if row order is unrelated to the values being measured.
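
To see why a pattern in row order is a problem, consider a deliberately extreme, made-up case: scores that repeat in a 5-row cycle, sampled with an interval of 5. The data and column name are hypothetical; real datasets are rarely this regular, but the same bias can appear more subtly.

```python
import pandas as pd

# Hypothetical data: each batch of 5 rows is ordered from best to worst.
cycle = [9.0, 8.0, 7.0, 6.0, 5.0]
toy = pd.DataFrame({"score": cycle * 20})  # 100 rows

# The interval matches the cycle length, so every pick lands on
# the first (best) row of a batch.
interval = 5
systematic = toy.iloc[::interval]

print(toy["score"].mean())         # 7.0
print(systematic["score"].mean())  # 9.0
```

The sample mean overstates the population mean because the sampling interval lines up with the structure in the row order.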

Making systematic sampling safe

shuffled = coffee_ratings.sample(frac=1)

shuffled = shuffled.reset_index(drop=True).reset_index()
shuffled.plot(x="index", y="aftertaste", kind="scatter")
plt.show()

Scatterplot of aftertaste scores versus indices after shuffling the dataset.

Shuffling rows + systematic sampling is the same as simple random sampling
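
A minimal sketch of shuffling before taking every k-th row, on made-up data with a repeating 5-row pattern. The DataFrame, the seed 2024, and the interval are illustrative assumptions, not values from the course.

```python
import pandas as pd

# Hypothetical data with a strong pattern in row order.
cycle = [9.0, 8.0, 7.0, 6.0, 5.0]
toy = pd.DataFrame({"score": cycle * 20})  # 100 rows

# frac=1 samples every row once, i.e. a random permutation of the rows.
# reset_index(drop=True) renumbers them 0..99 in the new order.
shuffled = toy.sample(frac=1, random_state=2024).reset_index(drop=True)

# Every 5th row of the shuffled table no longer lines up with the cycle.
interval = 5
sample = shuffled.iloc[::interval]

print(len(sample))                  # 20
print(sample["score"].mean())       # varies with the seed
```

Since the row order is now random, picking rows at fixed positions selects a random subset, which is why the result behaves like a simple random sample of the same size.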

Let's practice!