Sampling in Python
James Chapman
Curriculum Manager, DataCamp




coffee_ratings.sample(n=5, random_state=19000113)
     total_cup_points         variety country_of_origin  aroma  flavor  \
437             83.25            None          Colombia   7.92    7.75   
285             83.83  Yellow Bourbon            Brazil   7.92    7.50   
784             82.08            None          Colombia   7.50    7.42   
648             82.58         Caturra          Colombia   7.58    7.50   
155             84.58         Caturra          Colombia   7.42    7.67  
     aftertaste  body  balance  
437        7.25  7.83     7.58  
285        7.33  8.17     7.50  
784        7.42  7.67     7.42  
648        7.42  7.67     7.42  
155        7.75  8.08     7.83 
  

sample_size = 5pop_size = len(coffee_ratings)print(pop_size)
1338
interval = pop_size // sample_sizeprint(interval)
267
  coffee_ratings.iloc[::interval]
      total_cup_points  variety country_of_origin  aroma  flavor  aftertaste  \
0                90.58     None          Ethiopia   8.67    8.83        8.67   
267              83.92     None          Colombia   7.83    7.75        7.58   
534              82.92  Bourbon       El Salvador   7.50    7.50        7.75   
801              82.00   Typica            Taiwan   7.33    7.50        7.17   
1068             80.50    Other            Taiwan   7.17    7.17        7.17   
      body  balance  
0     8.50     8.42  
267   7.75     7.75  
534   7.92     7.83  
801   7.50     7.33  
1068  7.17     7.25  
  coffee_ratings_with_id = coffee_ratings.reset_index()
coffee_ratings_with_id.plot(x="index", y="aftertaste", kind="scatter")
plt.show()

Systematic sampling is only safe if we don't see a pattern in this scatter plot
shuffled = coffee_ratings.sample(frac=1)shuffled = shuffled.reset_index(drop=True).reset_index()shuffled.plot(x="index", y="aftertaste", kind="scatter") plt.show()

Shuffling rows + systematic sampling is the same as simple random sampling
Sampling in Python