Sampling in Python
James Chapman
Curriculum Manager, DataCamp
coffee_ratings.sample(n=5, random_state=19000113)
     total_cup_points         variety  country_of_origin  aroma  flavor  aftertaste  body  balance
437             83.25            None           Colombia   7.92    7.75        7.25  7.83     7.58
285             83.83  Yellow Bourbon             Brazil   7.92    7.50        7.33  8.17     7.50
784             82.08            None           Colombia   7.50    7.42        7.42  7.67     7.42
648             82.58         Caturra           Colombia   7.58    7.50        7.42  7.67     7.42
155             84.58         Caturra           Colombia   7.42    7.67        7.75  8.08     7.83
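The call above is simple random sampling: every row has the same chance of being picked, and pandas draws without replacement by default. Fixing random_state makes the draw reproducible. The sketch below checks that on a small hypothetical stand-in DataFrame, since the real coffee_ratings data is not reproduced here.

```python
import pandas as pd

# Hypothetical stand-in for coffee_ratings: 10 rows with one score column
population = pd.DataFrame({"score": [80 + i * 0.5 for i in range(10)]})

# Simple random sampling without replacement (replace=False is the default)
sample_a = population.sample(n=5, random_state=19000113)
sample_b = population.sample(n=5, random_state=19000113)

# Reusing the same random_state returns exactly the same rows
print(sample_a.index.equals(sample_b.index))  # True
print(sample_a.index.is_unique)               # True: no row drawn twice
```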
sample_size = 5
pop_size = len(coffee_ratings)
print(pop_size)
1338
interval = pop_size // sample_size
print(interval)
267
coffee_ratings.iloc[::interval]
      total_cup_points  variety  country_of_origin  aroma  flavor  aftertaste  body  balance
   0             90.58     None           Ethiopia   8.67    8.83        8.67  8.50     8.42
 267             83.92     None           Colombia   7.83    7.75        7.58  7.75     7.75
 534             82.92  Bourbon        El Salvador   7.50    7.50        7.75  7.92     7.83
 801             82.00   Typica             Taiwan   7.33    7.50        7.17  7.50     7.33
1068             80.50    Other             Taiwan   7.17    7.17        7.17  7.17     7.25
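The output above lists five rows, but the arithmetic says a plain step slice can return more: with 1338 rows and an interval of 267, positions 0, 267, 534, 801, 1068 and 1335 all fall inside the table, so iloc[::interval] yields six rows. The sketch below uses a hypothetical helper, systematic_sample, that bounds the slice so exactly sample_size rows come back. It runs on a stand-in DataFrame with the same row count, not on the real data.

```python
import pandas as pd

def systematic_sample(df: pd.DataFrame, sample_size: int) -> pd.DataFrame:
    """Hypothetical helper: every interval-th row from position 0, exactly sample_size rows."""
    if not 0 < sample_size <= len(df):
        raise ValueError("sample_size must be between 1 and len(df)")
    interval = len(df) // sample_size
    # Stopping at sample_size * interval drops the extra row that a plain
    # iloc[::interval] returns when len(df) is not a multiple of sample_size.
    return df.iloc[: sample_size * interval : interval]

# Stand-in population with the same row count as coffee_ratings
population = pd.DataFrame({"row_id": range(1338)})

naive = population.iloc[:: len(population) // 5]
bounded = systematic_sample(population, 5)

print(naive["row_id"].tolist())    # [0, 267, 534, 801, 1068, 1335]
print(bounded["row_id"].tolist())  # [0, 267, 534, 801, 1068]
```

When the row count is an exact multiple of sample_size, the bound equals len(df) and both slices agree.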
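import matplotlib.pyplot as plt
# Copy the row index into a column named "index" so it can go on the x-axis
coffee_ratings_with_id = coffee_ratings.reset_index()
coffee_ratings_with_id.plot(x="index", y="aftertaste", kind="scatter")
plt.show()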
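Systematic sampling is only safe if this scatter plot shows no pattern. If aftertaste trends with row position, for example because the rows are ordered by quality, taking every interval-th row ties the sample to that ordering and can bias it. The systematic sample above hints at such an ordering: total_cup_points falls from 90.58 at row 0 to 80.50 at row 1068.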
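A numeric check can back up the visual one. The sketch below computes the Pearson correlation between row position and a column, once for hypothetical scores stored in descending order and once after shuffling. It is an assumption-laden shortcut: correlation only detects a linear trend, so a periodic pattern that lines up with the sampling interval would still need the scatter plot to spot.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
# Hypothetical scores stored in descending order, mimicking a table sorted by quality
scores = np.sort(rng.normal(loc=7.4, scale=0.3, size=1000))[::-1]
ordered = pd.DataFrame({"aftertaste": scores})
shuffled = ordered.sample(frac=1, random_state=42).reset_index(drop=True)

def position_correlation(df: pd.DataFrame, column: str) -> float:
    """Pearson correlation between row position (0..n-1) and a column."""
    positions = np.arange(len(df))
    return float(np.corrcoef(positions, df[column].to_numpy())[0, 1])

print(round(position_correlation(ordered, "aftertaste"), 2))   # strongly negative
print(round(position_correlation(shuffled, "aftertaste"), 2))  # close to 0
```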
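# Shuffle all rows: frac=1 samples 100% of the rows in random order
shuffled = coffee_ratings.sample(frac=1)
# Drop the old, now-scrambled labels, then copy fresh 0..n-1 positions into an "index" column
shuffled = shuffled.reset_index(drop=True).reset_index()
shuffled.plot(x="index", y="aftertaste", kind="scatter")
plt.show()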
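The double reset_index call above is easy to misread. The small sketch below, on a hypothetical four-row DataFrame, shows what each step does: the first call with drop=True discards the scrambled labels instead of keeping them as a column, and the second call copies the new 0..n-1 labels into a column named "index" that the scatter plot can use.

```python
import pandas as pd

# Hypothetical frame whose labels (10, 20, 30, 40) are not row positions
df = pd.DataFrame({"aftertaste": [7.1, 7.2, 7.3, 7.4]}, index=[10, 20, 30, 40])
shuffled = df.sample(frac=1, random_state=0)

# Step 1: throw away the scrambled labels rather than turning them into a column
renumbered = shuffled.reset_index(drop=True)
# Step 2: copy the new 0..n-1 labels into a column named "index"
with_position = renumbered.reset_index()

print(with_position.columns.tolist())   # ['index', 'aftertaste']
print(with_position["index"].tolist())  # [0, 1, 2, 3]
```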
Shuffling the rows and then applying systematic sampling is equivalent to simple random sampling.
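One consequence of that equivalence can be checked by simulation: after shuffling, every row should land in the systematic sample equally often, whereas without shuffling the same fixed rows are always chosen. The sketch below assumes a hypothetical 20-row population and 2000 shuffles. It only checks equal inclusion rates for individual rows, which is necessary for simple random sampling but does not by itself prove that every subset is equally likely.

```python
import numpy as np
import pandas as pd

# Hypothetical population of 20 rows, each labelled by a row_id
population = pd.DataFrame({"row_id": range(20)})
sample_size = 5
interval = len(population) // sample_size  # 4
n_trials = 2000

# Without shuffling, systematic sampling always returns the same rows
unshuffled_pick = population.iloc[::interval]["row_id"].tolist()

# With shuffling, count how often each row lands in the systematic sample
counts = np.zeros(len(population), dtype=int)
for seed in range(n_trials):
    shuffled = population.sample(frac=1, random_state=seed).reset_index(drop=True)
    picked = shuffled.iloc[::interval]["row_id"].to_numpy()
    counts[picked] += 1  # each row_id appears at most once per trial

print(unshuffled_pick)             # [0, 4, 8, 12, 16]
print(counts.min(), counts.max())  # both expected near 2000 * 5 / 20 = 500
```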