Comparing sampling methods

Sampling in Python

James Chapman

Curriculum Manager, DataCamp

Review of sampling techniques - setup

top_counted_countries = ["Mexico", "Colombia", "Guatemala",
  "Brazil", "Taiwan", "United States (Hawaii)"]

subset_condition = coffee_ratings['country_of_origin'].isin(top_counted_countries)
coffee_ratings_top = coffee_ratings[subset_condition]

coffee_ratings_top.shape

(880, 8)
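The `isin` filter above builds a boolean mask and keeps only the rows whose `country_of_origin` appears in the list. Here is a rough standard-library sketch of the same idea. It assumes each row is a plain dict, and the toy rows are invented for illustration, not taken from the real `coffee_ratings` data:

```python
# Toy rows standing in for coffee_ratings (hypothetical values, not real data)
rows = [
    {"country_of_origin": "Mexico", "total_cup_points": 80.5},
    {"country_of_origin": "Ethiopia", "total_cup_points": 85.0},
    {"country_of_origin": "Brazil", "total_cup_points": 82.0},
    {"country_of_origin": "Kenya", "total_cup_points": 84.2},
]

top_counted_countries = ["Mexico", "Colombia", "Guatemala",
                         "Brazil", "Taiwan", "United States (Hawaii)"]

# A set gives fast membership tests, playing the role of Series.isin
allowed = set(top_counted_countries)

# Boolean mask with one entry per row, like subset_condition
subset_condition = [row["country_of_origin"] in allowed for row in rows]

# Keep only rows where the mask is True, like coffee_ratings[subset_condition]
rows_top = [row for row, keep in zip(rows, subset_condition) if keep]

print(len(rows_top))  # 2
```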

Review of simple random sampling

coffee_ratings_srs = coffee_ratings_top.sample(frac=1/3, random_state=2021)

coffee_ratings_srs.shape

(293, 8)
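With `frac=1/3`, pandas converts the fraction into a row count by rounding `frac * len(df)`. That gives 880 / 3 = 293.33, which rounds to 293 and matches the shape above. The sketch below shows simple random sampling under that rule. `simple_random_sample` is a hypothetical helper, not part of pandas, and seeding Python's `random.Random(2021)` will not reproduce the rows pandas picks with `random_state=2021`:

```python
import random

def simple_random_sample(rows, frac, seed=None):
    """Draw round(frac * len(rows)) rows without replacement.

    Hypothetical helper mirroring DataFrame.sample(frac=...); not pandas itself.
    """
    rng = random.Random(seed)        # local generator, similar in spirit to random_state
    n = round(frac * len(rows))      # 880 * 1/3 = 293.33 -> 293
    return rng.sample(rows, k=n)     # every row has the same chance of selection

population = list(range(880))        # stand-in for the 880 filtered rows
srs = simple_random_sample(population, frac=1/3, seed=2021)
print(len(srs))  # 293
```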

Review of stratified sampling

coffee_ratings_strat = coffee_ratings_top.groupby("country_of_origin")\
    .sample(frac=1/3, random_state=2021)

coffee_ratings_strat.shape

(293, 8)
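Stratified sampling applies the same fraction separately inside each country, so each stratum keeps about a third of its rows. Because pandas rounds per group, the stratified total can differ slightly from the simple random sample size. Here both happen to be 293. The stdlib sketch below uses invented strata sizes (32, 32 and 3 rows) to show a case where the totals differ. `stratified_sample` is a hypothetical helper:

```python
import random
from collections import Counter, defaultdict

def stratified_sample(rows, key, frac, seed=None):
    """Sample round(frac * group_size) rows from each stratum.

    Hypothetical helper mirroring groupby(key).sample(frac=...).
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:                        # split the population into strata
        groups[row[key]].append(row)
    sample = []
    for members in groups.values():         # sample independently within each stratum
        n = round(frac * len(members))
        sample.extend(rng.sample(members, k=n))
    return sample

# Toy population with made-up strata sizes: A=32, B=32, C=3 (67 rows)
countries = ["A"] * 32 + ["B"] * 32 + ["C"] * 3
rows = [{"id": i, "country": c} for i, c in enumerate(countries)]

strat = stratified_sample(rows, key="country", frac=1/3, seed=1)
counts = Counter(row["country"] for row in strat)
print(dict(counts), len(strat))      # {'A': 11, 'B': 11, 'C': 1} 23
print(round(len(rows) * (1/3)))      # 22, the size a simple random sample would use
```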

Review of cluster sampling

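import random

# Stage 1: randomly choose 2 of the 6 countries (no seed, so the choice varies per run)
top_countries_samp = random.sample(top_counted_countries, k=2)
top_condition = coffee_ratings_top['country_of_origin'].isin(top_countries_samp)
# .copy() avoids SettingWithCopyWarning when modifying the subset below
coffee_ratings_cluster = coffee_ratings_top[top_condition].copy()
# Drop the unselected countries from the category list (.cat requires a categorical column)
coffee_ratings_cluster['country_of_origin'] = coffee_ratings_cluster['country_of_origin']\
    .cat.remove_unused_categories()

# Stage 2: sample 880 // 6 = 146 rows from each chosen country
coffee_ratings_clust = coffee_ratings_cluster.groupby("country_of_origin")\
    .sample(n=len(coffee_ratings_top) // 6)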

coffee_ratings_clust.shape

(292, 8)
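The cluster sample is drawn in two stages. First, two countries are picked at random. Then `len(coffee_ratings_top) // 6` rows are drawn from each one, so 880 // 6 = 146 rows per country and 2 × 146 = 292 rows in total. The stdlib sketch below reproduces both stages. `two_stage_cluster_sample` is a hypothetical helper, and the six toy "countries" of 100 rows each are invented. Every chosen cluster must hold at least `n_per_cluster` rows, otherwise `random.sample` raises `ValueError`:

```python
import random
from collections import defaultdict

def two_stage_cluster_sample(rows, key, n_clusters, n_per_cluster, seed=None):
    """Stage 1: pick whole clusters at random. Stage 2: sample within each.

    Hypothetical helper; each chosen cluster needs >= n_per_cluster rows.
    """
    rng = random.Random(seed)
    clusters = defaultdict(list)
    for row in rows:
        clusters[row[key]].append(row)
    # Stage 1: choose clusters (sorted so a given seed picks the same ones)
    chosen = rng.sample(sorted(clusters), k=n_clusters)
    # Stage 2: fixed-size simple random sample inside each chosen cluster
    sample = []
    for name in chosen:
        sample.extend(rng.sample(clusters[name], k=n_per_cluster))
    return chosen, sample

# Toy population: six "countries" with 100 rows each (invented sizes)
countries = ["A", "B", "C", "D", "E", "F"]
rows = [{"id": i, "country": c} for c in countries for i in range(100)]

chosen, clust = two_stage_cluster_sample(rows, "country", n_clusters=2,
                                         n_per_cluster=50, seed=2021)
print(chosen)
print(len(clust))  # 100
```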

Calculating mean cup points

Population

coffee_ratings_top['total_cup_points'].mean()

81.94700000000002

Simple random sample

coffee_ratings_srs['total_cup_points'].mean()

81.95982935153583

Stratified sample

coffee_ratings_strat['total_cup_points'].mean()

81.92566552901025

Cluster sample

coffee_ratings_clust['total_cup_points'].mean()

82.03246575342466
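One way to compare the three estimates is to compute each one's relative error against the population mean. The snippet below plugs in the values printed above. The `relative_error` helper exists only for illustration:

```python
population_mean = 81.94700000000002

# Sample means copied from the outputs above
estimates = {
    "simple random": 81.95982935153583,
    "stratified": 81.92566552901025,
    "cluster": 82.03246575342466,
}

def relative_error(estimate, truth):
    """Absolute difference as a percentage of the true value."""
    return 100 * abs(estimate - truth) / truth

errors = {method: relative_error(value, population_mean)
          for method, value in estimates.items()}

for method, err in errors.items():
    print(f"{method:>13}: {err:.3f}%")
```

This prints roughly 0.016%, 0.026% and 0.104%. In this run the cluster estimate is the furthest off, which is plausible because only two of the six countries contribute to it. A different seed could reorder the other two methods.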

Mean cup points by country: simple random

Population:

coffee_ratings_top.groupby("country_of_origin")\
    ['total_cup_points'].mean()

country_of_origin
Brazil                    82.405909
Colombia                  83.106557
Guatemala                 81.846575
Mexico                    80.890085
Taiwan                    82.001333
United States (Hawaii)    81.820411
Name: total_cup_points, dtype: float64

Simple random sample:

coffee_ratings_srs.groupby("country_of_origin")\
    ['total_cup_points'].mean()

country_of_origin
Brazil                    82.414878
Colombia                  82.925536
Guatemala                 82.045385
Mexico                    81.100714
Taiwan                    81.744333
United States (Hawaii)    82.008000
Name: total_cup_points, dtype: float64
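`groupby("country_of_origin")['total_cup_points'].mean()` splits the rows by country, averages each group and combines the results into one Series indexed by country. The same split-apply-combine step can be written in plain Python. `group_means` is a hypothetical helper, and the toy rows are invented:

```python
from collections import defaultdict
from statistics import mean

def group_means(rows, key, value):
    """Mean of `value` for each distinct `key`, like groupby(key)[value].mean()."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])   # split
    # apply (mean) and combine, sorted by key as pandas sorts group labels by default
    return {k: mean(v) for k, v in sorted(groups.items())}

rows = [
    {"country_of_origin": "Mexico", "total_cup_points": 80.0},
    {"country_of_origin": "Brazil", "total_cup_points": 82.0},
    {"country_of_origin": "Mexico", "total_cup_points": 81.0},
    {"country_of_origin": "Brazil", "total_cup_points": 83.0},
]
means = group_means(rows, "country_of_origin", "total_cup_points")
print(means)  # {'Brazil': 82.5, 'Mexico': 80.5}
```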

Mean cup points by country: stratified

Population:

coffee_ratings_top.groupby("country_of_origin")\
    ['total_cup_points'].mean()

country_of_origin
Brazil                    82.405909
Colombia                  83.106557
Guatemala                 81.846575
Mexico                    80.890085
Taiwan                    82.001333
United States (Hawaii)    81.820411
Name: total_cup_points, dtype: float64

Stratified sample:

coffee_ratings_strat.groupby("country_of_origin")\
    ['total_cup_points'].mean()

country_of_origin
Brazil                    82.499773
Colombia                  83.288197
Guatemala                 81.727667
Mexico                    80.994684
Taiwan                    81.846800
United States (Hawaii)    81.051667
Name: total_cup_points, dtype: float64
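Comparing the stratified estimates with the population means country by country shows where this particular draw landed furthest from the truth. The snippet below uses the values printed above and sorts countries by the size of the gap:

```python
population = {
    "Brazil": 82.405909, "Colombia": 83.106557, "Guatemala": 81.846575,
    "Mexico": 80.890085, "Taiwan": 82.001333, "United States (Hawaii)": 81.820411,
}
stratified = {
    "Brazil": 82.499773, "Colombia": 83.288197, "Guatemala": 81.727667,
    "Mexico": 80.994684, "Taiwan": 81.846800, "United States (Hawaii)": 81.051667,
}

# Signed difference (sample minus population) for every country
diffs = {c: stratified[c] - population[c] for c in population}

# Largest absolute gap first
for country, d in sorted(diffs.items(), key=lambda kv: abs(kv[1]), reverse=True):
    print(f"{country:<24}{d:+.3f}")
```

In this run the largest gap is for United States (Hawaii), at about -0.77 points. All other countries are within 0.2 points.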

Mean cup points by country: cluster

Population:

coffee_ratings_top.groupby("country_of_origin")\
    ['total_cup_points'].mean()

country_of_origin
Brazil                    82.405909
Colombia                  83.106557
Guatemala                 81.846575
Mexico                    80.890085
Taiwan                    82.001333
United States (Hawaii)    81.820411
Name: total_cup_points, dtype: float64

Cluster sample:

coffee_ratings_clust.groupby("country_of_origin")\
    ['total_cup_points'].mean()

country_of_origin
Colombia    83.128904
Mexico      80.936027
Name: total_cup_points, dtype: float64
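The cluster sample contains only the two countries picked in the first stage, so it provides no per-country estimate for the other four. A quick check using the country names printed above:

```python
population_countries = {"Brazil", "Colombia", "Guatemala", "Mexico",
                        "Taiwan", "United States (Hawaii)"}
cluster_countries = {"Colombia", "Mexico"}   # groups present in coffee_ratings_clust

# Countries with no estimate at all in the cluster sample
missing = sorted(population_countries - cluster_countries)
print(missing)  # ['Brazil', 'Guatemala', 'Taiwan', 'United States (Hawaii)']

# Share of countries the cluster sample covers
coverage = len(cluster_countries) / len(population_countries)
print(f"{coverage:.0%}")  # 33%
```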

Let's practice!
