Summary statistics by group

Data Manipulation in Julia

Katerina Zahradova

Instructor

What we know so far

# Calculate the mean minimum wage
combine(wages, :effective_min_wage_2020_dollars => mean)
1x1 DataFrame
Row | effective_min_wage_2020_dollars_mean
    | Float64
____|_______________________
   1| 8.37093

What we know so far

# Filter and calculate the mean
first(combine(filter(r -> r.region == "W", wages), :effective_min_wage_2020_dollars => mean))
first(combine(filter(r -> r.region == "S", wages), :effective_min_wage_2020_dollars => mean))
first(combine(filter(r -> r.region == "NE", wages), :effective_min_wage_2020_dollars => mean))
DataFrameRow (1 columns)
Row | effective_min_wage_2020_dollars_mean
    | Float64
____|_______________________
   1| 8.75413
...

Using combine() and groupby()

# Group by region
wages_by_region = groupby(wages, :region)

# Calculate the mean for each group
combine(wages_by_region, :effective_min_wage_2020_dollars => mean)
4x2 DataFrame
Row  region  effective_min_wage_2020_dollars_mean
     String  Float64
_________________________________________________
1    S       8.15458
2    W       8.59119
3    NE      8.75413
4    MW      8.1514
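
To make the contrast with the filter-per-region approach concrete, here is a minimal sketch on a small made-up table. The `toy` DataFrame and its `wage` column are assumptions for illustration only; they are not the course's `wages` dataset.

```julia
using DataFrames, Statistics

# Hypothetical toy data; the real `wages` table has more rows and columns.
toy = DataFrame(region = ["S", "S", "W", "W", "NE"],
                wage   = [8.0, 9.0, 10.0, 12.0, 7.0])

# One filter() call per region, as on the earlier slide ...
south_mean = mean(filter(r -> r.region == "S", toy).wage)

# ... versus a single grouped aggregation that covers every region at once.
by_region = combine(groupby(toy, :region), :wage => mean)
```

The grouped version returns one row per region, with the result column named `wage_mean` by default.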

Combining combine() and groupby()

# Combine the two in one call
combine(groupby(wages, :region), :effective_min_wage_2020_dollars => mean)
4x2 DataFrame
Row  region  effective_min_wage_2020_dollars_mean
     String  Float64
_________________________________________________
1    S       8.15458
2    W       8.59119
3    NE      8.75413
4    MW      8.1514

Using combine() and groupby()

# Rename the result column
combine(groupby(wages, :region),
        :effective_min_wage_2020_dollars => mean => :average_min_wage_2020_dollars)
4x2 DataFrame
Row  region  average_min_wage_2020_dollars
     String  Float64
__________________________________________
1    S       8.15458
2    W       8.59119
3    NE      8.75413
4    MW      8.1514

Multiple functions on one column

# Apply multiple functions to one column
combine(groupby(wages, :region),
        :effective_min_wage_2020_dollars .=> [mean, median, maximum])
4x4 DataFrame
Row  region  effective_min_wage_2020_dollars_mean  ...
     String  Float64                               ...
____________________________________________________________________________________
1    S       8.15458                               ...
2    W       8.59119                               ...
3    NE      8.75413                               ...
4    MW      8.1514                                ...

Multiple functions on one column

# Apply multiple functions to one column and name the results
combine(groupby(wages, :region), :effective_min_wage_2020_dollars .=> [mean, median] .=> [:average, :median])
4x3 DataFrame
Row  region  average  median
     String  Float64  Float64
_____________________________
1    S       8.15458  8.0
2    W       8.59119  8.34
...
# DON'T forget the dot!
combine(groupby(wages, :region), :effective_min_wage_2020_dollars => [mean, median])
ArgumentError: Unrecognized column selector ...
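
The reason for the error can be seen without any DataFrame at all. The sketch below (the `:wage` symbol is just a placeholder) compares what `.=>` and `=>` build from the same pieces.

```julia
using Statistics

# Broadcasting `.=>` over a vector of functions yields one Pair per function.
specs = :wage .=> [mean, median]
length(specs)      # 2: (:wage => mean) and (:wage => median)

# Without the dot, a single Pair is built whose right-hand side is the whole
# vector of functions, which combine() does not accept as a transformation.
single = :wage => [mean, median]
single.second      # the 2-element vector [mean, median]
```

In other words, the dot is ordinary Julia broadcasting; `combine()` simply receives a vector of `column => function` pairs.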

Multiple columns with one function

combine(groupby(wages, :region), [:state_min_wage, :federal_min_wage] .=> mean)
4x3 DataFrame
Row  region  state_min_wage_mean  federal_min_wage_mean
     String  Float64              Float64
_______________________________________________________
1    S       2.73128              4.35566
2    W       4.26638              4.35566
...
# DON'T forget the dot
combine(groupby(wages, :region), [:state_min_wage, :federal_min_wage] => mean)
MethodError: objects of type ...

Multiple columns with multiple functions

# Functions as a 1-row matrix: every column with every function
combine(groupby(wages, :region), [:state_min, :federal_min] .=> [mean median])
Row  region  state_min_mean  federal_min_mean  state_min_median  federal_min_median
___________________________________________________________________________________
1    S       2.73128         4.35566           2.0               4.25
...
# Functions as a vector: columns and functions paired one-to-one
combine(groupby(wages, :region), [:state_min, :federal_min] .=> [mean, median])
Row  region  state_min_mean  federal_min_median
_______________________________________________
1    S       2.73128         4.25
...
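
The difference between the two calls comes from the shape of the broadcast result. A minimal sketch, using only the column symbols from the slide and no data:

```julia
using Statistics

cols = [:state_min, :federal_min]

# Column vector .=> 1-row matrix broadcasts to a 2x2 grid:
# every column is paired with every function.
all_combos = cols .=> [mean median]
size(all_combos)   # (2, 2)

# Column vector .=> column vector of equal length pairs elements one-to-one.
pairwise = cols .=> [mean, median]
size(pairwise)     # (2,)
```

The grid is traversed column by column, which matches the output order on the slide: both means first, then both medians.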

Possible functions

Functions that can be used:

  • Common statistical functions such as sum(), mean(), minimum(), ...
  • User-defined functions (broadcast)
  • Anonymous functions, wrapped in ByRow()
  • DataFrames-specific functions: nrow, proprow, ...
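
The kinds of functions listed above can be sketched on a small made-up table. The `toy` data, the `wage_range` helper, and the output column names are assumptions for illustration; `proprow` also assumes a reasonably recent DataFrames.jl release.

```julia
using DataFrames, Statistics

# Hypothetical toy data, not the course dataset.
toy = DataFrame(region = ["S", "S", "W", "W", "NE"],
                wage   = [8.0, 9.0, 10.0, 12.0, 7.0])
gdf = groupby(toy, :region)

# A user-defined aggregation: it receives the whole column of each group.
wage_range(v) = maximum(v) - minimum(v)
ranges = combine(gdf, :wage => wage_range => :wage_range)

# DataFrames-specific helpers need no source column:
# nrow counts rows per group, proprow gives each group's share of all rows.
counts = combine(gdf, nrow, proprow)

# ByRow() applies an anonymous function to each element, so nothing is reduced
# and the result keeps one row per input row.
doubled = combine(gdf, :wage => ByRow(w -> 2w) => :double_wage)
```

Note the difference in result size: `ranges` and `counts` have one row per group, while `doubled` has as many rows as `toy`.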

Quick summary

# Grouped DataFrame gdf
# 1 column + 1 function
combine(gdf, :c => f => :new_c)

# 1 column + 2+ functions
combine(gdf, :c .=> [f1, f2, ...] .=> [:new_c_f1, :new_c_f2, ...])

# 2+ columns + 1 function
combine(gdf, [:c1, :c2, ...] .=> f .=> [:new_c1_f, :new_c2_f, ...])

# 2+ columns + 2+ functions - all combinations
# (new names form a matrix with one row per column and one column per function)
combine(gdf, [:c1, :c2, ...] .=> [f1 f2 ...] .=> [:c1_f1 :c1_f2 ...; :c2_f1 :c2_f2 ...; ...])

# 2+ columns + 2+ functions - pairwise
combine(gdf, [:c1, :c2, ...] .=> [f1, f2, ...] .=> [:new_c1_f1, :new_c2_f2, ...])
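
As a runnable instance of the "all combinations" pattern with custom names, here is a minimal sketch on made-up data. The `toy` table and all column names are hypothetical. Because the columns and functions broadcast to a grid, the new names are given as a matrix of the same shape; a flat vector of four names would not broadcast against a 2x2 grid.

```julia
using DataFrames, Statistics

# Hypothetical toy data with one grouping column and two value columns.
toy = DataFrame(g  = ["a", "a", "b"],
                c1 = [1.0, 3.0, 5.0],
                c2 = [2.0, 4.0, 6.0])
gdf = groupby(toy, :g)

# Rows of the name matrix follow the columns, columns follow the functions.
new_names = [:c1_mean :c1_max;
             :c2_mean :c2_max]

res = combine(gdf, [:c1, :c2] .=> [mean maximum] .=> new_names)
```

The result columns appear in column-major order of the grid: `c1_mean`, `c2_mean`, `c1_max`, `c2_max`.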

Let's practice!
