Data Manipulation in Julia
Katerina Zahradova
Instructor
using DataFrames, Statistics  # `mean` and `median` come from Statistics
# Calculate the average minimum wage
combine(wages, :effective_min_wage_2020_dollars => mean)
1×1 DataFrame
 Row │ effective_min_wage_2020_dollars_mean
     │ Float64
─────┼──────────────────────────────────────
   1 │ 8.37093
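To see what the ungrouped call returns, here is a minimal sketch on a small hypothetical DataFrame. `toy` and its `wage` column are made up for illustration and are not the course's `wages` data; the sketch assumes the DataFrames and Statistics packages are available.

```julia
using DataFrames, Statistics

# Hypothetical toy data standing in for the course's `wages` DataFrame
toy = DataFrame(region = ["S", "S", "W", "W"],
                wage   = [7.0, 9.0, 8.0, 10.0])

# Without grouping, combine reduces the whole column to a single row
overall = combine(toy, :wage => mean)

# The output column is named automatically as <source>_<function>
println(names(overall))        # ["wage_mean"]
println(overall.wage_mean[1])  # 8.5
```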
# Filter by region, then calculate the mean
first(combine(filter(r -> r.region == "W", wages), :effective_min_wage_2020_dollars => mean))
first(combine(filter(r -> r.region == "S", wages), :effective_min_wage_2020_dollars => mean))
first(combine(filter(r -> r.region == "NE", wages), :effective_min_wage_2020_dollars => mean))
DataFrameRow (1 columns)
 Row │ effective_min_wage_2020_dollars_mean
     │ Float64
─────┼──────────────────────────────────────
   1 │ 8.75413
...
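The filter-then-combine pattern needs one expression per region. As a sketch (on hypothetical toy data, with a made-up `region_means` dictionary), the same idea can be looped over every distinct region, which also shows how repetitive it gets:

```julia
using DataFrames, Statistics

# Hypothetical toy data standing in for `wages`
toy = DataFrame(region = ["S", "S", "W", "W"],
                wage   = [7.0, 9.0, 8.0, 10.0])

# Repeat filter + combine once per region.
# `first` turns the 1-row result into a DataFrameRow, so `.wage_mean` is a scalar.
region_means = Dict(
    reg => first(combine(filter(r -> r.region == reg, toy), :wage => mean)).wage_mean
    for reg in unique(toy.region)
)

println(region_means["S"])  # 8.0
println(region_means["W"])  # 9.0
```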
# Group by region
wages_by_region = groupby(wages, :region)
# Calculate the mean for each group
combine(wages_by_region, :effective_min_wage_2020_dollars => mean)
4×2 DataFrame
 Row │ region  effective_min_wage_2020_dollars_mean
     │ String  Float64
─────┼──────────────────────────────────────────────
   1 │ S       8.15458
   2 │ W       8.59119
   3 │ NE      8.75413
   4 │ MW      8.1514
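A minimal sketch of the same grouping on hypothetical toy data: `groupby` splits the rows into one group per distinct value, and `combine` applies the summary to each group while keeping the grouping column. Group order is not relied on here, so the check goes through a dictionary.

```julia
using DataFrames, Statistics

# Hypothetical toy data standing in for `wages`
toy = DataFrame(region = ["S", "S", "W", "W"],
                wage   = [7.0, 9.0, 8.0, 10.0])

# Split the rows into one group per distinct region
gdf = groupby(toy, :region)
println(length(gdf))  # 2

# Apply the same summary to every group; the grouping column is kept first
by_region = combine(gdf, :wage => mean)
println(names(by_region))  # ["region", "wage_mean"]
```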
# Combine both steps into one expression
combine(groupby(wages, :region), :effective_min_wage_2020_dollars => mean)
4×2 DataFrame
 Row │ region  effective_min_wage_2020_dollars_mean
     │ String  Float64
─────┼──────────────────────────────────────────────
   1 │ S       8.15458
   2 │ W       8.59119
   3 │ NE      8.75413
   4 │ MW      8.1514
# Rename the output column
combine(groupby(wages, :region),
:effective_min_wage_2020_dollars => mean => :average_min_wage_2020_dollars)
4×2 DataFrame
 Row │ region  average_min_wage_2020_dollars
     │ String  Float64
─────┼───────────────────────────────────────
   1 │ S       8.15458
   2 │ W       8.59119
   3 │ NE      8.75413
   4 │ MW      8.1514
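A short sketch of the `source => function => new_name` form on hypothetical toy data. The target name can be given as a Symbol or as a String; both produce the same column.

```julia
using DataFrames, Statistics

# Hypothetical toy data standing in for `wages`
toy = DataFrame(region = ["S", "S", "W", "W"],
                wage   = [7.0, 9.0, 8.0, 10.0])

# source => function => new_name (Symbol target)
renamed  = combine(groupby(toy, :region), :wage => mean => :average_wage)
# The same with a String target
renamed2 = combine(groupby(toy, :region), :wage => mean => "average_wage")

println(names(renamed))  # ["region", "average_wage"]
```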
# Apply multiple functions to one column
combine(groupby(wages, :region),
:effective_min_wage_2020_dollars .=> [mean, median, maximum])
4×4 DataFrame
 Row │ region  effective_min_wage_2020_dollars_mean  ...
     │ String  Float64                               ...
─────┼─────────────────────────────────────────────────
   1 │ S       8.15458                               ...
   2 │ W       8.59119                               ...
   3 │ NE      8.75413                               ...
   4 │ MW      8.1514                                ...
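What the dot does can be inspected directly. In this sketch on hypothetical toy data (the name `spec` is made up), `.=>` broadcasts the single source column over the vector of functions, giving one `source => function` pair per function, and each output column gets an automatic `<source>_<function>` name.

```julia
using DataFrames, Statistics

# Hypothetical toy data standing in for `wages`
toy = DataFrame(region = ["S", "S", "W", "W"],
                wage   = [7.0, 9.0, 8.0, 10.0])

# One pair per function: [:wage => mean, :wage => median, :wage => maximum]
spec = :wage .=> [mean, median, maximum]
println(length(spec))  # 3

stats = combine(groupby(toy, :region), spec)
println(names(stats))  # ["region", "wage_mean", "wage_median", "wage_maximum"]
```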
# Apply multiple functions to one column and name the results
combine(groupby(wages, :region), :effective_min_wage_2020_dollars .=> [mean, median] .=> [:average, :median])
4×3 DataFrame
 Row │ region  average  median
     │ String  Float64  Float64
─────┼──────────────────────────
   1 │ S       8.15458  8.0
   2 │ W       8.59119  8.34
...
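Because `=>` (and therefore `.=>`) is right-associative, the chained form first pairs each function with its name and then attaches the source column to each pair. The sketch below, on hypothetical toy data with a made-up `spec`, checks that expansion.

```julia
using DataFrames, Statistics

# Hypothetical toy data standing in for `wages`
toy = DataFrame(region = ["S", "S", "W", "W"],
                wage   = [7.0, 9.0, 8.0, 10.0])

# Parsed as :wage .=> ([mean, median] .=> [:average, :median]),
# i.e. [:wage => (mean => :average), :wage => (median => :median)]
spec = :wage .=> [mean, median] .=> [:average, :median]

named = combine(groupby(toy, :region), spec)
println(names(named))  # ["region", "average", "median"]
```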
# DON'T forget the dot!
combine(groupby(wages, :region), :effective_min_wage_2020_dollars => [mean, median])
ArgumentError: Unrecognized column selector ...
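The difference between the two forms is visible before `combine` is even called. In this sketch on hypothetical toy data (`no_dot`, `with_dot` and `failed` are illustrative names), the undotted form is a single Pair whose right-hand side is a vector, which `combine` does not accept as a transformation; the error is caught rather than asserted by exact type.

```julia
using DataFrames, Statistics

# Hypothetical toy data standing in for `wages`
toy = DataFrame(region = ["S", "S", "W", "W"],
                wage   = [7.0, 9.0, 8.0, 10.0])

# Without the dot: ONE pair whose right-hand side is a vector of functions
no_dot = :wage => [mean, median]
# With the dot: a vector of TWO pairs, one per function
with_dot = :wage .=> [mean, median]

println(no_dot isa Pair)   # true
println(length(with_dot))  # 2

# combine rejects the undotted form; catch the error instead of crashing
failed = try
    combine(groupby(toy, :region), no_dot)
    false
catch err
    println("Error: ", typeof(err))
    true
end
```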
# Apply one function to multiple columns
combine(groupby(wages, :region), [:state_min_wage, :federal_min_wage] .=> mean)
4×3 DataFrame
 Row │ region  state_min_wage_mean  federal_min_wage_mean
     │ String  Float64              Float64
─────┼────────────────────────────────────────────────────
   1 │ S       2.73128              4.35566
   2 │ W       4.26638              4.35566
...
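Here the dot works in the other direction: one function is broadcast over a vector of columns. A sketch on hypothetical toy data (the column values and the name `spec` are made up):

```julia
using DataFrames, Statistics

# Hypothetical toy data with two numeric columns
toy = DataFrame(region           = ["S", "S", "W", "W"],
                state_min_wage   = [2.0, 4.0, 8.0, 10.0],
                federal_min_wage = [7.25, 7.25, 7.25, 7.25])

# One pair per column: [:state_min_wage => mean, :federal_min_wage => mean]
spec = [:state_min_wage, :federal_min_wage] .=> mean

means = combine(groupby(toy, :region), spec)
println(names(means))  # ["region", "state_min_wage_mean", "federal_min_wage_mean"]
```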
# DON'T forget the dot!
combine(groupby(wages, :region), [:state_min_wage, :federal_min_wage] => mean)
MethodError: objects of type ...
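Without the dot, `[:c1, :c2] => f` calls `f(c1, c2)` once per group, passing both columns as separate arguments. `mean(c1, c2)` is not an average of two columns, which is consistent with the MethodError above. The undotted form is still useful when the function really takes several columns; this sketch uses a hypothetical two-argument helper `higher_avg` on toy data.

```julia
using DataFrames, Statistics

# Hypothetical toy data with two numeric columns
toy = DataFrame(region           = ["S", "S", "W", "W"],
                state_min_wage   = [2.0, 4.0, 8.0, 10.0],
                federal_min_wage = [7.25, 7.25, 7.25, 7.25])

# Hypothetical helper: receives BOTH columns of a group as two vectors and
# averages the row-wise higher of the two wages
higher_avg(state, federal) = mean(max.(state, federal))

# No dot: both columns go to one function call per group
res = combine(groupby(toy, :region),
              [:state_min_wage, :federal_min_wage] => higher_avg => :avg_higher_wage)
```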
# Functions as a 1-row matrix: every column is paired with every function
combine(groupby(wages, :region), [:state_min_wage, :federal_min_wage] .=> [mean median])
 Row │ region  state_min_wage_mean  federal_min_wage_mean  state_min_wage_median  federal_min_wage_median
─────┼─────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ S       2.73128              4.35566                2.0                    4.25
...
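The all-combinations behaviour comes from ordinary broadcasting: a 2-element column vector against a 1×2 row matrix gives a 2×2 matrix of pairs. This sketch on hypothetical toy data checks the shape and the resulting column order, which follows the matrix in column-major order (as in the output above).

```julia
using DataFrames, Statistics

# Hypothetical toy data with two numeric columns
toy = DataFrame(region           = ["S", "S", "W", "W"],
                state_min_wage   = [2.0, 4.0, 8.0, 10.0],
                federal_min_wage = [7.25, 7.25, 7.25, 7.25])

# Column vector .=> row matrix: rows = source columns, matrix columns = functions
spec = [:state_min_wage, :federal_min_wage] .=> [mean median]
println(size(spec))  # (2, 2)

all_stats = combine(groupby(toy, :region), spec)
println(names(all_stats))
```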
# Functions as a vector: columns and functions are paired element by element
combine(groupby(wages, :region), [:state_min_wage, :federal_min_wage] .=> [mean, median])
 Row │ region  state_min_wage_mean  federal_min_wage_median
─────┼──────────────────────────────────────────────────────
   1 │ S       2.73128              4.25
...
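With two vectors, broadcasting pairs elements by position, so the first column gets the first function and the second column gets the second. A sketch on hypothetical toy data; it also shows that vectors of different lengths raise a `DimensionMismatch` instead of pairing.

```julia
using DataFrames, Statistics

# Hypothetical toy data with two numeric columns
toy = DataFrame(region           = ["S", "S", "W", "W"],
                state_min_wage   = [2.0, 4.0, 8.0, 10.0],
                federal_min_wage = [7.25, 7.25, 7.25, 7.25])

# Element-wise pairing: [:state_min_wage => mean, :federal_min_wage => median]
spec = [:state_min_wage, :federal_min_wage] .=> [mean, median]

paired = combine(groupby(toy, :region), spec)
println(names(paired))  # ["region", "state_min_wage_mean", "federal_min_wage_median"]

# Vectors of different lengths cannot be broadcast together
mismatch = try
    [:state_min_wage, :federal_min_wage] .=> [mean, median, maximum]
    false
catch err
    err isa DimensionMismatch
end
```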
Functions you can use:
sum(), mean(), minimum(), ...
ByRow()
nrow, proprow, ...

# gdf: a grouped DataFrame (the result of groupby)
# 1 column + 1 function
combine(gdf, :c => f => :new_c)
# 1 column + 2+ functions
combine(gdf, :c .=> [f1, f2, ...] .=> [:new_c_f1, :new_c_f2, ...])
# 2+ columns + 1 function
combine(gdf, [:c1, :c2, ...] .=> f .=> [:new_c1_f, :new_c2_f, ...])
# 2+ columns + 2+ functions: all combinations (names must be a matrix of the same shape)
combine(gdf, [:c1, :c2, ...] .=> [f1 f2 ...] .=> [:c1_f1 :c1_f2 ...; :c2_f1 :c2_f2 ...; ...])
# 2+ columns + 2+ functions: pairwise
combine(gdf, [:c1, :c2, ...] .=> [f1, f2, ...] .=> [:new_c1_f1, :new_c2_f2, ...])
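A sketch exercising parts of this summary on hypothetical toy data. For the all-combinations template, the names are given as a 2×2 matrix so they broadcast against the 2×2 grid of pairs (a flat vector of four names would not match that shape). It also tries `nrow`, `proprow` and `ByRow` from the function list; `proprow` assumes a reasonably recent DataFrames release (1.4 or later).

```julia
using DataFrames, Statistics

# Hypothetical toy data with two numeric columns
toy = DataFrame(region           = ["S", "S", "W", "W"],
                state_min_wage   = [2.0, 4.0, 8.0, 10.0],
                federal_min_wage = [7.25, 7.25, 7.25, 7.25])
gdf = groupby(toy, :region)

# All combinations: rows of the name matrix = source columns,
# columns of the name matrix = functions
grid = combine(gdf, [:state_min_wage, :federal_min_wage] .=> [mean median] .=>
                    [:state_avg :state_med; :fed_avg :fed_med])

# nrow counts rows per group; proprow gives each group's share of all rows
sizes = combine(gdf, nrow, proprow)

# ByRow applies the function to each row instead of to whole columns,
# so combine returns one output row per input row here
rowwise = combine(gdf, [:state_min_wage, :federal_min_wage] => ByRow(max) => :higher_wage)
```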