Data Manipulation in Julia
Katerina Zahradova
Instructor
# Compute the average minimum wage
using DataFrames, Statistics  # mean and median come from Statistics
combine(wages, :effective_min_wage_2020_dollars => mean)
1x1 DataFrame
Row | effective_min_wage_2020_dollars_mean
| Float64
____|_______________________
1| 8.37093
# Filter, then compute the mean (one call per region)
first(combine(filter(r -> r.region == "W", wages), :effective_min_wage_2020_dollars => mean))
first(combine(filter(r -> r.region == "S", wages), :effective_min_wage_2020_dollars => mean))
first(combine(filter(r -> r.region == "NE", wages), :effective_min_wage_2020_dollars => mean))
DataFrameRow (1 columns)
Row | effective_min_wage_2020_dollars_mean
| Float64
____|_______________________
1| 8.75413
...
# Group by region
wages_by_region = groupby(wages, :region)
# Compute the mean per group
combine(wages_by_region, :effective_min_wage_2020_dollars => mean)
4x2 DataFrame
Row region effective_min_wage_2020_dollars_mean
String Float64
__________________________
1 S 8.15458
2 W 8.59119
3 NE 8.75413
4 MW 8.1514
# Combine both steps in one call
combine(groupby(wages, :region), :effective_min_wage_2020_dollars => mean)
4x2 DataFrame
Row region effective_min_wage_2020_dollars_mean
String Float64
__________________________
1 S 8.15458
2 W 8.59119
3 NE 8.75413
4 MW 8.1514
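The equivalence between the per-region filter calls and the single grouped call can be checked on a small table. This is a minimal sketch: the `toy` DataFrame and its `wage` column are hypothetical stand-ins, since the real `wages` data is not reproduced here.

```julia
using DataFrames, Statistics

# Hypothetical toy data standing in for the `wages` table.
toy = DataFrame(region = ["S", "W", "S", "W"],
                wage   = [8.0, 9.0, 10.0, 11.0])

# Manual approach: filter a single region, then aggregate.
south = combine(filter(r -> r.region == "S", toy), :wage => mean)

# Grouped approach: one call produces a row for every region.
by_region = combine(groupby(toy, :region), :wage => mean)

# The "S" row of the grouped result matches the manual result.
south_from_groups = by_region.wage_mean[by_region.region .== "S"][1]
println(south.wage_mean[1] == south_from_groups)
```

The grouped version scales to any number of regions without repeating the filter expression.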
# Rename the output column
combine(groupby(wages, :region),
:effective_min_wage_2020_dollars => mean => :average_min_wage_2020_dollars)
4x2 DataFrame
Row region average_min_wage_2020_dollars
String Float64
__________________________
1 S 8.15458
2 W 8.59119
3 NE 8.75413
4 MW 8.1514
# Multiple functions on one column
combine(groupby(wages, :region),
:effective_min_wage_2020_dollars .=> [mean, median, maximum])
4x4 DataFrame
Row region effective_min_wage_2020_dollars_mean ...
String Float64 ...
____________________________________________________________________________________
1 S 8.15458 ...
2 W 8.59119 ...
3 NE 8.75413 ...
4 MW 8.1514 ...
# Multiple functions on one column, with custom names
combine(groupby(wages, :region), :effective_min_wage_2020_dollars .=> [mean, median] .=> [:average, :median])
4x3 DataFrame
Row region average median
String Float64 Float64
____________________________
1 S 8.15458 8.0
2 W 8.59119 8.34
...
# Do NOT forget the dot!
combine(groupby(wages, :region), :effective_min_wage_2020_dollars => [mean, median])
ArgumentError: Unrecognized column selector ...
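Why the dot matters can be seen without any DataFrame at all. The sketch below only builds the pairs; the column name `:wage` is a hypothetical placeholder.

```julia
using Statistics

# With the dot, `=>` is broadcast over the functions: one pair per function.
pairs_dot = :wage .=> [mean, median]

# Without the dot, a single pair is built whose right-hand side
# is the whole vector of functions, which combine cannot interpret.
pair_nodot = :wage => [mean, median]

println(pairs_dot == [:wage => mean, :wage => median])
println(pair_nodot isa Pair && pair_nodot.second isa Vector)
```

So `.=>` is ordinary Julia broadcasting that expands into the list of `column => function` pairs that `combine` expects.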
# One function on multiple columns
combine(groupby(wages, :region), [:state_min_wage, :federal_min_wage] .=> mean)
4x3 DataFrame
Row region state_min_wage_mean federal_min_wage_mean
String Float64 Float64
______________________________________________________
1 S 2.73128 4.35566
2 W 4.26638 4.35566
...
# Do NOT forget the dot
combine(groupby(wages, :region), [:state_min_wage, :federal_min_wage] => mean)
MethodError: objects of type ...
# Functions as a 1-row matrix: every column with every function
combine(groupby(wages, :region), [:state_min, :federal_min] .=> [mean median])
Row region state_min_mean federal_min_mean state_min_median federal_min_median
_________________________________________________________________________________
1 S 2.73128 4.35566 2.0 4.25
...
# Functions as a vector: columns and functions paired element by element
combine(groupby(wages, :region), [:state_min, :federal_min] .=> [mean, median])
Row region state_min_mean federal_min_median
______________________________________________
1 S 2.73128 4.25
...
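The difference between the two forms comes from broadcasting shapes, which can be inspected directly. A minimal sketch, reusing the column names from the slides purely as symbols (no data is needed):

```julia
using Statistics

cols = [:state_min, :federal_min]

# 2-element vector against a 1x2 matrix: broadcasting yields a 2x2 matrix of pairs.
all_combos = cols .=> [mean median]

# 2-element vector against a 2-element vector: pairs are matched element by element.
pairwise = cols .=> [mean, median]

println(size(all_combos))
println(size(pairwise))

# Column-major order of the matrix explains the order of the output columns.
println(vec(all_combos))
```

The matrix is traversed column by column, which is why both `_mean` columns appear before the `_median` columns in the all-combinations output.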
Functions you can use:
sum(), mean(), minimum(), ...
ByRow()
nrow, proprow, ...
# Grouped DataFrame gdf
# 1 column + 1 function
combine(gdf, :c => f => :new_c)
# 1 column + 2+ functions
combine(gdf, :c .=> [f1, f2, ...] .=> [:new_c_f1, :new_c_f2, ...])
# 2+ columns + 1 function
combine(gdf, [:c1, :c2, ...] .=> f .=> [:new_c1_f, :new_c2_f, ...])
# 2+ columns + 2+ functions - all combinations
# (the new names form a matrix: one row per column, one column per function)
combine(gdf, [:c1, :c2, ...] .=> [f1 f2 ...] .=> [:c1_f1 :c1_f2 ...; :c2_f1 :c2_f2 ...; ...])
# 2+ columns + 2+ functions - pairwise
combine(gdf, [:c1, :c2, ...] .=> [f1, f2, ...] .=> [:new_c1_f1, :new_c2_f2, ...])
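Two of the patterns above can be tried on a small grouped table. This is a sketch under assumptions: `toy`, `g`, `c1`, `c2` and all output names are hypothetical, and the all-combinations call supplies its new names as a matrix so that its shape matches the broadcast matrix of column and function pairs.

```julia
using DataFrames, Statistics

# Hypothetical toy data with one grouping column and two value columns.
toy = DataFrame(g  = ["a", "a", "b"],
                c1 = [1.0, 3.0, 5.0],
                c2 = [2.0, 4.0, 6.0])
gdf = groupby(toy, :g)

# nrow takes no source column: it counts the rows in each group.
counts = combine(gdf, nrow => :n)

# All combinations with explicit names: rows follow the columns,
# matrix columns follow the functions.
named = combine(gdf,
    [:c1, :c2] .=> [mean maximum] .=> [:c1_mean :c1_max;
                                       :c2_mean :c2_max])

println(counts)
println(named)
```

A flat vector of four names would not broadcast against the 2x2 matrix of pairs, which is why the names are laid out as a matrix here.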