Grouped summary statistics

Data Manipulation in Julia

Katerina Zahradova

Instructor

What we know so far

# Compute the mean minimum wage
combine(wages, :effective_min_wage_2020_dollars => mean)
1x1 DataFrame
Row | effective_min_wage_2020_dollars_mean
    | Float64
____|_______________________
   1| 8.37093
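
To try the call above without the course dataset, here is a minimal sketch. The toy `wages` table and its values are hypothetical; only the column names are taken from the slide.

```julia
using DataFrames
using Statistics  # provides mean

# Hypothetical toy data; the real `wages` dataset is much larger
wages = DataFrame(
    region = ["S", "S", "W", "W", "NE", "MW"],
    effective_min_wage_2020_dollars = [8.0, 8.5, 9.0, 9.5, 10.0, 7.5],
)

# Without grouping, combine() collapses the whole column into one row
overall = combine(wages, :effective_min_wage_2020_dollars => mean)

# The result column is auto-named <source column>_<function name>
println(names(overall))
println(overall.effective_min_wage_2020_dollars_mean[1])
```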

What we know so far

# Filter, then compute the mean, once per region
first(combine(filter(r -> r.region == "W", wages), :effective_min_wage_2020_dollars => mean))
first(combine(filter(r -> r.region == "S", wages), :effective_min_wage_2020_dollars => mean))
first(combine(filter(r -> r.region == "NE", wages), :effective_min_wage_2020_dollars => mean))
DataFrameRow (1 columns)
Row | effective_min_wage_2020_dollars_mean
    | Float64
____|_______________________
   1| 8.75413
...

Using combine() and groupby()

# Group by region
wages_by_region = groupby(wages, :region)

# Compute the mean per group
combine(wages_by_region, :effective_min_wage_2020_dollars => mean)
4x2 DataFrame
Row region    effective_min_wage_2020_dollars_mean
    String    Float64
__________________________
1    S        8.15458
2    W        8.59119
3    NE       8.75413
4    MW       8.1514
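
As a sanity check, the grouped result can be compared with the filter-based approach for a single region. This is a sketch on hypothetical toy data, not the course dataset.

```julia
using DataFrames
using Statistics

# Hypothetical toy data with the same column names as the slides
wages = DataFrame(
    region = ["S", "S", "W", "W", "NE", "MW"],
    effective_min_wage_2020_dollars = [8.0, 8.5, 9.0, 9.5, 10.0, 7.5],
)

# groupby() splits the table into one sub-table per distinct region
wages_by_region = groupby(wages, :region)
println(length(wages_by_region))  # number of groups

# combine() applies mean within each group and stacks the results
by_region = combine(wages_by_region, :effective_min_wage_2020_dollars => mean)

# The filter-based way, for one region only
west = filter(r -> r.region == "W", wages)
west_mean = mean(west.effective_min_wage_2020_dollars)

# Look up the same region in the grouped result
grouped_west = by_region[by_region.region .== "W", :effective_min_wage_2020_dollars_mean][1]
println(grouped_west == west_mean)
```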

Combining combine() and groupby()

# Nest the two calls
combine(groupby(wages, :region), :effective_min_wage_2020_dollars => mean)
4x2 DataFrame
Row region    effective_min_wage_2020_dollars_mean
    String    Float64
__________________________
1    S        8.15458
2    W        8.59119
3    NE       8.75413
4    MW       8.1514

Using combine() and groupby()

# Rename the result column
combine(groupby(wages, :region), 
        :effective_min_wage_2020_dollars => mean => :average_min_wage_2020_dollars)
4x2 DataFrame
Row region    average_min_wage_2020_dollars
    String    Float64
__________________________
1    S        8.15458
2    W        8.59119
3    NE       8.75413
4    MW       8.1514

Multiple functions on one column

# Multiple functions on one column
combine(groupby(wages, :region), 
        :effective_min_wage_2020_dollars .=> [mean, median, maximum])
4x4 DataFrame
Row  region  effective_min_wage_2020_dollars_mean  ...
     String  Float64                               ...
____________________________________________________________________________________
1    S       8.15458                               ...
2    W       8.59119                               ...
3    NE      8.75413                               ...
4    MW      8.1514                                ...

Multiple functions on one column

# Multiple functions on one column, with new column names
combine(groupby(wages, :region), :effective_min_wage_2020_dollars .=> [mean, median] .=> [:average, :median])
4x2 DataFrame
Row region  average  median
    String  Float64  Float64
____________________________
1    S      8.15458  8.0
2    W      8.59119  8.34
...
# Do NOT forget the dot!
combine(groupby(wages, :region), :effective_min_wage_2020_dollars => [mean, median])
ArgumentError: Unrecognized column selector ...
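
Why the dot matters can be seen without any DataFrame: `.=>` broadcasts `=>` and builds one pair per function, while a plain `=>` builds a single pair whose right-hand side is a vector of functions. A small sketch, where the column name `:x` is a placeholder:

```julia
using Statistics  # provides mean and median

# With the dot: one Pair per function
with_dot = :x .=> [mean, median]
println(with_dot == [:x => mean, :x => median])

# Chaining a second .=> attaches a new name to each function
named = :x .=> [mean, median] .=> [:average, :median]
println(named[1] == (:x => (mean => :average)))

# Without the dot: a single Pair from a column to a vector of functions,
# which is not a valid transformation for combine()
without_dot = :x => [mean, median]
println(without_dot isa Pair)
println(without_dot.second isa Vector)
```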

Multiple columns with one function

combine(groupby(wages, :region), [:state_min_wage, :federal_min_wage] .=> mean)
4x2 DataFrame
Row region  state_min_wage_mean  federal_min_wage_mean
    String  Float64              Float64
______________________________________________________
1   S       2.73128              4.35566
2   W       4.26638              4.35566
...
# Do NOT forget the dot!
combine(groupby(wages, :region), [:state_min_wage, :federal_min_wage] => mean)
MethodError: objects of type ...

Multiple columns with multiple functions

# Functions as a 1-row matrix: every column is paired with every function
combine(groupby(wages, :region), [:state_min, :federal_min] .=> [mean median])
Row region  state_min_mean  federal_min_mean  state_min_median  federal_min_median
_________________________________________________________________________________
1   S       2.73128         4.35566           2.0                4.25
...
# Functions as a vector: columns and functions are paired element by element
combine(groupby(wages, :region), [:state_min, :federal_min] .=> [mean, median])
Row region  state_min_mean  federal_min_median 
______________________________________________
1   S       2.73128         4.25
...
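
The difference between the two calls above comes from ordinary broadcasting shapes, which can be inspected directly. A sketch using the column names from the slide; no DataFrame is needed:

```julia
using Statistics  # provides mean and median

cols = [:state_min, :federal_min]

# Column vector against a 1-row matrix broadcasts to a 2×2 matrix of pairs
all_combos = cols .=> [mean median]
println(size(all_combos))

# Column vector against a vector of the same length pairs element by element
pairwise = cols .=> [mean, median]
println(size(pairwise))

# The matrix is traversed column by column, which explains the output order:
# both means first, then both medians
println(vec(all_combos))
```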

Possible functions

Functions you can use:

  • Common statistics functions such as sum(), mean(), minimum(), ...
  • User-defined functions (broadcast)
  • Anonymous functions, wrapped in ByRow()
  • Special DataFrames functions: nrow, proprow, ...
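
A short sketch of these kinds of functions in use. The toy data and the helper `spread` are hypothetical, and the example assumes a DataFrames.jl version recent enough to export `proprow`.

```julia
using DataFrames
using Statistics

# Hypothetical toy data
wages = DataFrame(
    region = ["S", "S", "W", "NE"],
    state_min_wage = [2.0, 3.0, 4.0, 5.0],
)
gdf = groupby(wages, :region)

# Hypothetical user-defined function: range of the values in a group
spread(v) = maximum(v) - minimum(v)

stats = combine(gdf,
    nrow => :n_rows,                        # rows per group
    proprow => :share,                      # fraction of all rows per group
    :state_min_wage => sum => :total,       # common statistics function
    :state_min_wage => spread => :spread,   # user-defined function
)
println(stats)

# ByRow() turns a scalar anonymous function into a row-wise transformation,
# so the result has one value per input row instead of one per group
flags = combine(wages, :state_min_wage => ByRow(w -> w >= 3.0) => :at_least_3)
println(flags)
```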

Cheat sheet

# Grouped DataFrame gdf
# 1 column + 1 function
combine(gdf, :c => f => :new_c)

# 1 column + 2+ functions
combine(gdf, :c .=> [f1, f2, ...] .=> [:new_c_f1, :new_c_f2, ...])

# 2+ columns + 1 function
combine(gdf, [:c1, :c2, ...] .=> f .=> [:new_c1_f, :new_c2_f, ...])

# 2+ columns + 2+ functions - all combinations
# (the new names must be a matrix: one row per column, one column per function)
combine(gdf, [:c1, :c2, ...] .=> [f1 f2 ...] .=> [:c1_f1 :c1_f2 ...; :c2_f1 :c2_f2 ...; ...])

# 2+ columns + 2+ functions - pairwise
combine(gdf, [:c1, :c2, ...] .=> [f1, f2, ...] .=> [:new_c1_f1, :new_c2_f2, ...])
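
For the all-combinations pattern, the new names have to broadcast against a columns-by-functions matrix of pairs, so a flat vector of names would raise a dimension mismatch. A minimal sketch with hypothetical data and names:

```julia
using DataFrames
using Statistics

# Hypothetical toy data: one grouping column and two value columns
df = DataFrame(g = ["x", "x", "y"], a = [1.0, 3.0, 5.0], b = [2.0, 4.0, 6.0])
gdf = groupby(df, :g)

# One row per source column, one column per function
new_names = [:a_mean :a_median;
             :b_mean :b_median]

result = combine(gdf, [:a, :b] .=> [mean median] .=> new_names)

# Output columns follow the matrix column by column: means first, then medians
println(names(result))
println(result)
```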

Let's practice!
