Selecting columns

Data Manipulation in Julia

Katerina Zahradova

Instructor

US minimum wage

# Print wages
wages
2703×10 DataFrame 2678 rows omitted 5 columns omitted
Row  year   state     region   state_min_wage    state_min_wage_2020_dollars ...
     Int64  String31  String3  Float64           Float64           ...
______________________________________________________________________________
1    1968   Alabama   S        0.0               0.0               ...
2    1968   Alaska    W        2.1               15.61             ...
...
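Before slicing, it helps to check a table's shape and column names. This is a minimal sketch assuming DataFrames.jl. It uses a hypothetical two-row stand-in for `wages` that has made-up values and only four of its columns. The real data uses String31/String3 columns, while plain Strings are used here.

```julia
using DataFrames

# Hypothetical stand-in for `wages`; values are illustrative only
wages = DataFrame(
    year = [1968, 1968],
    state = ["Alabama", "Alaska"],
    region = ["S", "W"],
    state_min_wage = [0.0, 2.1],
)

size(wages)    # (number of rows, number of columns)
names(wages)   # column names as a Vector{String}
ncol(wages)    # number of columns only
```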
Data manipulatie in Julia

How we slice

# Select by column position
wages[:, 1]
# Select by column name (String or Symbol)
wages[:, "year"]
wages[:, :year]
# Select by column name with property (dot) syntax
wages.year
# Select multiple columns
wages[:, ["year", "state"]]
wages[:, [:year, :state]]
wages[:, [1, 2]]
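The single-column forms above are not quite identical. `wages[:, col]` returns a copy of the column. `wages.year` returns the vector stored inside the DataFrame, and so does `wages[!, :year]`. Selecting a vector of columns returns a new DataFrame. This is a minimal sketch on a hypothetical toy table, assuming DataFrames.jl:

```julia
using DataFrames

# Hypothetical toy table, not the real wages data
wages = DataFrame(year = [1968, 1968], state = ["Alabama", "Alaska"])

col_copy = wages[:, :year]       # `:` as row selector: a copy of the column
col_view = wages.year            # property access: the stored vector itself
col_bang = wages[!, :year]       # `!` as row selector: also the stored vector

sub = wages[:, [:year, :state]]  # several columns: a new DataFrame
```

Mutating `col_view` or `col_bang` therefore changes `wages`, while mutating `col_copy` does not.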
Data manipulatie in Julia

Selecting columns

# Select the state, the year and the state minimum wage

select(wages, 2, 1, 4)
2703×3 DataFrame 2678 rows omitted
Row  state     year        state_min_wage    
     String31  Int64       Float64                      
____________________________________________
1    Alabama   1968        0.0                              
2    Alaska    1968        2.1                            
...
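`select` returns a new DataFrame. Its columns follow the order of the arguments, not their order in the source, and by default the selected columns are copied. This sketch uses a hypothetical toy `wages` with made-up values and assumes DataFrames.jl:

```julia
using DataFrames

# Hypothetical toy table with the first four wages columns
wages = DataFrame(
    year = [1968, 1968],
    state = ["Alabama", "Alaska"],
    region = ["S", "W"],
    state_min_wage = [0.0, 2.1],
)

# Positions 2, 1, 4 -> state, year, state_min_wage, in that order
picked = select(wages, 2, 1, 4)
```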
Data manipulatie in Julia

Selecting columns

# Select the state, the year and the state minimum wage
select(wages, "state", :year, 4)
2703×3 DataFrame 2678 rows omitted
Row  state     year        state_min_wage    
     String31  Int64       Float64                      
____________________________________________
1    Alabama   1968        0.0                              
2    Alaska    1968        2.1                            
...
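`select` interprets each argument separately, so String, Symbol and integer selectors can be mixed across arguments. This sketch on hypothetical toy data checks that the mixed call returns the same table as the purely positional one. It assumes DataFrames.jl:

```julia
using DataFrames

# Hypothetical toy table with the first four wages columns
wages = DataFrame(
    year = [1968, 1968],
    state = ["Alabama", "Alaska"],
    region = ["S", "W"],
    state_min_wage = [0.0, 2.1],
)

by_position = select(wages, 2, 1, 4)
mixed = select(wages, "state", :year, 4)   # String, Symbol and Int selectors
```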
Data manipulatie in Julia

Selecting with patterns

Select columns whose names:

  • Start or end with a letter or word
  • Contain a substring
  • ...

... listing such columns one by one gets tedious with large datasets

Data manipulatie in Julia

Selecting with patterns

# All columns whose names start with state
select(wages, Cols(startswith("state")))
2703×3 DataFrame 2702 rows omitted
Row state    state_min_wage state_min_wage_2020_dollars
    String31 Float64        Float64
__________________________________________
1   Alabama  0.0            0.0
...
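`startswith("state")` is a one-argument predicate (Julia 1.5+), and `Cols` applies it to each column name as a String. This sketch on a hypothetical toy table, assuming DataFrames.jl, checks that the result matches filtering `names(wages)` by hand:

```julia
using DataFrames

# Hypothetical toy table; values are made up
wages = DataFrame(
    year = [1968, 1968],
    state = ["Alabama", "Alaska"],
    state_min_wage = [0.0, 2.1],
    state_min_wage_2020_dollars = [0.0, 15.61],
)

# pred(name) is equivalent to startswith(name, "state")
pred = startswith("state")

selected = select(wages, Cols(pred))   # columns whose names satisfy pred
by_hand = filter(pred, names(wages))   # the same test applied to the names
```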
Data manipulatie in Julia

Selecting with patterns

# All columns whose names end in 2020_dollars
select(wages, Cols(endswith("2020_dollars")))
2703×3 DataFrame 2702 rows omitted
Row state_min_wage_2020_dollars  federal_min_wage_2020_dollars  effective_min_wage_2020_dollars
    Float64                      Float64                        Float64
________________________________________________________________________________________________
1   0.0                          8.55                           8.55
...
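`Cols` accepts any function that takes a column name as a String and returns a Bool, so an anonymous function can combine several conditions. This sketch on hypothetical toy data, assuming DataFrames.jl, keeps only the columns that both start with state and end in 2020_dollars:

```julia
using DataFrames

# Hypothetical toy table; values are made up
wages = DataFrame(
    state = ["Alabama", "Alaska"],
    state_min_wage = [0.0, 2.1],
    state_min_wage_2020_dollars = [0.0, 15.61],
    federal_min_wage_2020_dollars = [8.55, 8.55],
)

# Both conditions must hold for a column name to be kept
state_real = select(wages, Cols(n -> startswith(n, "state") && endswith(n, "2020_dollars")))
```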
Data manipulatie in Julia

Selecting with patterns

# All columns whose names contain min
select(wages, Cols(contains("min")))
2703×6 DataFrame 2702 rows omitted 3 columns omitted
Row  state_min_wage  state_min_wage_2020_dollars  federal_min_wage  ...    
     Float64         Float64                      Float64           ...
_______________________________________________________________________
1    0.0             0.0                          1.15              ...
... 
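`contains(haystack, needle)` tests whether a string contains a substring. Its one-argument form `contains(needle)` (Julia 1.5+) returns a predicate, which is the form `Cols` expects. This sketch uses hypothetical toy data and assumes DataFrames.jl:

```julia
using DataFrames

# Two-argument form: test one string directly
contains("federal_min_wage", "min")

# One-argument form: a reusable predicate on names
has_min = contains("min")

# Hypothetical toy table; values are made up
wages = DataFrame(
    state = ["Alabama", "Alaska"],
    state_min_wage = [0.0, 2.1],
    federal_min_wage = [1.15, 1.15],
)

min_cols = select(wages, Cols(has_min))
```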
Data manipulatie in Julia

Regex

Regex = regular expression: a pattern that describes a set of strings
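In Julia, a regex literal is written as `r"..."`. The anchors `^` and `$` tie a pattern to the start or end of a string, which reproduces `startswith` and `endswith`. This minimal sketch uses only Base:

```julia
# r"min" matches "min" anywhere in the string
occursin(r"min", "state_min_wage")                     # true

# ^ anchors the pattern at the start of the string
occursin(r"^state", "state_min_wage")                  # true
occursin(r"^state", "federal_min_wage")                # false

# $ anchors the pattern at the end of the string
occursin(r"dollars$", "state_min_wage_2020_dollars")   # true
```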

Data manipulatie in Julia

Using regex

# Select with a regex
select(wages, r"min")
2703×6 DataFrame 2702 rows omitted 3 columns omitted
Row  state_min_wage  state_min_wage_2020_dollars  federal_min_wage  ...    
     Float64         Float64                      Float64           ...
_______________________________________________________________________
1    0.0             0.0                          1.15              ...
...
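Passing a regex to `select` keeps every column whose name matches it. This sketch on hypothetical toy data, assuming DataFrames.jl, shows that an anchored regex selects the same columns as `Cols(startswith(...))`:

```julia
using DataFrames

# Hypothetical toy table; values are made up
wages = DataFrame(
    state = ["Alabama", "Alaska"],
    state_min_wage = [0.0, 2.1],
    federal_min_wage = [1.15, 1.15],
    federal_min_wage_2020_dollars = [8.55, 8.55],
)

any_min = select(wages, r"min")        # "min" anywhere in the name
federal = select(wages, r"^federal")   # names starting with "federal"
```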
Data manipulatie in Julia

select!() vs. select()

# Modifies the original DataFrame
select!(wages, :year, :state)

# The original DataFrame has changed
println(first(wages))
DataFrameRow (2 columns)
Row  year    state     
     Int64   String31  
______________________
1    1968    Alabama
# Returns a new DataFrame (run on the original 10-column wages)
select(wages, :year, :state)

# The original DataFrame stays intact
println(first(wages))
DataFrameRow (10 columns, 7 omitted)
Row  year    state     region    ...
     Int64   String31  String3   ...
____________________________________
1    1968    Alabama   S         ...
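The difference between the two calls can be checked directly. This sketch uses a hypothetical three-column `wages` and assumes DataFrames.jl. `select` runs first, because `select!` changes `wages` for every later call.

```julia
using DataFrames

# Hypothetical toy table with three of the wages columns
wages = DataFrame(
    year = [1968, 1968],
    state = ["Alabama", "Alaska"],
    region = ["S", "W"],
)

# select: a new two-column DataFrame; wages still has three columns
narrow = select(wages, :year, :state)

# select!: drops region from wages in place and returns wages itself
result = select!(wages, :year, :state)
```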
Data manipulatie in Julia

Let's practice!

Data manipulatie in Julia

Preparing Video For Download...