Selecting columns

Data Manipulation in Julia

Katerina Zahradova

Instructor

US minimum wages

# Print wages
wages
2703×10 DataFrame 2678 rows omitted 5 columns omitted
Row  year   state     region   state_min_wage    state_min_wage_2020_dollars ...
     Int64  String31  String3  Float64           Float64           ...
______________________________________________________________________________
1    1968   Alabama   S        0.0               0.0               ...
2    1968   Alaska    W        2.1               15.61             ...
...

How we slice

# Using the position of the column
wages[:, 1]
# Using the name of the column (String or Symbol)
wages[:, "year"]
wages[:, :year]
# Using dot syntax (property access)
wages.year
# Selecting several columns
wages[:, ["year", "state"]]
wages[:, [:year, :state]]
wages[:, [1, 2]]
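These forms do not all behave the same way. In DataFrames.jl, df[:, col] returns a copy of the column, while df.col (like df[!, col]) returns the vector stored inside the DataFrame. A minimal sketch, assuming DataFrames.jl is installed and using a hypothetical toy DataFrame df with two of the wages column names:

```julia
using DataFrames

# Hypothetical toy stand-in for wages (two columns, two rows)
df = DataFrame(year = [1968, 1968], state = ["Alabama", "Alaska"])

copied = df[:, :year]   # df[:, col] returns a fresh copy of the column
stored = df.year        # df.col returns the vector stored inside df

copied[1] = 2000        # modifies only the copy
println(df.year[1])     # prints 1968

stored[1] = 2000        # writes straight into df
println(df.year[1])     # prints 2000
```

Using df[:, col] when you intend to modify the result avoids changing the original table by accident.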

Selecting columns

# Selecting the state, year, and state minimum wage

select(wages, 2, 1, 4)
2703×3 DataFrame 2678 rows omitted
Row  state     year        state_min_wage    
     String31  Int64       Float64                      
____________________________________________
1    Alabama   1968        0.0                              
2    Alaska    1968        2.1                            
...
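select() returns the columns in the order you request them, not in their original order, which is why state comes before year above. A small sketch on a hypothetical toy DataFrame df that reuses the first four wages column names (values partly made up), assuming DataFrames.jl:

```julia
using DataFrames

# Hypothetical toy stand-in for the first four columns of wages
df = DataFrame(year = [1968, 1968],
               state = ["Alabama", "Alaska"],
               region = ["S", "W"],
               state_min_wage = [0.0, 2.1])

# Positions refer to the original layout; output follows argument order
out = select(df, 2, 1, 4)
println(names(out))   # ["state", "year", "state_min_wage"]
```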

Selecting columns

# Selecting the state, year, and state minimum wage
select(wages, "state", :year, 4)
2703×3 DataFrame 2678 rows omitted
Row  state     year        state_min_wage    
     String31  Int64       Float64                      
____________________________________________
1    Alabama   1968        0.0                              
2    Alaska    1968        2.1                            
...
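Unlike indexing a single column, select() always returns a DataFrame, even when only one column is requested. A sketch on the same kind of hypothetical toy DataFrame df, assuming DataFrames.jl:

```julia
using DataFrames

# Hypothetical toy stand-in for the first four columns of wages
df = DataFrame(year = [1968, 1968],
               state = ["Alabama", "Alaska"],
               region = ["S", "W"],
               state_min_wage = [0.0, 2.1])

# Strings, Symbols and integer positions can be mixed in one call
mixed = select(df, "state", :year, 4)

# select returns a DataFrame even for a single column...
one_col = select(df, :year)
# ...whereas indexing a single column returns a plain vector
col_vec = df[:, :year]

println(typeof(one_col))  # DataFrame
```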

Selecting using patterns

Selecting columns whose names

  • start or end with a letter or word
  • contain a substring
  • ...

Listing such columns one by one gets tedious in large datasets.


Selecting using patterns

# All columns starting with state
select(wages, Cols(startswith("state")))
2703×3 DataFrame 2702 rows omitted
Row state    state_min_wage state_min_wage_2020_dollars
    String31 Float64        Float64
__________________________________________
1   Alabama  0.0            0.0
...
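Cols(f) applies the function f to each column name (as a String) and keeps the columns for which f returns true. startswith("state") is the one-argument (curried) form of startswith, available since Julia 1.5. A sketch on a hypothetical toy DataFrame df, assuming a DataFrames.jl version that supports Cols(predicate), as the slide does:

```julia
using DataFrames

# Hypothetical toy stand-in with five of the wages column names
df = DataFrame(year = [1968], state = ["Alabama"], region = ["S"],
               state_min_wage = [0.0], state_min_wage_2020_dollars = [0.0])

# startswith("state") returns a function that tests a single string
is_state = startswith("state")
println(is_state("state_min_wage"))  # true
println(is_state("year"))            # false

# Cols keeps the columns whose name satisfies the predicate,
# which matches filtering names(df) directly
by_cols = names(select(df, Cols(startswith("state"))))
by_filter = filter(startswith("state"), names(df))
```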

Selecting using patterns

# All columns ending with 2020_dollars
select(wages, Cols(endswith("2020_dollars")))
2703×3 DataFrame 2702 rows omitted
Row state_min_wage_2020_dollars  federal_min_wage_2020_dollars  effective_min_wage_2020_dollars
    Float64                      Float64                        Float64
________________________________________________________________________________________________
1   0.0                          8.55                           8.55
...
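The curried endswith("2020_dollars") is shorthand for an anonymous function, so both forms below select the same columns. A sketch on a hypothetical toy DataFrame df built from the first-row values shown on these slides, assuming DataFrames.jl:

```julia
using DataFrames

# Hypothetical toy stand-in with four of the wages columns
df = DataFrame(state_min_wage = [0.0], state_min_wage_2020_dollars = [0.0],
               federal_min_wage = [1.15], federal_min_wage_2020_dollars = [8.55])

# Curried form vs. an explicit anonymous function on the column name
a = select(df, Cols(endswith("2020_dollars")))
b = select(df, Cols(name -> endswith(name, "2020_dollars")))
```

The anonymous-function form is handy when the condition is more complex than a single prefix or suffix test.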

Selecting using patterns

# All columns whose names contain "min"
select(wages, Cols(contains("min")))
2703×6 DataFrame 2702 rows omitted 3 columns omitted
Row  state_min_wage  state_min_wage_2020_dollars  federal_min_wage  ...    
     Float64         Float64                      Float64           ...
_______________________________________________________________________
1    0.0             0.0                          1.15              ...
... 
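Pattern selectors can be combined with plain column names in one select() call, and string matching is case-sensitive. A sketch on a hypothetical toy DataFrame df, assuming DataFrames.jl and Julia 1.5+ for the curried contains:

```julia
using DataFrames

# Hypothetical toy stand-in with five of the wages column names
df = DataFrame(year = [1968], state = ["Alabama"], region = ["S"],
               state_min_wage = [0.0], federal_min_wage = [1.15])

# Two named columns, then every column whose name contains "min"
out = select(df, :year, :state, Cols(contains("min")))
println(names(out))  # ["year", "state", "state_min_wage", "federal_min_wage"]

# Matching is case-sensitive: "Min" matches no column name here
println(filter(contains("Min"), names(df)))  # String[]
```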

Regex

Regex = regular expressions: text patterns for matching strings, written in Julia as r"..." literals
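A few basics of Julia regex literals, checked with occursin from Base (no packages needed). The anchors ^ and $ restrict a match to the start or end of the string; the test strings below are column names from wages:

```julia
# A regex literal: matches "min" anywhere in the string
pattern = r"min"
println(occursin(pattern, "state_min_wage"))   # true
println(occursin(pattern, "region"))           # false

# ^ anchors the match at the start, $ at the end
println(occursin(r"^state", "state_min_wage"))                        # true
println(occursin(r"^state", "federal_min_wage"))                      # false
println(occursin(r"2020_dollars$", "effective_min_wage_2020_dollars")) # true
```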


Using regex

# Selecting using regex
select(wages, r"min")
2703×6 DataFrame 2702 rows omitted 3 columns omitted
Row  state_min_wage  state_min_wage_2020_dollars  federal_min_wage  ...    
     Float64         Float64                      Float64           ...
_______________________________________________________________________
1    0.0             0.0                          1.15              ...
...
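Anchored regexes reproduce the startswith/endswith selections from earlier, and wrapping a selector in Not() keeps every other column. A sketch on a hypothetical toy DataFrame df, assuming DataFrames.jl:

```julia
using DataFrames

# Hypothetical toy stand-in with six of the wages column names
df = DataFrame(year = [1968], state = ["Alabama"], region = ["S"],
               state_min_wage = [0.0], federal_min_wage = [1.15],
               federal_min_wage_2020_dollars = [8.55])

# Same columns as Cols(startswith("state")) and Cols(endswith("2020_dollars"))
starts = select(df, r"^state")
ends   = select(df, r"2020_dollars$")

# Not(...) inverts a selector: columns whose names do NOT contain "min"
no_min = select(df, Not(r"min"))
```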

select!() vs. select()

# select() returns a new DataFrame
select(wages, :year, :state)

# Original DataFrame is intact
println(first(wages))
DataFrameRow (10 columns, 7 omitted)
Row  year    state     region    ...
     Int64   String31  String3   ...
____________________________________
1    1968    Alabama   S         ...

# select!() mutates the original DataFrame in place
select!(wages, :year, :state)

# Original DataFrame changed
println(first(wages))
DataFrameRow (2 columns)
Row  year    state     
     Int64   String31  
______________________
1    1968    Alabama
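The difference can be checked by counting columns before and after each call. A sketch on a hypothetical toy DataFrame df with three of the wages column names, assuming DataFrames.jl:

```julia
using DataFrames

# Hypothetical toy stand-in with three of the wages columns
df = DataFrame(year = [1968, 1968], state = ["Alabama", "Alaska"],
               region = ["S", "W"])

before = ncol(df)                    # 3

new_df = select(df, :year, :state)   # returns a new 2-column DataFrame
after_select = ncol(df)              # still 3: df is untouched

select!(df, :year, :state)           # drops :region from df itself
after_bang = ncol(df)                # now 2
```

Because select!() discards columns for good, select() is the safer choice while exploring the data.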

Let's practice!
