Data Manipulation in Julia
Katerina Zahradova
Instructor
# Print wages
wages
2703×10 DataFrame 2678 rows omitted 5 columns omitted
Row year state region state_min_wage state_min_wage_2020_dollars ...
Int64 String31 String3 Float64 Float64 ...
______________________________________________________________________________
1 1968 Alabama S 0.0 0.0 ...
2 1968 Alaska W 2.1 15.61 ...
...
# Using position of the column
wages[:, 1]
# Using name of the column
wages[:, "year"]
wages[:, :year]
# Using dot syntax with the column name
wages.year
# Selecting several columns
wages[:, ["year", "state"]]
wages[:, [:year, :state]]
wages[:, [1,2]]
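The indexing forms above can be tried on a small stand-in table. This is a minimal sketch: `df` and its values are hypothetical toy data, not the course's wages dataset. It also shows one difference between the forms: `df[:, col]` returns a copy of the column, while `df.col` returns the stored column itself.

```julia
using DataFrames

# Hypothetical toy data standing in for the wages DataFrame
df = DataFrame(year = [1968, 1968, 1969],
               state = ["Alabama", "Alaska", "Alabama"],
               state_min_wage = [0.0, 2.1, 0.0])

# Single column: by position, string name, symbol name, and dot syntax
by_position = df[:, 1]
by_string   = df[:, "year"]
by_symbol   = df[:, :year]
by_property = df.year

# All four forms hold the same values
println(by_position == by_string && by_string == by_symbol && by_symbol == by_property)

# df[:, col] copies the column; df.col is the column stored in the DataFrame
println(by_position !== by_property)

# Several columns at once: the result is a DataFrame, not a vector
two_cols = df[:, ["year", "state"]]
println(names(two_cols))
```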
# Selecting the state, year, and state minimum wage
select(wages, 2, 1, 4)
2703×3 DataFrame 2678 rows omitted
Row state year state_min_wage
String31 Int64 Float64
____________________________________________
1 Alabama 1968 0.0
2 Alaska 1968 2.1
...
# Selecting the state, year, and state minimum wage
select(wages, "state", :year, 4)
2703×3 DataFrame 2678 rows omitted
Row state year state_min_wage
String31 Int64 Float64
____________________________________________
1 Alabama 1968 0.0
2 Alaska 1968 2.1
...
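As the two calls above suggest, `select` accepts positions, strings, and symbols in one call, and the result follows the order of the arguments rather than the original column order. A small sketch on hypothetical toy data (the `df` below is an assumption, not the wages dataset):

```julia
using DataFrames

# Hypothetical toy data with the same leading columns as the wages table
df = DataFrame(year = [1968, 1968],
               state = ["Alabama", "Alaska"],
               region = ["S", "W"],
               state_min_wage = [0.0, 2.1])

# Mix a string, a symbol, and a position in one call
picked = select(df, "state", :year, 4)

# Columns come back in the order they were requested
println(names(picked))  # ["state", "year", "state_min_wage"]
```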
Selecting columns
... by listing each one gets bothersome in large datasets
# All columns starting with state
select(wages, Cols(startswith("state")))
2703×3 DataFrame 2702 rows omitted
Row state state_min_wage state_min_wage_2020_dollars
String31 Float64 Float64
__________________________________________
1 Alabama 0.0 0.0
...
# All columns ending with 2020_dollars
select(wages, Cols(endswith("2020_dollars")))
2703×3 DataFrame 2702 rows omitted
Row state_min_wage_2020_dollars federal_min_wage_2020_dollars effective_min_wage_2020_dollars
Float64 Float64 Float64
________________________________________________________________________________________________
1 0.0 8.55 8.55
...
# All columns containing min
select(wages, Cols(contains("min")))
2703×6 DataFrame 2702 rows omitted 3 columns omitted
Row state_min_wage state_min_wage_2020_dollars federal_min_wage ...
Float64 Float64 Float64 ...
_______________________________________________________________________
1 0.0 0.0 1.15 ...
...
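The `Cols` selections above all pass a predicate that is applied to each column name. A minimal sketch on hypothetical toy data (the column names mirror the slides, the values are made up) that also shows an anonymous function for combining conditions:

```julia
using DataFrames

# Hypothetical toy data; only the column names matter here
df = DataFrame(year = [1968],
               state = ["Alabama"],
               state_min_wage = [0.0],
               state_min_wage_2020_dollars = [0.0],
               federal_min_wage = [1.15],
               federal_min_wage_2020_dollars = [8.55])

# Each predicate receives a column name as a string and returns true or false
state_cols   = select(df, Cols(startswith("state")))
dollar_cols  = select(df, Cols(endswith("2020_dollars")))
min_cols     = select(df, Cols(contains("min")))

# An anonymous function can combine several conditions
state_dollar = select(df, Cols(name -> startswith(name, "state") && endswith(name, "dollars")))

println(names(state_cols))
println(names(state_dollar))
```

Note that `startswith("state")` also matches the `state` column itself, which is why it appears in the first result.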
regex = regular expressions
# Selecting using regex
select(wages, r"min")
2703×6 DataFrame 2702 rows omitted 3 columns omitted
Row state_min_wage state_min_wage_2020_dollars federal_min_wage ...
Float64 Float64 Float64 ...
_______________________________________________________________________
1 0.0 0.0 1.15 ...
...
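A regex selector keeps every column whose name matches the pattern, so anchors such as `^` and `$` can reproduce the "starts with" and "ends with" selections. A short sketch, again on hypothetical toy data:

```julia
using DataFrames

# Hypothetical toy data; only the column names matter here
df = DataFrame(year = [1968],
               state = ["Alabama"],
               state_min_wage = [0.0],
               state_min_wage_2020_dollars = [0.0],
               federal_min_wage = [1.15],
               federal_min_wage_2020_dollars = [8.55])

# Unanchored: matches "min" anywhere in the column name
any_min = select(df, r"min")

# ^ anchors the match to the start of the name
federal = select(df, r"^federal")

# $ anchors the match to the end of the name
dollars = select(df, r"2020_dollars$")

println(names(federal))
println(names(dollars))
```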
# Mutates original DataFrame
select!(wages, :year, :state)
# Original DataFrame changed
println(first(wages))
DataFrameRow (2 columns)
Row year state
Int64 String31
______________________
1 1968 Alabama
# Returns new DataFrame
select(wages, :year, :state)
# Original DataFrame is intact
println(first(wages))
DataFrameRow (10 columns, 7 omitted)
Row year state region ...
Int64 String31 String3 ...
____________________________________
1 1968 Alabama S ...
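The difference between `select` and `select!` can be checked by counting columns before and after each call. A minimal sketch on hypothetical toy data:

```julia
using DataFrames

# Hypothetical toy data standing in for the wages DataFrame
df = DataFrame(year = [1968, 1968],
               state = ["Alabama", "Alaska"],
               region = ["S", "W"])

# select returns a new DataFrame and leaves df untouched
smaller = select(df, :year, :state)
cols_after_select = ncol(df)       # still 3

# select! drops the unselected columns from df itself
select!(df, :year, :state)
cols_after_select_bang = ncol(df)  # now 2

println((cols_after_select, cols_after_select_bang))
```

Following Julia convention, the trailing `!` signals that the function modifies its argument, so `select!` is best used only when the dropped columns are no longer needed.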