Data Manipulation in Julia
Katerina Zahradova
Instructor
Variable names should be:
- Descriptive: wages rather than df
- Concise: wages rather than us_min_wages_data_between_1968_and_2020_with_inflation_adjusted_column
- Consistent: mixed styles such as state_wage_2020 and effective.2020.dollars can be hard to remember, and so can state, Year, and REGION in the same DataFrame
Don't create too many new variables, such as wages_no_missing, wages_missing_state_only, wages_original_no_missing, wages_state_mean_no_missing, etc.
- Overwrite! Use select!(), transform!(), etc. to modify a DataFrame in place
- Use the @chain macro (Chain.jl) to reduce the need for new versions of the same data
Store repeated values in a variable, so they only need to be changed in one place:
# Prefer this: the value is defined once
replace_missing = 0
replace!(df.col1, missing => replace_missing)
replace!(df.col2, missing => replace_missing)
# Over this: the value is hardcoded in every call
replace!(df.col1, missing => 0)
replace!(df.col2, missing => 0)
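A minimal sketch of the "Overwrite!" advice with DataFrames.jl, assuming a small made-up wages table (the column names and values are hypothetical). The column is cleaned and the table trimmed in place with transform!() and select!(), so no extra copy such as wages_no_missing is ever created.

```julia
using DataFrames

# Made-up toy data; column names are hypothetical
wages = DataFrame(state = ["AK", "AL", "AK"],
                  year = [2019, 2019, 2020],
                  eff_min_wage = [1.0, missing, 2.0])

# Repeated value stored once
replace_missing = 0.0

# Overwrite the column in place instead of creating a new variable
transform!(wages, :eff_min_wage => (c -> coalesce.(c, replace_missing)) => :eff_min_wage)

# Keep only the needed columns, again modifying `wages` itself
select!(wages, :state, :eff_min_wage)
```

Because both calls end in !, the name wages always refers to the current version of the data.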
Comment your code to explain what it does and, where it is not obvious, why:
using Plots

# Function to plot multiple line plots with labels
function make_line_plot(xs, ys, labels; xlabel="", ylabel="", title="")
    p = plot(title = title, xlabel = xlabel, ylabel = ylabel)
    for (x, y, label) in zip(xs, ys, labels)
        plot!(p, x, y, label = label)
    end
    return p
end
# Standardize names
rename!(df, :ColumnOne => :col_1)
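Renaming one column at a time gets tedious. As a sketch, rename!() also accepts a function that DataFrames.jl applies to every column name (passed as a String), which makes it easy to enforce one naming style across a table. The toy columns and the helper standardize are hypothetical.

```julia
using DataFrames

# Toy table mixing naming styles
df = DataFrame("state" => ["AK"], "Year" => [2020], "REGION" => ["West"])

# Apply lowercase to every column name
rename!(lowercase, df)

# Hypothetical helper: lowercase, then turn spaces and dots into underscores
standardize(name) = replace(replace(lowercase(name), " " => "_"), "." => "_")

df2 = DataFrame("effective.2020.dollars" => [1.0], "State Wage" => [2.0])
rename!(standardize, df2)
```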
# Rows with missing company
df[ismissing.(df.company), :]
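The same selection can be written in a few other ways with DataFrames.jl. A sketch on a made-up table:

```julia
using DataFrames

# Toy data with one missing company name
df = DataFrame(company = ["A", missing, "C"], rating = [3.5, 2.75, 4.0])

# Rows where company is missing (same rows as df[ismissing.(df.company), :])
missing_rows = filter(:company => ismissing, df)

# The complement: rows with a known company
known_rows = dropmissing(df, :company)

# Number of missing company names
n_missing = count(ismissing, df.company)
```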
# Pivoting on year and state
unstack(wages, :year, :state, :eff_min_wage)
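To make the pivot concrete, here is a sketch on made-up numbers: unstack turns one row per (year, state) pair into one row per year with one column per state, and stack turns the wide table back into long format.

```julia
using DataFrames

# Long format: one row per (year, state) pair; values are made up
wages = DataFrame(year = [2019, 2019, 2020, 2020],
                  state = ["AK", "AL", "AK", "AL"],
                  eff_min_wage = [1.0, 2.0, 3.0, 4.0])

# Wide format: rows keyed by year, one column per state
wide = unstack(wages, :year, :state, :eff_min_wage)

# stack reverses the pivot; state names go to :variable, wages to :value
long_again = stack(wide, Not(:year))
```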
# Replace missing wages with the minimum wage
# to assume the worst case
min_wage = minimum(skipmissing(df.wages))
replace!(df.wages, missing => min_wage)
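A sketch of the same idea on a made-up column, showing one design choice: replace!() modifies the vector in place but cannot change its element type, so the column still allows missing afterwards. Broadcasting coalesce builds a new column with a plain Float64 element type instead.

```julia
using DataFrames

# Made-up wages with one missing value
df = DataFrame(wages = [7.25, missing, 9.5])

# Smallest observed wage, ignoring missing values
# (minimum throws an error if every value is missing)
min_wage = minimum(skipmissing(df.wages))

# New column without missing in its element type
df.wages = coalesce.(df.wages, min_wage)
```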
# Joining with countries
# To study how countries influence quality
leftjoin(company, countries, on=:location)
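A sketch of what leftjoin does, with made-up company and countries tables (the name and continent columns are hypothetical). Every row of company is kept, and companies whose location has no match in countries get missing in the joined columns, which makes unmatched locations easy to find.

```julia
using DataFrames

# Made-up tables: companies and where they are based
company = DataFrame(name = ["A", "B", "C"],
                    location = ["Belgium", "France", "Peru"])
countries = DataFrame(location = ["Belgium", "France"],
                      continent = ["Europe", "Europe"])

# Keep every company; unmatched locations get missing
joined = leftjoin(company, countries, on = :location)

# Companies whose location was not found in `countries`
unmatched = filter(:continent => ismissing, joined)
```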