Efficient workflow

Data Manipulation in Julia

Katerina Zahradova

Instructor

Tips for names

  • Short, meaningful names
    • wages rather than df
    • wages rather than us_min_wages_data_between_1968_and_2020_with_inflation_adjusted_column
  • Follow naming conventions/patterns
    • mixing state_wage_2020 and effective.2020.dollars can be hard to remember
    • the same goes for capitalization: avoid state, Year, and REGION in the same DataFrame
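
One way to enforce a single naming pattern is to normalize the names in code. The sketch below uses a hypothetical helper, snake_case (not part of Julia or DataFrames.jl), and the example names from the bullets above.

```julia
# Hypothetical helper: turn any column name into lowercase snake_case.
function snake_case(name::AbstractString)
    # Replace every run of characters other than a-z and 0-9 with "_"
    cleaned = replace(lowercase(name), r"[^a-z0-9]+" => "_")
    # Drop underscores left over at either end
    return String(strip(cleaned, '_'))
end

mixed_names = ["state", "Year", "REGION", "effective.2020.dollars"]
clean_names = snake_case.(mixed_names)
println(clean_names)
```

With DataFrames.jl loaded, rename!(snake_case, wages) should apply the same helper to every column name of a DataFrame called wages.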

Too many variables

  • Don't create too many new variables

    • clutters memory
    • chaos: what is the difference between wages_no_missing, wages_missing_state_only, wages_original_no_missing, wages_state_mean_no_missing, etc.
  • Overwrite in place! Use select!(), transform!(), etc.

  • Use the @chain macro to reduce the need for new versions of the same data
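
A minimal sketch of both ideas, assuming DataFrames.jl and Chain.jl are installed. The table contents and the 40-hour week are made up for illustration.

```julia
using DataFrames
using Chain
using Statistics

# Made-up sample data, for illustration only
wages = DataFrame(
    state = ["Texas", "Ohio", "Utah"],
    year = [2019, 2020, 2020],
    min_wage = [7.25, missing, 7.25],
)

# In-place versions: no extra wages_no_missing copy is created
dropmissing!(wages, :min_wage)
transform!(wages, :min_wage => ByRow(w -> 40 * w) => :weekly_wage)

# @chain passes each result on as the first argument of the next call,
# so the grouped intermediate never needs a name
yearly = @chain wages begin
    groupby(:year)
    combine(:weekly_wage => mean => :mean_weekly_wage)
end
println(yearly)
```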

Variables instead of hard coding

  • Variables over hard coding values
# Prefer: define the replacement value once
replace_missing = 0

replace!(df.col1, missing => replace_missing)
replace!(df.col2, missing => replace_missing)

# Over: repeating the hard-coded value
replace!(df.col1, missing => 0)
replace!(df.col2, missing => 0)

Make a function of it

  • Write a function rather than write code over and over and over again!
    • functions prevent typos
    • once set up, they are quicker to use
# Function to draw several line plots, one labeled series per (x, y) pair
using Plots

function make_line_plot(xs, ys, labels; xlabel="", ylabel="", title="")
    p = plot(title=title, xlabel=xlabel, ylabel=ylabel)
    for (x, y, label) in zip(xs, ys, labels)
        # Add each series to p explicitly rather than to the current plot
        plot!(p, x, y, label=label)
    end
    return p
end
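
The same idea applies to data cleaning. As a sketch, the repeated replace!() calls from the earlier slide could be wrapped in a hypothetical helper, fill_missing! (the name and the sample vectors are assumptions, not from the course):

```julia
# Hypothetical helper: replace missing values in several columns at once.
function fill_missing!(columns, value)
    for col in columns
        # replace! modifies each column in place
        replace!(col, missing => value)
    end
    return columns
end

# Plain vectors stand in for DataFrame columns here
col1 = [1, missing, 3]
col2 = [missing, 5, missing]
fill_missing!((col1, col2), 0)
println(col1)
println(col2)
```

With a DataFrame, fill_missing!((df.col1, df.col2), 0) should work the same way, since df.col1 refers to the stored column rather than a copy.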

Comment and document

  • Comments for what we are doing
# Standardize names
rename!(df, :ColumnOne => :col_1)

# Rows with missing company
df[ismissing.(df.company), :]

# Pivoting on year and state
unstack(wages, :year, :state, :eff_min_wage)
  • Document why we are doing things
# Replace missing wages by the minimum
# as the worst case
min_wage = minimum(skipmissing(df.wages))
replace!(df.wages, missing => min_wage)

# Joining with countries
# To study how countries influence quality
leftjoin(company, countries, on=:location)
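
For reusable code, Julia docstrings are a natural place for both the what and the why. A small sketch, assuming a hypothetical function fill_with_minimum! and a made-up wage vector:

```julia
"""
    fill_with_minimum!(wages)

Replace `missing` entries in `wages` with the lowest observed wage.

Why: a missing wage is treated as the worst case.
"""
function fill_with_minimum!(wages)
    # skipmissing ignores the gaps when searching for the minimum
    lowest = minimum(skipmissing(wages))
    replace!(wages, missing => lowest)
    return wages
end

# Made-up sample values, for illustration only
wages = [7.25, missing, 9.0]
fill_with_minimum!(wages)
println(wages)
```

In the REPL, typing ?fill_with_minimum! then shows the docstring.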

Get to know the data

  • Take the time to understand the data
    • Easier to extract information later
    • Make plots, print the results, ...
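
A first look usually takes only a few calls. A minimal sketch, assuming DataFrames.jl is installed and using a made-up wages table:

```julia
using DataFrames

# Made-up sample data, for illustration only
wages = DataFrame(
    state = ["Texas", "Ohio", "Utah"],
    year = [2019, 2020, 2020],
    min_wage = [7.25, missing, 7.25],
)

dims = size(wages)             # (rows, columns)
column_names = names(wages)    # column names as strings

# Per-column summary: smallest value, largest value, number of missing entries
overview = describe(wages, :min, :max, :nmissing)

println(first(wages, 2))       # peek at the first rows
println(overview)
```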

[Image: Get to know your data. Photo by Myriam Jessier on Unsplash]

Ask for help!

  • Google, Stack Overflow, DataCamp


Have fun!

  • Have fun, don't give up, and enjoy!

Flight delays in US airports

Structure of flight data


Let's practice!
