Manipulating columns

Data Manipulation in Julia

Katerina Zahradova

Instructor

Applying functions

  • Functions taking whole columns

    • Values determined by the whole column, e.g., mean(), minimum()
    • In Julia, use maximum() for the largest value in a column; max() compares separate arguments, e.g., max(3, 7)
  • Functions working on individual rows
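The difference between the two kinds of functions can be sketched in a few lines. This is a minimal sketch, assuming DataFrames.jl and the Statistics standard library are installed; the three-row `penguins` table is a hypothetical stand-in with made-up values, not the course dataset. `ByRow` is the DataFrames.jl wrapper that applies a function to each element of a column.

```julia
using DataFrames, Statistics

# Hypothetical three-row stand-in for the penguins data (made-up values)
penguins = DataFrame(species = ["Adelie", "Adelie", "Gentoo"],
                     body_mass_g = [3750, 3800, 5000])

# maximum() reduces a whole collection; max() compares its arguments
println(maximum(penguins.body_mass_g))   # 5000
println(max(3750, 3800))                 # 3800

# Whole-column function: mean receives the entire column,
# and its single result is repeated on every row
whole = transform(penguins, :body_mass_g => mean)

# Row-wise function: ByRow applies the anonymous function to each value
rowwise = transform(penguins, :body_mass_g => ByRow(x -> x / 1000) => :body_mass_kg)
println(rowwise.body_mass_kg)            # [3.75, 3.8, 5.0]
```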


Options

  • select()

  • transform()

  • combine()

To mutate a DataFrame in place:

  • select!(), transform!() (there is no combine!(); combine() always returns a new DataFrame)
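A short sketch of in-place mutation, assuming DataFrames.jl is installed; the two-row table is hypothetical. Unlike select() and transform(), the bang versions change the DataFrame they receive.

```julia
using DataFrames

# Hypothetical two-row stand-in for the penguins data
penguins = DataFrame(species = ["Adelie", "Gentoo"],
                     island = ["Torgersen", "Biscoe"],
                     body_mass_g = [3750, 5000])

# select! keeps only the listed columns and modifies penguins itself
select!(penguins, :species, :body_mass_g)
println(names(penguins))   # ["species", "body_mass_g"]

# transform! adds a new column to penguins in place
transform!(penguins, :body_mass_g => ByRow(x -> x / 1000) => :body_mass_kg)
println(ncol(penguins))    # 3
```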

select()

# Selecting columns
select(penguins, :species, :body_mass_g)
333x2 DataFrame
Row species   body_mass_g
    String15  Int64
___________________________
1   Adelie    3750
2   Adelie    3800
3   Adelie    3250
...
# Selecting and renaming columns
select(penguins, :species, :body_mass_g => :weight_g)
333x2 DataFrame
Row species   weight_g
    String15  Int64
___________________________
1   Adelie    3750
2   Adelie    3800
3   Adelie    3250
...
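The renaming pair `source => destination` can be checked on a tiny table. This sketch assumes DataFrames.jl is installed and uses hypothetical values; it also shows that select() leaves the original DataFrame unchanged.

```julia
using DataFrames

# Hypothetical stand-in for the penguins data
penguins = DataFrame(species = ["Adelie", "Adelie"],
                     island = ["Torgersen", "Torgersen"],
                     body_mass_g = [3750, 3800])

# Keep two columns; body_mass_g => :weight_g renames the second one
slim = select(penguins, :species, :body_mass_g => :weight_g)
println(names(slim))       # ["species", "weight_g"]

# select() returns a new DataFrame; penguins keeps all its columns
println(names(penguins))   # ["species", "island", "body_mass_g"]
```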

select()

# Select columns and apply functions (mean() comes from the Statistics standard library)
using Statistics
select(penguins, :species, :body_mass_g => mean)
333x2 DataFrame
Row species   body_mass_g_mean
    String15  Float64
___________________________
1   Adelie    4207.06
2   Adelie    4207.06
3   Adelie    4207.06
...
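The output column is named automatically by joining the column name and the function name. A minimal sketch, assuming DataFrames.jl and Statistics are available and using hypothetical data, shows that default and the `source => function => destination` form for choosing the name yourself.

```julia
using DataFrames, Statistics

# Hypothetical stand-in for the penguins data
penguins = DataFrame(species = ["Adelie", "Adelie", "Gentoo"],
                     body_mass_g = [3750, 3800, 5000])

# Default name: column name and function name joined by "_"
auto = select(penguins, :species, :body_mass_g => mean)
println(names(auto))        # ["species", "body_mass_g_mean"]

# source => function => destination sets the output name explicitly
named = select(penguins, :species, :body_mass_g => mean => :mean_mass_g)
println(names(named))       # ["species", "mean_mass_g"]
```

In both cases the single mean is repeated on every row, so the result keeps all three rows.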

transform()

# Adding a column with the maximum of body_mass_g
transform(penguins, :body_mass_g => maximum)
333x8 DataFrame
Row species   island    ...  body_mass_g  sex      body_mass_g_maximum
    String15  String15  ...  Int64        String7  Int64
___________________________________________________________________
1   Adelie    Torgersen ...  3750         MALE     6300
2   Adelie    Torgersen ...  3800         FEMALE   6300
...
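To see that transform() keeps every existing column and repeats the maximum on each row, here is a small sketch with hypothetical data, assuming DataFrames.jl is installed. Because the source column holds integers, maximum() returns an Int64, so the new column is Int64 too.

```julia
using DataFrames

# Hypothetical stand-in for the penguins data
penguins = DataFrame(species = ["Adelie", "Adelie", "Gentoo"],
                     sex = ["MALE", "FEMALE", "MALE"],
                     body_mass_g = [3750, 3800, 5000])

wide = transform(penguins, :body_mass_g => maximum)
println(size(penguins))                    # (3, 3)
println(size(wide))                        # (3, 4)
println(wide.body_mass_g_maximum)          # [5000, 5000, 5000]
println(eltype(wide.body_mass_g_maximum))  # Int64
```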

combine()

# Summarizing penguins with the maximum of body_mass_g
combine(penguins, :body_mass_g => maximum)
1×1 DataFrame
Row  body_mass_g_maximum
     Int64
__________________________
1    6300
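
Since combine() does not broadcast, several summaries end up side by side in a single row. A minimal sketch with hypothetical data, assuming DataFrames.jl and Statistics are installed:

```julia
using DataFrames, Statistics

# Hypothetical stand-in for the penguins data
penguins = DataFrame(species = ["Adelie", "Adelie", "Gentoo"],
                     body_mass_g = [3750, 3800, 5000])

# Two summaries of the same column; the second is given a custom name
summary_df = combine(penguins,
                     :body_mass_g => maximum,
                     :body_mass_g => mean => :mean_mass_g)
println(size(summary_df))   # (1, 2)
println(names(summary_df))  # ["body_mass_g_maximum", "mean_mass_g"]
```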


Handling multiple functions and columns

# Using multiple functions on a column
combine(penguins, :body_mass_g .=> [mean, minimum, maximum])
1×3 DataFrame
Row  body_mass_g_mean  body_mass_g_minimum  body_mass_g_maximum
     Float64           Int64                Int64
_______________________________________________________________
1    4207.06           2700                 6300
# Applying one function to each of several columns
select(penguins, [:body_mass_g, :flipper_length_mm] .=> mean)
333x2 DataFrame
Row  body_mass_g_mean  flipper_length_mm_mean
     Float64           Float64
___________________________________________
1    4207.06           200.967
2    4207.06           200.967
...
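Both forms above rely on the broadcast operator `.=>`, which builds a vector of `source => function` pairs before select() or combine() ever sees them. This sketch, assuming DataFrames.jl and Statistics with a hypothetical table, inspects those vectors directly.

```julia
using DataFrames, Statistics

# Hypothetical stand-in for the penguins data
penguins = DataFrame(body_mass_g = [3750, 3800, 5000],
                     flipper_length_mm = [181, 186, 217])

# One column, several functions: one pair per function
one_col = :body_mass_g .=> [mean, minimum, maximum]
println(length(one_col))    # 3

# Several columns, one function: one pair per column
per_col = [:body_mass_g, :flipper_length_mm] .=> mean
println(per_col == [:body_mass_g => mean, :flipper_length_mm => mean])  # true

stats = combine(penguins, one_col)
println(names(stats))  # ["body_mass_g_mean", "body_mass_g_minimum", "body_mass_g_maximum"]
```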

Cheat sheet

  • select():

    • Only includes the specified columns
    • Same number of rows as the input; a single summary value is broadcast over all rows
  • transform():

    • Keeps all existing columns and adds the new ones
    • Same number of rows as the input; a single summary value is broadcast over all rows
  • combine():

    • Only includes the specified columns
    • Does not broadcast: a summary such as mean() produces a single row
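The cheat sheet can be checked by applying the same transformation with all three functions and comparing the result sizes. A minimal sketch, assuming DataFrames.jl and Statistics, on a hypothetical three-row, three-column table:

```julia
using DataFrames, Statistics

# Hypothetical stand-in for the penguins data
penguins = DataFrame(species = ["Adelie", "Adelie", "Gentoo"],
                     island = ["Torgersen", "Dream", "Biscoe"],
                     body_mass_g = [3750, 3800, 5000])

spec = :body_mass_g => mean
s = select(penguins, spec)      # only the new column, one value per input row
t = transform(penguins, spec)   # all original columns plus the new one
c = combine(penguins, spec)     # a single summary row

println(size(s))  # (3, 1)
println(size(t))  # (3, 4)
println(size(c))  # (1, 1)
```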

Let's practice!
