Data Manipulation in Julia
Katerina Zahradova
Instructor
Functions taking whole columns:
maximum(), not just max()
Functions working on individual rows
select()
transform()
combine()
To mutate a DataFrame in place:
select!(), transform!()
(combine() has no in-place variant)
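The difference between a copying call and its in-place counterpart can be sketched as follows. The small `df` is a hypothetical stand-in for the penguins data, with made-up values, so the snippet runs without the course dataset.

```julia
using DataFrames

# Toy stand-in for the penguins data (values are illustrative only)
df = DataFrame(species = ["Adelie", "Adelie", "Gentoo"],
               island = ["Torgersen", "Torgersen", "Biscoe"],
               body_mass_g = [3750, 3800, 5000])

# select() returns a new DataFrame and leaves df untouched
smaller = select(df, :species, :body_mass_g)
ncol_after_select = ncol(df)

# select!() drops the unlisted columns from df itself
select!(df, :species, :body_mass_g)
ncol_after_select_bang = ncol(df)
```

After the first call `df` still has all three columns. After the second call only the two listed columns remain.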
# Selecting columns
select(penguins, :species, :body_mass_g)
333×2 DataFrame
 Row  species   body_mass_g
      String15  Int64
___________________________
   1  Adelie    3750
   2  Adelie    3800
   3  Adelie    3250
 ...
# Selecting and renaming columns
select(penguins, :species, :body_mass_g => :weight_g)
333×2 DataFrame
 Row  species   weight_g
      String15  Int64
___________________________
   1  Adelie    3750
   2  Adelie    3800
   3  Adelie    3250
 ...
# Select columns and apply functions
select(penguins, :species, :body_mass_g => mean)
333×2 DataFrame
 Row  species   body_mass_g_mean
      String15  Float64
________________________________
   1  Adelie    4207.06
   2  Adelie    4207.06
   3  Adelie    4207.06
 ...
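To contrast a whole-column function with one applied to each row, here is a minimal sketch on a hypothetical toy table. The data values and the `body_mass_kg` column name are assumptions made for illustration. `ByRow` is the DataFrames.jl wrapper that applies a function element by element.

```julia
using DataFrames, Statistics

# Toy stand-in for the penguins data (values are illustrative only)
df = DataFrame(species = ["Adelie", "Adelie", "Gentoo"],
               body_mass_g = [3750, 3800, 5000])

# mean receives the whole column and returns one value,
# which select() repeats on every row
col_wise = select(df, :species, :body_mass_g => mean)

# ByRow applies the function to each value separately
row_wise = select(df, :species,
                  :body_mass_g => ByRow(g -> g / 1000) => :body_mass_kg)
```

The first result holds the same mean in every row, while the second holds a different value per row.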
# Adding column with maximum of body_mass_g
transform(penguins, :body_mass_g => maximum)
333×8 DataFrame
 Row  species   island     ...  body_mass_g  sex      body_mass_g_maximum
      String15  String15   ...  Int64        String7  Int64
_________________________________________________________________________
   1  Adelie    Torgersen  ...  3750         MALE     6300
   2  Adelie    Torgersen  ...  3800         FEMALE   6300
 ...
# Collapsing body_mass_g to its maximum
combine(penguins, :body_mass_g => maximum)
1×1 DataFrame
 Row  body_mass_g_maximum
      Int64
__________________________
   1  6300
# Using multiple functions on a column
combine(penguins, :body_mass_g .=> [mean, minimum, maximum])
 Row  body_mass_g_mean  body_mass_g_minimum  body_mass_g_maximum
      Float64           Int64                Int64
_______________________________________________________________
   1  4207.06           2700                 6300
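The generated names such as `body_mass_g_mean` can be replaced by adding a second `=>` with a target name, the same way `select()` renames a column. A minimal sketch on a hypothetical toy table follows. The data values and the names `avg_mass` and `max_mass` are illustrative assumptions.

```julia
using DataFrames, Statistics

# Toy stand-in for the penguins data (values are illustrative only)
df = DataFrame(species = ["Adelie", "Adelie", "Gentoo"],
               body_mass_g = [3750, 3800, 5000])

# source column => function => name of the result column
summary_df = combine(df,
                     :body_mass_g => mean    => :avg_mass,
                     :body_mass_g => maximum => :max_mass)
```

The result is a single row with the two named columns.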
# Applying one function to each of several columns
select(penguins, [:body_mass_g, :flipper_length_mm] .=> mean)
 Row  body_mass_g_mean  flipper_length_mm_mean
      Float64           Float64
___________________________________________
   1  4207.06           200.967
   2  4207.06           200.967
 ...
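The broadcast `.=>` applies the function to each column separately. With a plain `=>`, a vector of columns is instead passed to the function together, one argument per column. The sketch below shows both forms on a hypothetical toy table. The data values and the `mass_per_mm` name are assumptions made for illustration.

```julia
using DataFrames, Statistics

# Toy stand-in for the penguins data (values are illustrative only)
df = DataFrame(body_mass_g = [3600, 3800, 5000],
               flipper_length_mm = [180, 190, 200])

# .=> expands to one pair per column, so mean is called once per column
each_col = combine(df, [:body_mass_g, :flipper_length_mm] .=> mean)

# => hands both columns to one function as two separate arguments
both_cols = select(df,
    [:body_mass_g, :flipper_length_mm] => ((m, f) -> m ./ f) => :mass_per_mm)
```

The first call yields one mean per column. The second yields a single new column computed from both inputs.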
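select(): returns only the listed columns, with as many rows as the input
transform(): keeps every original column and appends the new ones
combine(): collapses the input to the aggregated result, here a single row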
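One way to see how the three functions differ is to compare the shape of what each returns for the same operation. This is a minimal sketch on a hypothetical three-row toy table with made-up values, not the penguins data.

```julia
using DataFrames

# Toy stand-in for the penguins data (values are illustrative only)
df = DataFrame(species = ["Adelie", "Adelie", "Gentoo"],
               island = ["Torgersen", "Torgersen", "Biscoe"],
               body_mass_g = [3750, 3800, 5000])

# Same operation, three different result shapes (rows, columns)
size_select    = size(select(df, :species, :body_mass_g => maximum))
size_transform = size(transform(df, :body_mass_g => maximum))
size_combine   = size(combine(df, :body_mass_g => maximum))
```

Here `select()` gives 3 rows and 2 columns, `transform()` gives 3 rows and 4 columns, and `combine()` gives 1 row and 1 column.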