Pivoting data

Data Manipulation in Julia

Katerina Zahradova

Instructor

Pivot tables

  • Way of summarizing data
  • Reorganization of columns and rows to make data more readable
Data Manipulation in Julia

Pivot table using unstack()

unstack structure

Data Manipulation in Julia

unstack() to pivot

# Using unstack to pivot
# unstack(DataFrame, :col_as_rows, :col_as_cols, :values, combine = f_to_aggregate)

unstack(penguins, :species, :sex, :body_mass_g, combine = median)
3×3 DataFrame
Row  species   MALE     FEMALE
     String15  Float64  Float64
_____________________________________
1    Adelie    4000.0   3400.0
2    Chinstrap 3950.0   3550.0
3    Gentoo    5500.0   4700.0
Data Manipulation in Julia

Pivoting on multiple columns as rows

# Pivot on more columns as rows 
unstack(penguins, [:species, :island], :sex, :body_mass_g, combine = median)
5×4 DataFrame
Row  species    island     MALE       FEMALE
     String15   String15   Float64?   Float64?
_____________________________________________________
1    Adelie     Torgersen  4000.0    3400.0
2    Adelie     Biscoe     4000.0    3375.0
3    Adelie     Dream      3987.5    3400.0
4    Chinstrap  Dream      3950.0    3550.0
5    Gentoo     Biscoe     5500.0    4700.0
Data Manipulation in Julia

Pivoting on multiple columns elsewhere

# Using multiple columns as columns
unstack(penguins, :sex, [:species, :island], :body_mass_g, combine = sum)
MethodError: no method matching 
unstack(::DataFrame, ::Symbol, ::Vector{Symbol}, ::Symbol; combine = sum)
Data Manipulation in Julia

Missing values

# Missing values for certain combinations
unstack(penguins_missing, :species, :sex, :body_mass_g, combine = median)
3×3 DataFrame
Row  species   MALE     FEMALE
     String15  Float64  Float64?
_____________________________________
1    Adelie    4000.0   3400.0
2    Chinstrap 3950.0   3550.0
3    Gentoo    5500.0   missing
Data Manipulation in Julia

Replacing missing values

# Missing values for certain combinations
unstack(penguins_missing, :species, :sex, :body_mass_g, combine = median, fill = -1)
3×3 DataFrame
Row  species   MALE     FEMALE
     String15  Float64  Float64
_____________________________________
1    Adelie    4000.0    3400.0
2    Chinstrap 3950.0    3550.0
3    Gentoo    5500.0    -1
Data Manipulation in Julia

Pivot tables are DataFrames

# Saving the pivot table
pivot_penguins = unstack(penguins_missing, :species, :sex, :body_mass_g, combine = median)

# Select only female penguins
select(pivot_penguins, :species, :FEMALE)
3×2 DataFrame
Row  species    FEMALE
     String15   Float64?
_______________________________
1    Adelie      3400.0
...
Data Manipulation in Julia

Let's practice!

Data Manipulation in Julia

Preparing Video For Download...