Diving into DataFrames

Data Manipulation in Julia

Katerina Zahradova

Instructor

Course outline

  • Working with columns
  • Grouping of data
  • Summary statistics
  • Pivot tables
  • Loading and saving of CSV files
  • Visualizations
  • Writing readable and organized code

Datasets

Penguins

US wage

Chocolate

plane


Strings and symbols

# Using strings
df[:, "col 1"]
df[:, "col2"]

# Using symbols
df[:, Symbol("col 1")]
df[:, :col2]
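
To make the two access styles concrete, here is a minimal sketch on a made-up table. The `df` contents are hypothetical; only the column names follow the slide. It assumes the DataFrames.jl package is installed.

```julia
using DataFrames

# Hypothetical toy table: "col 1" contains a space, "col2" does not
df = DataFrame("col 1" => [1, 2, 3], "col2" => ["a", "b", "c"])

# Strings work for any column name
by_string = df[:, "col 1"]

# Symbol("...") is needed when the name is not a valid identifier
by_symbol = df[:, Symbol("col 1")]

# The :name literal works when the name is a valid identifier
col2 = df[:, :col2]

# Both access styles return the same data
println(by_string == by_symbol)
```

Strings are usually the more convenient choice when column names contain spaces or other special characters.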

What is missing

# Using first()
println(first(penguins))
Row    species    island    culmen_l_mm ...
       String15   String15  String7?    ...
____________________________________________
1      Adelie     Torgersen 39.1

# Using describe
describe(penguins)
7×7 DataFrame
Row  variable     ...   nmissing  ...
     Symbol       ...   Int64     ... 
______________________________________
1    species      ...   0         ...       
2    island       ...   0         ...
3    culmen_l_mm  ...   10        ...
4    culmen_d_mm  ...   10        ...
5    flipper_l_mm ...   10        ...
...
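
The same checks can be tried on a small table. This is a sketch only: `toy` is a hypothetical stand-in for the penguins data, with two measurements deliberately set to `missing`.

```julia
using DataFrames

# Hypothetical stand-in for the penguins table, with two missing measurements
toy = DataFrame(
    species = ["Adelie", "Adelie", "Gentoo", "Chinstrap"],
    culmen_l_mm = [39.1, missing, 47.5, missing],
)

# first(df) returns the first row; first(df, n) returns the first n rows
println(first(toy, 2))

# describe reports the number of missing values per column in :nmissing
stats = describe(toy, :nmissing)
println(stats)

# Cross-check one column with a manual count
n_missing = count(ismissing, toy.culmen_l_mm)
println(n_missing)
```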

Describe it better

# Describe
describe(penguins)
Row  variable     mean    min     ...
     Symbol       Nothing Union   ...
________________________________________________
1    species              Adelie        
2    island               Biscoe        
3    culmen_l_mm  32.1    34.7
4    culmen_d_mm  13.1    16    
5    flipper_l_mm 205.4   165
...

# Describe using only some columns
describe(penguins, :nmissing, :eltype)
Row  variable     nmissing    eltype
     Symbol       Int64       DataType
________________________________________________
1    species      0           String15        
2    island       0           String15        
3    culmen_l_mm  10          Float64
4    culmen_d_mm  10          Float64
5    flipper_l_mm 10          Float64
...
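
As a sketch of how the statistic symbols narrow the output, the example below requests a handful of built-in statistics on a hypothetical `toy` table (not the course data).

```julia
using DataFrames

# Hypothetical toy table with one missing measurement
toy = DataFrame(
    island = ["Torgersen", "Biscoe", "Dream"],
    flipper_l_mm = [181.0, missing, 195.0],
)

# Ask only for the statistics of interest, passed as symbols
stats = describe(toy, :mean, :min, :max, :nmissing, :eltype)
println(stats)

# Missing values are skipped when the mean is computed
println(stats.mean[2])
```

Each requested statistic becomes one column of the result, and each column of the original table becomes one row.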

Describe it how we like it

# Using sum
describe(penguins, sum => :total)
7×2 DataFrame
Row  variable     total
     Symbol       Union
________________________________________
1    species    
2    island    
3    culmen_l_mm  15136.6
4    culmen_d_mm  5163.4    
...
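
The `function => :name` form also accepts user-defined functions. The sketch below uses a hypothetical `toy` table and an illustrative `:range` statistic that is not part of the slides; for columns where the function cannot be applied, such as strings, the entry is left empty.

```julia
using DataFrames

# Hypothetical toy table without missing values
toy = DataFrame(
    species = ["Adelie", "Gentoo", "Chinstrap"],
    culmen_d_mm = [1.5, 2.5, 3.0],
)

# Each pair is function => name of the resulting column
stats = describe(
    toy,
    sum => :total,
    (x -> maximum(x) - minimum(x)) => :range,  # illustrative custom statistic
)
println(stats)
```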

DataFrames syntax

Columns transformation template
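
The slide's diagram is not reproduced in this text. Assuming it refers to the standard DataFrames.jl form `source_column => function => target_column`, a minimal sketch looks like this. The `toy` table and the target column names are hypothetical.

```julia
using DataFrames

# Hypothetical toy table
toy = DataFrame(flipper_l_mm = [181.0, 186.0, 195.0])

# source column => function => target column
# ByRow applies the function to each element instead of the whole column
with_cm = transform(toy, :flipper_l_mm => ByRow(x -> x / 10) => :flipper_l_cm)
println(with_cm)

# The same template with a whole-column function, reducing to a single row
total = combine(toy, :flipper_l_mm => sum => :flipper_total)
println(total)
```

`transform` keeps the existing columns and adds the new one, while `combine` returns only the computed result.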


Let's practice!
