Filtering

Introduction to Julia

James Fulton

Climate informatics researcher

Example

6×4 DataFrame   
 Row | day     distance     time  raining
     | String     Int64  Float64     Bool
_____|__________________________________
   1 | Wednesday   2000    14.99     true
   2 | Monday      5000    31.68    false
   3 | Thursday    3500    22.02     true
   4 | Tuesday     3000    17.25     true
   5 | Thursday    4500    25.47    false
   6 | Monday      5000    30.77     true
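
The slides never show how df_run is built. A minimal sketch of one way to construct it, assuming the DataFrames.jl package is installed; the column values are copied from the table above, and the construction itself is an assumption, not taken from the slides:

```julia
using DataFrames  # third-party package, assumed to be installed

# Rebuild the example table column by column
df_run = DataFrame(
    day = ["Wednesday", "Monday", "Thursday", "Tuesday", "Thursday", "Monday"],
    distance = [2000, 5000, 3500, 3000, 4500, 5000],
    time = [14.99, 31.68, 22.02, 17.25, 25.47, 30.77],
    raining = [true, false, true, true, false, true],
)

# Number of rows and columns
println(size(df_run))
```

The later filter examples can then be run against this df_run.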

The filter function

6×4 DataFrame   
 Row | day     distance     time  raining
     | String     Int64  Float64     Bool
_____|__________________________________
   1 | Wednesday   2000    14.99     true
   2 | Monday      5000    31.68    false
...
# Filter to Monday runs
df_monday = filter(row -> row.day=="Monday", df_run)

The filter function

println(df_monday)
2×4 DataFrame
 Row | day     distance     time  raining
     | String     Int64  Float64     Bool
_____|__________________________________
   1 | Monday      5000    31.68    false
   2 | Monday      5000    30.77     true

Filtering on numerical columns

# Filter to shorter runs
df_short = filter(row -> row.distance<=3000, df_run)

println(df_short)
 Row | day     distance     time  raining
_____|__________________________________
   1 | Wednesday   2000    14.99     true
   2 | Tuesday     3000    17.25     true

Filtering on boolean columns

# Filter to raining days
df_raining = filter(row -> row.raining, df_run)
println(df_raining)
 Row | day     distance     time  raining
_____|__________________________________
   1 | Wednesday   2000    14.99     true
   2 | Thursday    3500    22.02     true
   3 | Tuesday     3000    17.25     true
   4 | Monday      5000    30.77     true

Filtering on all comparisons

  • row.col == b : filter to where row.col equals b
  • row.col != b : filter to where row.col does not equal b
  • row.col > b : filter to where row.col is greater than b
  • row.col >= b : filter to where row.col is greater than or equal to b
  • row.col < b : filter to where row.col is less than b
  • row.col <= b : filter to where row.col is less than or equal to b
  • row.col : filter to where row.col is true
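
As a rough illustration of the comparisons listed above, the sketch below applies != and > to the example data. It assumes DataFrames.jl is installed and rebuilds df_run from the example table. Combining two conditions with && is an extension that the slides do not show, and the names df_not_monday and df_long_rain are hypothetical:

```julia
using DataFrames  # third-party package, assumed to be installed

# Example data, copied from the table at the start of the slides
df_run = DataFrame(
    day = ["Wednesday", "Monday", "Thursday", "Tuesday", "Thursday", "Monday"],
    distance = [2000, 5000, 3500, 3000, 4500, 5000],
    time = [14.99, 31.68, 22.02, 17.25, 25.47, 30.77],
    raining = [true, false, true, true, false, true],
)

# Filter to runs that were not on a Monday
df_not_monday = filter(row -> row.day != "Monday", df_run)

# Filter to rainy runs longer than 3000 (two conditions joined with &&)
df_long_rain = filter(row -> row.raining && row.distance > 3000, df_run)

println(df_not_monday)
println(df_long_rain)
```

Each anonymous function receives one row and must return true or false, so any expression that evaluates to a Bool can be used as the condition.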

Further analysis

# Distance run in rain
println(sum(df_raining.distance))
13500
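
A self-contained sketch of the same pattern, filtering first and then summarising a column of the result. It assumes DataFrames.jl is installed and rebuilds df_run from the example table. Totalling the time column as well is an addition for illustration, not something the slides show:

```julia
using DataFrames  # third-party package, assumed to be installed

# Example data, copied from the table at the start of the slides
df_run = DataFrame(
    day = ["Wednesday", "Monday", "Thursday", "Tuesday", "Thursday", "Monday"],
    distance = [2000, 5000, 3500, 3000, 4500, 5000],
    time = [14.99, 31.68, 22.02, 17.25, 25.47, 30.77],
    raining = [true, false, true, true, false, true],
)

# Keep only the rainy runs
df_raining = filter(row -> row.raining, df_run)

# Total distance run in rain
total_distance = sum(df_raining.distance)
println(total_distance)

# Total time spent running in rain, rounded to two decimals
total_time = round(sum(df_raining.time), digits=2)
println(total_time)
```

Because the filtered result is an ordinary DataFrame, its columns can be passed to any function that accepts a vector.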

Let's practice!
