Sorting and slicing data

Introduction to Julia

James Fulton

Climate informatics researcher

Selecting an element from the DataFrame

using CSV, DataFrames
df_run = DataFrame(CSV.File("run.csv"))
println(df_run)
6×4 DataFrame   
 Row | day     distance     time  raining
     | String     Int64  Float64     Bool
_____|__________________________________
   1 | Wednesday   2000    14.99     true
   2 | Monday      5000    31.68    false
   3 | Thursday    3500    22.02     true
   4 | Tuesday     3000    17.25     true
   5 | Thursday    4500    25.47    false
   6 | Monday      5000    30.77     true
# df[rownum, colnum]
t = df_run[6, 3]
println(t)
30.77
# df[rowrange, colnum]
ts = df_run[5:6, 3]

println(ts)
[25.47, 30.77]
# df[rowrange, colnum]
ts = df_run[end-1:end, 3]
println(ts)
[25.47, 30.77]
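
The `end` keyword shown above also works together with column names. A minimal sketch, assuming the table is rebuilt inline so that no CSV file is needed (the variable `df` is a stand-in for `df_run`, with values copied from the printed table):

```julia
using DataFrames

# Inline stand-in for run.csv, copied from the table above
df = DataFrame(
    day = ["Wednesday", "Monday", "Thursday", "Tuesday", "Thursday", "Monday"],
    distance = [2000, 5000, 3500, 3000, 4500, 5000],
    time = [14.99, 31.68, 22.02, 17.25, 25.47, 30.77],
    raining = [true, false, true, true, false, true],
)

last_time = df[end, "time"]        # single value from the last row
last_two = df[end-1:end, "time"]   # vector holding the last two values
n = nrow(df)                       # row count; `end` in the row position equals this

println(last_time)
println(last_two)
println(n)
```

Using `end` rather than a hard-coded row number keeps the selection valid if rows are later added to the file.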

Selecting a column

df_run = DataFrame(CSV.File("run.csv"))
println(df_run)
6×4 DataFrame   
 Row | day     distance     time  raining
     | String     Int64  Float64     Bool
_____|__________________________________
   1 | Wednesday   2000    14.99     true
   2 | Monday      5000    31.68    false
   3 | Thursday    3500    22.02     true
   4 | Tuesday     3000    17.25     true
   5 | Thursday    4500    25.47    false
   6 | Monday      5000    30.77     true
# df[:, colnum]
distances = df_run[:, 2]
println(distances)
[2000, 5000, 3500, 3000, 4500, 5000]
# df[:, "colname"]
distances = df_run[:, "distance"]
# df.colname
distances = df_run.distance
println(distances)
[2000, 5000, 3500, 3000, 4500, 5000]
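
The three forms above return the same values, but they differ in one detail worth knowing: `df[:, "colname"]` returns a copy of the column, while `df.colname` returns the vector stored inside the DataFrame. A minimal sketch, assuming a shortened inline table (the variables `df`, `copied` and `shared` are illustrative only):

```julia
using DataFrames

# Shortened inline stand-in for run.csv (first three rows only)
df = DataFrame(
    day = ["Wednesday", "Monday", "Thursday"],
    distance = [2000, 5000, 3500],
)

copied = df[:, "distance"]   # fresh vector; changing it leaves df untouched
shared = df.distance         # the vector stored inside df itself

copied[1] = 0                # df keeps 2000 in row 1
shared[2] = 0                # df now holds 0 in row 2

println(df.distance)
println(copied)
```

If the selected column will be modified afterwards, the copying form is the safer default.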

Selecting an element from the DataFrame

df_run = DataFrame(CSV.File("run.csv"))
println(df_run)
6×4 DataFrame   
 Row | day     distance     time  raining
     | String     Int64  Float64     Bool
_____|__________________________________
   1 | Wednesday   2000    14.99     true
   2 | Monday      5000    31.68    false
   3 | Thursday    3500    22.02     true
   4 | Tuesday     3000    17.25     true
   5 | Thursday    4500    25.47    false
   6 | Monday      5000    30.77     true
# df[rownum, colnum]
d = df_run[6, 2]
# df[rownum, "colname"]
d = df_run[6, "distance"]
# df.colname[rownum]
d = df_run.distance[6]
println(d)
5000

Slicing multiple columns

df_run = DataFrame(CSV.File("run.csv"))
println(df_run)
6×4 DataFrame   
 Row | day     distance     time  raining
     | String     Int64  Float64     Bool
_____|__________________________________
   1 | Wednesday   2000    14.99     true
   2 | Monday      5000    31.68    false
   3 | Thursday    3500    22.02     true
   4 | Tuesday     3000    17.25     true
   5 | Thursday    4500    25.47    false
   6 | Monday      5000    30.77     true
df_3cols = df_run[:, 1:3]

println(df_3cols)
6×3 DataFrame   
 Row | day     distance     time
     | String     Int64  Float64
_____|__________________________
   1 | Wednesday   2000    14.99
   2 | Monday      5000    31.68
   3 | Thursday    3500    22.02
   4 | Tuesday     3000    17.25
   5 | Thursday    4500    25.47
   6 | Monday      5000    30.77
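
A column range only helps when the wanted columns sit next to each other. As a sketch of two alternatives, columns can also be picked with a vector of names or excluded with the `Not` selector exported by DataFrames. The table is rebuilt inline here, and the names `df_named` and `df_no_rain` are illustrative only:

```julia
using DataFrames

# Inline stand-in for run.csv, copied from the table above
df = DataFrame(
    day = ["Wednesday", "Monday", "Thursday", "Tuesday", "Thursday", "Monday"],
    distance = [2000, 5000, 3500, 3000, 4500, 5000],
    time = [14.99, 31.68, 22.02, 17.25, 25.47, 30.77],
    raining = [true, false, true, true, false, true],
)

# Pick non-adjacent columns by name
df_named = df[:, ["day", "time"]]

# Keep every column except one
df_no_rain = df[:, Not("raining")]

println(names(df_named))
println(names(df_no_rain))
```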

Selecting rows

df_run = DataFrame(CSV.File("run.csv"))
println(df_run)
6×4 DataFrame   
 Row | day     distance     time  raining
     | String     Int64  Float64     Bool
_____|__________________________________
   1 | Wednesday   2000    14.99     true
   2 | Monday      5000    31.68    false
   3 | Thursday    3500    22.02     true
   4 | Tuesday     3000    17.25     true
   5 | Thursday    4500    25.47    false
   6 | Monday      5000    30.77     true
# df[rownum, :]
println(df_run[4, :])
DataFrameRow
 Row | day    distance     time  raining
_____|__________________________________
   4 | Tuesday    3000    17.25     true
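
As the output above shows, a single row number gives a `DataFrameRow` rather than a DataFrame. A minimal sketch of how such a row can be used, and of how a one-row range keeps the result a DataFrame instead (the table is rebuilt inline; `row` and `one_row` are illustrative names):

```julia
using DataFrames

# Inline stand-in for run.csv, copied from the table above
df = DataFrame(
    day = ["Wednesday", "Monday", "Thursday", "Tuesday", "Thursday", "Monday"],
    distance = [2000, 5000, 3500, 3000, 4500, 5000],
    time = [14.99, 31.68, 22.02, 17.25, 25.47, 30.77],
    raining = [true, false, true, true, false, true],
)

row = df[4, :]         # DataFrameRow for row 4
one_row = df[4:4, :]   # DataFrame with a single row

println(row.day)            # field access by property name
println(row["distance"])    # field access by column name
println(row isa DataFrameRow)
println(one_row isa DataFrame)
```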

Selecting multiple rows

df_run = DataFrame(CSV.File("run.csv"))
println(df_run)
6×4 DataFrame   
 Row | day     distance     time  raining
     | String     Int64  Float64     Bool
_____|__________________________________
   1 | Wednesday   2000    14.99     true
   2 | Monday      5000    31.68    false
   3 | Thursday    3500    22.02     true
   4 | Tuesday     3000    17.25     true
   5 | Thursday    4500    25.47    false
   6 | Monday      5000    30.77     true
# df[rowrange, :]
println(df_run[2:4, :])
3×4 DataFrame   
 Row | day     distance     time  raining
     | String     Int64  Float64     Bool
_____|__________________________________
   1 | Monday      5000    31.68    false
   2 | Thursday    3500    22.02     true
   3 | Tuesday     3000    17.25     true

Sorting DataFrames

df_sort = sort(df_run, "time")

println(df_sort)
6×4 DataFrame   
 Row | day     distance     time  raining
     | String     Int64  Float64     Bool
_____|__________________________________
   1 | Wednesday   2000    14.99     true
   2 | Tuesday     3000    17.25     true
   3 | Thursday    3500    22.02     true
   4 | Thursday    4500    25.47    false
   5 | Monday      5000    30.77     true
   6 | Monday      5000    31.68    false
df_sort = sort(df_run, "time", rev=true)

println(df_sort)
6×4 DataFrame   
 Row | day     distance     time  raining
     | String     Int64  Float64     Bool
_____|__________________________________
   1 | Monday      5000    31.68    false
   2 | Monday      5000    30.77     true
   3 | Thursday    4500    25.47    false
   4 | Thursday    3500    22.02     true
   5 | Tuesday     3000    17.25     true
   6 | Wednesday   2000    14.99     true
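
Two runs in this table share the same distance, so a single sort column cannot always decide the order. The sketch below sorts by distance in descending order and breaks ties by time, using the `order` helper from DataFrames with `Symbol` column names (an alternative to the strings used above). It also shows `sort!`, which reorders the DataFrame in place instead of returning a new one. The table is rebuilt inline, and `df` and `df_sorted` are illustrative names:

```julia
using DataFrames

# Inline stand-in for run.csv, copied from the table above
df = DataFrame(
    day = ["Wednesday", "Monday", "Thursday", "Tuesday", "Thursday", "Monday"],
    distance = [2000, 5000, 3500, 3000, 4500, 5000],
    time = [14.99, 31.68, 22.02, 17.25, 25.47, 30.77],
    raining = [true, false, true, true, false, true],
)

# Longest distance first; equal distances ordered by ascending time
df_sorted = sort(df, [order(:distance, rev=true), :time])
println(df_sorted.time)

# In-place sort: df itself is reordered by ascending time
sort!(df, "time")
println(df.day[1])
```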

Cheat sheet

Select a column
  • df[:, "colname"]
  • df[:, colnum]
  • df.colname
Select a row

df[rownum, :]

Select multiple columns

df[:, colnum1:colnum2]

Select multiple rows

df[rownum1:rownum2, :]

Select a single value
  • df[rownum, "colname"]
  • df[rownum, colnum]
  • df.colname[rownum]
Sorting by column

Ascending order

sort(df, "colname")

Descending order

sort(df, "colname", rev=true)


Let's practice!
