DataFrames

Introduction to Julia

James Fulton

Climate informatics researcher

Tabular data

   Day        Distance  Time   Raining
1  Wednesday  2000      14.99  true
2  Monday     5000      31.68  false
3  Thursday   3500      22.02  true
4  Tuesday    3000      17.25  true
5  Thursday   4500      25.47  false
6  Monday     5000      30.77  true
Column types: Day is String, Distance is Int, Time is Float, Raining is Bool
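
The column types above can be checked with plain Julia vectors before any DataFrame is involved. The sketch below is illustrative only: the variable names (days, distances, times, raining) are assumptions chosen for this example, and only the first three rows of the table are used.

```julia
# Each column of the table can be held in an ordinary Julia vector.
# Variable names here are hypothetical, chosen for this sketch.
days = ["Wednesday", "Monday", "Thursday"]
distances = [2000, 5000, 3500]
times = [14.99, 31.68, 22.02]
raining = [true, false, true]

# eltype reports the element type shared by all values in a vector.
println(eltype(days))
println(eltype(distances))
println(eltype(times))
println(eltype(raining))
```

On a 64-bit machine the integer and float columns should be reported with their widths (Int64 and Float64), which matches the type row printed under the column names when a DataFrame is displayed.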

DataFrames

using DataFrames
# Create DataFrame
df = DataFrame(
    day = ["Wednesday", "Monday", "Thursday", "Tuesday", "Thursday", "Monday"],
    distance = [2000, 5000, 3500, 3000, 4500, 5000],
    time = [14.99, 31.68, 22.02, 17.25, 25.47, 30.77],
    raining = [true, false, true, true, false, true],
)

DataFrames

println(df)
6×4 DataFrame   
 Row | day        distance  time     raining
     | String     Int64     Float64  Bool 
_____|______________________________________
   1 | Wednesday      2000    14.99     true
   2 | Monday         5000    31.68    false
   3 | Thursday       3500    22.02     true
   4 | Tuesday        3000    17.25     true
   5 | Thursday       4500    25.47    false
   6 | Monday         5000    30.77     true

CSV files

  • Comma-separated values
  • Common format for tabular data

Inside run.csv:

day,distance,time,raining
Wednesday,2000,14.99,true
Monday,5000,31.68,false
Thursday,3500,22.02,true
Tuesday,3000,17.25,true
Thursday,4500,25.47,false
Monday,5000,30.77,true

Loading CSV files

using CSV
using DataFrames
# Load the run data
file = CSV.File("run.csv")

# Convert the CSV file into a DataFrame
df = DataFrame(file)
  • CSV does not export File, so write CSV.File("run.csv"), not File("run.csv")
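
To try the loading step without an existing run.csv, a small file can be written first and then read back. This is a minimal sketch under stated assumptions: the temporary file name run_sample.csv and the variable names path and df_sample are hypothetical, and only two data rows are included.

```julia
using CSV
using DataFrames

# Write a two-row sample file into a temporary directory.
# The file name is hypothetical, used only for this sketch.
path = joinpath(mktempdir(), "run_sample.csv")
write(path, """
day,distance,time,raining
Wednesday,2000,14.99,true
Monday,5000,31.68,false
""")

# Load the file, then convert it into a DataFrame.
df_sample = DataFrame(CSV.File(path))
println(df_sample)
```

Depending on the installed CSV.jl version, the day column may be reported with a compact string type rather than String. The values are the same either way.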

Printing DataFrames

# Print the first 3 rows
println(first(df, 3))
3×4 DataFrame   
 Row | day        distance  time     raining
     | String     Int64     Float64  Bool 
_____|______________________________________
   1 | Wednesday      2000    14.99     true
   2 | Monday         5000    31.68    false
   3 | Thursday       3500    22.02     true

Basic properties of DataFrames

# Print column names
println(names(df))
["day", "distance", "time", "raining"]
# Print number of rows and columns
println(size(df))
(6, 4)
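
The tuple returned by size can be unpacked into separate row and column counts, and DataFrames also provides nrow and ncol for the same values. The sketch below uses a hypothetical two-column DataFrame named df_small so that it runs on its own.

```julia
using DataFrames

# A small stand-in DataFrame (hypothetical, for illustration only).
df_small = DataFrame(
    day = ["Wednesday", "Monday", "Thursday"],
    distance = [2000, 5000, 3500],
)

# size returns (rows, columns), which can be unpacked directly.
n_rows, n_cols = size(df_small)
println(n_rows)
println(n_cols)

# nrow and ncol return the same counts individually.
println(nrow(df_small))
println(ncol(df_small))
```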

Let's practice!
