Introduction to Julia
James Fulton
Climate informatics researcher
# Summarize the runs DataFrame
println(describe(df_run))
4×7 DataFrame
Row | variable mean min median max nmissing eltype
| Symbol Union… Any Union… Any Int64 DataType
_____|__________________________________________________________________
1 | day Monday Wednesday 0 String
2 | distance 3833.33 2000 4000.0 5000 0 Int64
3 | time 23.6967 14.99 23.745 31.68 0 Float64
4 | raining 0.666667 false 1.0 true 0 Bool
using Statistics
Functions in Statistics:
mean() - Calculate mean of array
median() - Calculate median value of array
std() - Calculate standard deviation of array values
var() - Calculate variance of array values
# Calculate average of distance column
average_distance = mean(df_run[:, "distance"])
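The four functions above can be tried without a DataFrame. This is a minimal sketch that assumes the distance column is copied by hand into a plain vector (the values come from the df_run table shown later in these slides); the variable names are illustrative only.

```julia
using Statistics  # standard library; note the capital S

# Distances in metres, copied from the df_run table
distance = [2000, 5000, 3500, 3000, 4500, 5000]

average_distance = mean(distance)    # arithmetic mean
median_distance = median(distance)   # middle value of the sorted data
std_distance = std(distance)         # sample standard deviation (divides by n - 1)
var_distance = var(distance)         # sample variance, the square of std_distance

println(average_distance)
println(median_distance)
```

The mean and median should agree with the describe() output at the start of the section.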
sum() - Calculate sum of array
minimum() - Calculate minimum value in array
maximum() - Calculate maximum value in array
total_distance = sum(df_run[:, "distance"]) # Returns 23000
minimum_distance = minimum(df_run[:, "distance"]) # Returns 2000
maximum_distance = maximum(df_run[:, "distance"]) # Returns 5000
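A small sketch of the same three calls on a plain vector, assuming the distance values from the df_run table. sum(), minimum() and maximum() belong to Julia's Base, so this runs without loading Statistics. extrema() is an extra Base function, not covered on the slide, that returns both ends of the range at once.

```julia
# Distances in metres, copied from the df_run table
distance = [2000, 5000, 3500, 3000, 4500, 5000]

total_distance = sum(distance)        # 23000
minimum_distance = minimum(distance)  # 2000
maximum_distance = maximum(distance)  # 5000

# extrema returns a (minimum, maximum) tuple in a single pass
shortest, longest = extrema(distance)
println((shortest, longest))
```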
For columns a and b of DataFrame df:
Operation | Scalar example | Array example |
---|---|---|
Addition | df.a .+ 1 | df.a .+ df.b or df.a + df.b |
Subtraction | df.a .- 1 | df.a .- df.b or df.a - df.b |
Multiplication | 2 .* df.a or 2 * df.a | df.a .* df.b |
Division | df.a ./ 2 or df.a / 2 | df.a ./ df.b |
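The table can be checked on two small vectors. This sketch uses made-up vectors a and b in place of DataFrame columns; it also shows why the dot is required in the cells that list only the dotted form, since the undotted call raises a MethodError there.

```julia
# Hypothetical stand-ins for df.a and df.b
a = [1, 2, 3]
b = [10, 20, 30]

added_scalar = a .+ 1    # dot required: adds 1 to every element
added_arrays = a + b     # vector + vector works with or without the dot
scaled = 2 * a           # scalar * vector works with or without the dot
product = a .* b         # dot required: element-by-element product

# Without the dot, vector + scalar has no method
plus_fails = try
    a + 1
    false
catch err
    err isa MethodError
end

# Without the dot, vector * vector has no method either
times_fails = try
    a * b
    false
catch err
    err isa MethodError
end

println(added_scalar)
println(product)
```

Using the dotted form everywhere is a safe default, because it behaves the same for scalars and arrays.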
# Convert distances to kilometers
distance_km = df_run.distance ./ 1000
# Convert run times to hours
time_hr = df_run.time ./ 60
println(distance_km)
println(time_hr)
[2.0, 5.0, 3.5, 3.0, 4.5, 5.0]
[0.25, 0.53, 0.37, 0.29, 0.42, 0.51]
6×4 DataFrame
Row | distance time ...
| Int64 Float64 ...
_____|__________________ ...
1 | 2000 14.99 ...
2 | 5000 31.68 ...
3 | 3500 22.02 ...
4 | 3000 17.25 ...
5 | 4500 25.47 ...
6 | 5000 30.77 ...
# Convert distances to kilometers
distance_km = df_run.distance ./ 1000
# Convert run times to hours
time_hr = df_run.time ./ 60
# Run speed in km/hr
speeds = distance_km ./ time_hr
println(speeds)
[8.01, 9.47, 9.54, 10.43, 10.60, 9.75]
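The printed values on the slides are shown to two decimal places, while println on the raw result prints full precision. A minimal sketch of the whole calculation on plain vectors, assuming the column values from the df_run table and using the broadcast form round.() to get two decimals; the variable names run_minutes and speeds_rounded are illustrative.

```julia
# Column values copied from the df_run table
distance = [2000, 5000, 3500, 3000, 4500, 5000]            # metres
run_minutes = [14.99, 31.68, 22.02, 17.25, 25.47, 30.77]   # minutes

distance_km = distance ./ 1000   # metres to kilometres
time_hr = run_minutes ./ 60      # minutes to hours

# Element-by-element division gives one speed per run, in km/hr
speeds = distance_km ./ time_hr

# round is applied to each element with the dot; digits is a keyword argument
speeds_rounded = round.(speeds, digits=2)
println(speeds_rounded)
```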
# Assign run speeds to new column named "speed"
df_run[:, "speed"] = distance_km ./ time_hr
# Assign using dot form
df_run.speed = distance_km ./ time_hr
println(df_run)
6×5 DataFrame
Row | day distance time raining speed
| String Int64 Float64 Bool Float64
_____|____________________________________________
1 | Wednesday 2000 14.99 true 8.01
2 | Monday 5000 31.68 false 9.47
3 | Thursday 3500 22.02 true 9.54
4 | Tuesday 3000 17.25 true 10.43
5 | Thursday 4500 25.47 false 10.60
6 | Monday 5000 30.77 true 9.75
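For a version that runs from scratch, the sketch below rebuilds df_run from the values shown in the table and adds the speed column. It assumes the third-party DataFrames package is installed; the construction of df_run is a reconstruction for illustration, not necessarily how the original data was loaded.

```julia
using DataFrames  # third-party package, assumed to be installed

# Rebuild the runs table from the values shown on the slides
df_run = DataFrame(
    day = ["Wednesday", "Monday", "Thursday", "Tuesday", "Thursday", "Monday"],
    distance = [2000, 5000, 3500, 3000, 4500, 5000],
    time = [14.99, 31.68, 22.02, 17.25, 25.47, 30.77],
    raining = [true, false, true, true, false, true],
)

# Speed in km/hr, computed directly from the existing columns
df_run.speed = (df_run.distance ./ 1000) ./ (df_run.time ./ 60)

println(size(df_run))    # rows and columns after adding the new column
println(names(df_run))
```

Assigning to a column name that does not exist yet creates the column, so the table grows from four columns to five.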