Data Manipulation in Julia
Katerina Zahradova
Instructor
",", " ", "\t", ...
using CSV, DataFrames

# Loading a file with a space as the delimiter
penguins = DataFrame(CSV.File("penguins.csv", delim=" "))
3.14 vs. 3,14
# Loading file with comma as decimal mark
penguins = DataFrame(CSV.File(
"penguins.csv",
decimal=',', delim=" "))
# Loading 3 rows, starting at line 10 of the file (the header counts as line 1)
penguins_part = DataFrame(CSV.File("penguins.csv", skipto=10, limit=3))
3×7 DataFrame
Row species island culmen_length_mm culmen_depth_mm ...
String7 String15 Float64 Float64 ...
______________________________________________________________
1 Adelie Torgersen 38.6 21.2 ...
2 Adelie Torgersen 34.6 21.1 ...
3 Adelie Torgersen 36.6 17.8 ...
# Specifying the line number that holds the header
penguins_header = DataFrame(CSV.File("penguins.csv", header = 1))
333×7 DataFrame
Row species island culmen_length_mm culmen_depth_mm ...
String7 String15 Float64 Float64 ...
______________________________________________________________
1 Adelie Torgersen 39.1 18.7 ...
2 Adelie Torgersen 39.5 17.4 ...
3 Adelie Torgersen 40.3 18.0 ...
...
# Multiline header
penguins_header = DataFrame(CSV.File("penguins.csv", header = [1, 2]))
332×7 DataFrame
Row species_Adelie island_Torgersen culmen_length_mm_39.1 ...
String7 String15 Float64 ...
________________________________________________________________
1 Adelie Torgersen 39.5 ...
2 Adelie Torgersen 40.3 ...
3 Adelie Torgersen 36.7 ...
...
# Replacing header
penguins_header = DataFrame(CSV.File("penguins.csv",
header = [:species, :area, :culmen_l_mm, :culmen_d_mm,
:flipper_l_mm, :weight_g, :sex]))
333×7 DataFrame
Row species area culmen_l_mm culmen_d_mm ...
String7 String15 Float64 Float64 ...
____________________________________________________
1 Adelie Torgersen 39.1 18.7 ...
2 Adelie Torgersen 39.5 17.4 ...
3 Adelie Torgersen 40.3 18.0 ...
...
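A minimal sketch of replacing a header, assuming a small in-memory sample (the `raw` string and its two columns are hypothetical, not the penguins file). As far as I understand CSV.jl, when names are passed as a vector the parser treats line 1 as data, so `skipto=2` is given explicitly here to skip the file's own header line.

```julia
using CSV, DataFrames

# Hypothetical two-column sample with its own header on line 1
raw = """
species,island
Adelie,Torgersen
Gentoo,Biscoe
"""

# The new names replace the original ones; skipto=2 skips the old header line
df = DataFrame(CSV.File(IOBuffer(raw); header=[:species, :area], skipto=2))

println(names(df))  # column names after the replacement
println(nrow(df))   # number of data rows that were loaded
```

Passing `skipto` explicitly keeps the result the same whether or not the file starts with a header line you want to discard.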
# Save DataFrame
CSV.write("temp/transformed_penguins.csv", penguins, delim = " ", decimal = ',')
delim=: a Char or String separating values in columns; e.g., species,island,...
decimal=: a Char indicating how decimal places are separated in floats; e.g., . in 3.14
skipto=: an Int specifying the line number in the file where loading starts; beware, the header line is counted, so with the header on line 1 the first data row is skipto=2
limit=: an Int specifying the number of rows you want to load
header=: an Int for row number of a header, a Vector{Int} for multiple lines, a Vector{String} or Vector{Symbol} to rewrite header
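The keywords above can be combined in one call. The sketch below assumes a small in-memory sample (the `raw` string is hypothetical) that is space-delimited and uses a comma as the decimal mark; it reads two rows starting at line 3 of the input.

```julia
using CSV, DataFrames

# Hypothetical sample: space-delimited, comma as decimal mark, header on line 1
raw = """
species island culmen_length_mm
Adelie Torgersen 39,1
Adelie Torgersen 39,5
Adelie Torgersen 40,3
Adelie Torgersen 36,7
"""

# Line 1 is the header, line 2 the first data row, so skipto=3 starts at the
# second data row; limit=2 stops after two rows
df = DataFrame(CSV.File(IOBuffer(raw);
                        delim=' ', decimal=',', skipto=3, limit=2))

println(df)
```

Wrapping the string in an `IOBuffer` is only a convenience for the example; a file path works the same way.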
CSV.File(path) loads the file at path
CSV.write(path, df) writes df as a CSV file to path
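A short round-trip sketch, assuming a made-up two-row DataFrame and a temporary directory (neither comes from the course data): the frame is written with a space delimiter and comma decimal mark, then read back with the same settings.

```julia
using CSV, DataFrames

# Hypothetical data for illustration only
df = DataFrame(species=["Adelie", "Gentoo"], body_mass_g=[3750.5, 5000.0])

# Write to a temporary location so no existing file is overwritten
path = joinpath(mktempdir(), "penguins_out.csv")
CSV.write(path, df; delim=' ', decimal=',')

# Read back with matching delim and decimal so floats parse correctly
df_back = DataFrame(CSV.File(path; delim=' ', decimal=','))

println(df_back)
```

Reading with the same `delim` and `decimal` that were used for writing matters here: with the defaults, values such as `3750,5` would not be parsed as floats.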
Documentation for CSV.File() and CSV.write