Loading and writing CSV files

Data Manipulation in Julia

Katerina Zahradova

Instructor

Delimiters

  • Delimiter = a character or a string used to separate values
  • Common delimiters include ",", " ", "\t", ...

using CSV, DataFrames
# Loading file with a space as a delimiter
penguins = DataFrame(CSV.File("penguins.csv", delim=" "))
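To try the delim keyword without a file on disk, here is a minimal sketch. The three-line sample string is made up for illustration and only stands in for a space-delimited penguins.csv; IOBuffer lets CSV.File parse it as if it were a file.

```julia
using CSV, DataFrames

# Hypothetical sample standing in for a space-delimited penguins.csv
data = """
species island body_mass_g
Adelie Torgersen 3750
Gentoo Biscoe 5000
"""

# delim tells the parser which character separates the values
sample = DataFrame(CSV.File(IOBuffer(data), delim=" "))

println(names(sample))        # column names come from the first line
println(sample.body_mass_g)   # the third column is parsed as integers
```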

Decimal mark

  • Decimal point, e.g., 3.14
  • Decimal comma, e.g., 3,14
  • Both
  • Arabic decimal separator

[Map: decimal marks around the world. Legend: decimal point = blue, decimal comma = green, both = dark green, Arabic decimal separator = red]

# Loading file with comma as decimal mark
penguins = DataFrame(CSV.File(
        "penguins.csv",
        decimal=',', delim=" "))
Map by NuclearVacuum, Wikipedia
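A small sketch of why delim and decimal go together: when the comma is the decimal mark, it cannot also separate the values. The sample data below is invented for illustration.

```julia
using CSV, DataFrames

# Hypothetical sample: space-delimited values with decimal commas
data = """
species culmen_length_mm
Adelie 39,1
Gentoo 46,5
"""

# decimal=',' makes the parser read 39,1 as the float 39.1
decimals = DataFrame(CSV.File(IOBuffer(data), delim=" ", decimal=','))

println(eltype(decimals.culmen_length_mm))  # element type of the parsed column
println(decimals.culmen_length_mm)
```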

Loading parts of datasets

# Loading 3 rows starting at file line 10 (lines 10 through 12; the header is line 1)
penguins_part = DataFrame(CSV.File("penguins.csv", skipto=10, limit=3))
3×7 DataFrame
Row species  island     culmen_length_mm  culmen_depth_mm  ...
    String7  String15   Float64           Float64          ...
______________________________________________________________
1   Adelie   Torgersen  38.6              21.2             ...
2   Adelie   Torgersen  34.6              21.1             ...
3   Adelie   Torgersen  36.6              17.8             ...
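The line counting used by skipto is easy to get wrong, so here is a minimal sketch on an invented five-row sample. The header occupies file line 1, so skipto=3 starts at the second data row, and limit=2 stops after two rows.

```julia
using CSV, DataFrames

# Hypothetical sample: header on line 1, data on lines 2 through 6
data = """
id,value
1,10
2,20
3,30
4,40
5,50
"""

# Start reading data at file line 3 and load at most 2 rows
part = DataFrame(CSV.File(IOBuffer(data), skipto=3, limit=2))

println(part.id)      # ids of the rows found on file lines 3 and 4
println(names(part))  # the header is still taken from line 1
```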

Header

# Specifying which line of the file holds the header
penguins_header = DataFrame(CSV.File("penguins.csv", header = 1))
333×7 DataFrame
Row species  island     culmen_length_mm  culmen_depth_mm  ...
    String7  String15   Float64           Float64          ...
______________________________________________________________
1   Adelie   Torgersen  39.1              18.7             ...
2   Adelie   Torgersen  39.5              17.4             ...
3   Adelie   Torgersen  40.3              18.0             ...
...

Header over multiple lines

# Multiline header
penguins_header = DataFrame(CSV.File("penguins.csv", header = [1, 2]))
332×7 DataFrame
Row species_Adelie  island_Torgersen  culmen_length_mm_39.1  ...
    String7         String15          Float64                ...
________________________________________________________________
1   Adelie          Torgersen         39.5                   ...
2   Adelie          Torgersen         40.3                   ...
3   Adelie          Torgersen         36.7                   ...
...
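A multiline header makes more sense on a file that really has one. The sketch below uses an invented sample whose second line holds units; as in the output above, the lines listed in header are joined with an underscore to form the column names.

```julia
using CSV, DataFrames

# Hypothetical sample: names on line 1, units on line 2
data = """
length,depth
mm,mm
39.1,18.7
39.5,17.4
"""

# Lines 1 and 2 together form the header; data starts on line 3
units = DataFrame(CSV.File(IOBuffer(data), header=[1, 2]))

println(names(units))  # names built from both header lines
println(nrow(units))   # only the two data lines remain as rows
```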

Replacing the header

# Replacing header; skipto = 2 keeps the original header line from being read as data
penguins_header = DataFrame(CSV.File("penguins.csv",
                        header = [:species, :area, :culmen_l_mm, :culmen_d_mm,
                            :flipper_l_mm, :weight_g, :sex],
                        skipto = 2))
333×7 DataFrame
Row species  area       culmen_l_mm  culmen_d_mm      ...
    String7  String15   Float64      Float64          ...
____________________________________________________
1   Adelie   Torgersen  39.1         18.7             ...
2   Adelie   Torgersen  39.5         17.4             ...
3   Adelie   Torgersen  40.3         18.0             ...
...
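A minimal sketch of header replacement on an invented two-column sample. It passes skipto=2 explicitly, on the assumption that CSV.jl does not skip the file's own header line when column names are supplied; with skipto=2 the result is the same either way.

```julia
using CSV, DataFrames

# Hypothetical sample whose first line is the header we want to replace
data = """
species,island
Adelie,Torgersen
Gentoo,Biscoe
"""

# Supply new names and start reading data at line 2
renamed = DataFrame(CSV.File(IOBuffer(data),
                             header=[:species, :area],
                             skipto=2))

println(names(renamed))  # the supplied names replace the original ones
println(nrow(renamed))   # the old header line is not counted as a row
```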

Writing CSV files

# Save DataFrame (the DataFrame to write is the second argument)
CSV.write("temp/transformed_penguins.csv", penguins, delim = " ", decimal = ',')
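A round trip is a quick way to check the writing options. This sketch builds a tiny invented DataFrame, writes it to a temporary directory (created with mktempdir so that the target folder is sure to exist), and reads it back with the same delim and decimal settings.

```julia
using CSV, DataFrames

# Hypothetical two-row DataFrame used only for this illustration
df = DataFrame(species=["Adelie", "Gentoo"],
               culmen_length_mm=[39.1, 46.5])

# Write into a fresh temporary directory
path = joinpath(mktempdir(), "penguins_out.csv")
CSV.write(path, df, delim=" ", decimal=',')

# Inspect the raw text, then load it back with matching options
print(read(path, String))
back = DataFrame(CSV.File(path, delim=" ", decimal=','))
```

Reading a file back with options that differ from the ones used for writing is a common source of columns that load as strings instead of numbers.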

Cheat sheet

  • delim=: a Char or String separating the values within a row; e.g., , in species,island,...

  • decimal=: a Char marking the decimal separator in floats; e.g., . in 3.14

  • skipto=: an Int giving the line number in the file where loading starts; beware, the header line counts toward this number!

  • limit=: an Int giving the maximum number of rows to load

  • header=: an Int for the line number of the header, a Vector{Int} for a header spanning multiple lines, or a Vector{String} or Vector{Symbol} to supply your own column names

  • CSV.File(path) loads the file at path

  • CSV.write(path, df) writes df as a CSV file to path

Documentation for CSV.File() and CSV.write


Let's practice!
