Data Manipulation in Julia
Katerina Zahradova
Instructor
# Selecting all columns from chocolates except the review_date column
select(chocolates, :company, :bean_origin, :REF, :cocoa,
:company_location, :ratings, :bean_type, :bean_location)
ArgumentError: column name "ratings" not found in the data frame;
existing most similar names are: "rating"
...
# Selecting all columns from chocolates except the review_date column
select(chocolates, Not(:review_date))
1795×9 DataFrame
Row company bean_origin REF cocoa company_location rating bean_type bean_location
String String Int64 Float64 String31 Float64 String31? String31?
______________________________________________________________________________________
1 A. Morin Agua Grande 1876 63.0 France 3.75 Sao Tome
...
# Dropping the same column twice
select!(chocolates, Not(:review_date))
# Several lines of code later ...
select(chocolates, Not(:review_date))
ArgumentError: column name :review_date not found in the data frame
# Using Cols
select(chocolates, Not(Cols(==("review_date"))))
1795×9 DataFrame
Row company bean_origin REF cocoa company_location rating bean_type bean_location
String String Int64 Float64 String31 Float64 String31? String31?
...
# Using regex
select(chocolates, Not(r"review_date"))
1795×9 DataFrame
Row company bean_origin REF cocoa company_location rating bean_type bean_location
String String Int64 Float64 String31 Float64 String31? String31?
...
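The difference between the strict and the tolerant forms can be checked on a small table. The sketch below is illustrative only: `toy` and its `review_date_month` column are hypothetical stand-ins, not part of the chocolates data. It also shows that a regex selector matches substrings, so anchoring the pattern may be needed to hit exactly one name.

```julia
using DataFrames

# Hypothetical toy table standing in for the chocolates data
toy = DataFrame(company = ["A. Morin", "Zotter"],
                review_date = [2016, 2014],
                review_date_month = [3, 7],   # hypothetical column, used for the regex caveat
                rating = [3.75, 3.0])

# First drop works because the column exists
dropped = select(toy, Not(:review_date))

# A second strict drop throws, because :review_date is already gone
err = try
    select(dropped, Not(:review_date))
catch e
    e
end
println(typeof(err))

# The predicate form simply matches nothing and returns all columns
safe = select(dropped, Not(Cols(==("review_date"))))
println(names(safe))

# A regex matches any name containing the pattern ...
println(names(select(toy, Not(r"review_date"))))
# ... so anchor it with ^ and $ to target exactly one name
println(names(select(toy, Not(r"^review_date$"))))
```

The predicate form compares whole names, so it never drops more than the one column asked for.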
# Pre-selected chocolates
chocolates
1795x5 DataFrame
Row company review_date cocoa rating bean_location
String Int64 Float64 Float64 String31?
__________________________________________________________
1 A. Morin 2016 63.0 3.75 Sao Tome
# Moving cocoa to the left
select(chocolates, :cocoa, :)
1795x5 DataFrame
Row cocoa company review_date rating bean_location
Float64 String Int64 Float64 String31?
__________________________________________________________
1 63.0 A. Morin 2016 3.75 Sao Tome
...
# Moving cocoa and rating to the left
select(chocolates, :cocoa, :rating, :)
1795x5 DataFrame
Row cocoa rating company review_date bean_location
Float64 Float64 String Int64 String31?
__________________________________________________________
1 63.0 3.75 A. Morin 2016 Sao Tome
...
# Moving company to the right
select(chocolates, Not(:company), :company)
1795x5 DataFrame
Row review_date cocoa rating bean_location company
Int64 Float64 Float64 String31? String
__________________________________________________________
1 2016 63.0 3.75 Sao Tome A. Morin
...
# Combine the two moves
select(chocolates, :cocoa, :rating, Not(:company), :company)
1795x5 DataFrame
Row cocoa rating review_date bean_location company
Float64 Float64 Int64 String31? String
__________________________________________________________
1 63.0 3.75 2016 Sao Tome A. Morin
...
# Reorder and drop review_date
select(chocolates, :cocoa, :rating, Not([:company, :review_date]), :company)
1795x4 DataFrame
Row cocoa rating bean_location company
Float64 Float64 String31? String
__________________________________________________________
1 63.0 3.75 Sao Tome A. Morin
...
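To confirm a reordering without printing the whole table, the resulting column order can be inspected with `names`. This is a minimal sketch on a hypothetical one-row table `toy` that mirrors the pre-selected columns. Selectors are applied left to right, and columns already picked are skipped by later multi-column selectors.

```julia
using DataFrames

# Hypothetical one-row table with the same five columns as the pre-selected data
toy = DataFrame(company = ["A. Morin"], review_date = [2016],
                cocoa = [63.0], rating = [3.75], bean_location = ["Sao Tome"])

# Named columns first, then everything else in original order
left = select(toy, :cocoa, :rating, :)
println(names(left))

# Everything except company in original order, then company last
right = select(toy, Not(:company), :company)
println(names(right))

# Both moves at once: cocoa and rating first, company last
both = select(toy, :cocoa, :rating, Not(:company), :company)
println(names(both))
```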
Dropping columns:
# Drop col1 and col 2 (use all strings or all Symbols in one vector, not a mix)
select(df, Not(["col1", "col 2"]))
Dropping columns safely:
# Drop col1 and col 2 that might not exist
select(df, Not(Cols(in(["col1", "col 2"]))))
Moving columns to the left:
# Move col1 and col 2 to the left
select(df, :col1, "col 2", :)
Moving columns to the right:
# Move col1 and col 2 to the right
select(df, Not(["col1", "col 2"]), :col1, "col 2")
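One point worth checking when dropping several columns: `select` takes the union of its selectors, so two separate `Not(...)` arguments can bring back what each one excluded. The sketch below illustrates this under stated assumptions. `df` is a hypothetical three-column table, and `"no_such_column"` is a made-up name that shows absent columns are tolerated.

```julia
using DataFrames

# Hypothetical table; "col 2" contains a space, so it is written as a string
df = DataFrame("col1" => [1, 2], "col 2" => [3, 4], "col3" => [5, 6])

# Separate Not(...) arguments are unioned: the second selector re-adds col1,
# so nothing is dropped and col1 only moves to the end
unioned = select(df, Not(r"col1"), Not(Cols(==("col 2"))))
println(names(unioned))

# A single Not around one predicate drops every listed name that exists
dropped = select(df, Not(Cols(in(["col1", "col 2", "no_such_column"]))))
println(names(dropped))
```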