Data Manipulation with data.table in R
Matt Dowle, Arun Srinivasan
Instructors, DataCamp
Let's say we would like to change the 2nd row of column "y" to 10
df <- data.frame(x = 1:5, y = 6:10)
df
x y
1 6
2 7
df$y[2] <- 10
In R < v3.1.0, this operation resulted in deep copying the entire data.frame
# what happens internally prior to R v3.1.0 (pseudocode)
tmp <- <deep copy of "df">
tmp$y[2] <- 10
df <- tmp
Since R v3.1.0, only the columns that are updated are deep copied
In this case, only columns a and b are deep copied in the operation performed on df below
df <- data.frame(a = 1:3, b = 4:6, c = 7:9, d = 10:12)
df[1:2] <- lapply(df[1:2], function(x) ifelse(x%%2, x, NA))
df
a b c d
1 NA 7 10
NA 5 8 11
3 NA 9 12
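One way to check which columns were copied is to compare memory addresses before and after the assignment. The sketch below assumes R >= 3.1.0 and borrows address() from the data.table package purely as an inspection helper; the variable names (a_before, c_copied, and so on) are illustrative and not part of the original slides.

```r
library(data.table)  # used here only for address()

df <- data.frame(a = 1:3, b = 4:6, c = 7:9, d = 10:12)

# Record where an updated column (a) and an untouched column (c) live
a_before <- address(df$a)
c_before <- address(df$c)

df[1:2] <- lapply(df[1:2], function(x) ifelse(x %% 2, x, NA))

# A changed address means the column is now a different object
a_copied <- address(df$a) != a_before
c_copied <- address(df$c) != c_before

print(c(a = a_copied, c = c_copied))
```

On R >= 3.1.0 one would expect a to be reported as replaced and c as untouched, which matches the claim that only the updated columns are deep copied.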
data.table updates columns in place, i.e., by reference
This means you don't need to assign the result back to a variable
No copy of any column is made while their values are changed
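As a minimal sketch of what "by reference" looks like in practice, the example below updates and then adds a column on a small, made-up data.table and checks that the table is still the same object afterwards. The table dt and the helper variables are assumptions for illustration only; address() is data.table's own inspection function.

```r
library(data.table)

dt <- data.table(x = 1:5, y = 6:10)
addr_before <- address(dt)

# Update the 2nd row of y in place; no `dt <- ...` is needed
dt[2, y := 10L]

# Add a new column by reference as well
dt[, z := x + y]

# Compare the table's address before and after the two updates
same_object <- identical(address(dt), addr_before)

print(dt)
print(same_object)
```

Neither call assigns its result back to dt, yet both changes are visible when dt is printed.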
data.table uses a new operator, :=, to add/update/delete columns by reference
batrips[, c("is_dur_gt_1hour", "week_day") := list(duration > 3600, wday(start_date))]
# When adding a single column quotes aren't necessary
batrips[, is_dur_gt_1hour := duration > 3600]
batrips[, `:=`(is_dur_gt_1hour = NULL,
start_station = toupper(start_station))]
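The two forms of := shown above can be tried end to end on a toy table. The sketch below uses a hypothetical stand-in called trips, with invented values, since the batrips data is not included here; only the column names are taken from the slides.

```r
library(data.table)

# Hypothetical stand-in for batrips; the values are made up
trips <- data.table(
  duration      = c(300L, 4200L, 7200L),
  start_date    = as.Date(c("2014-01-05", "2014-01-06", "2014-01-07")),
  start_station = c("Market at 4th", "Townsend at 7th", "2nd at Folsom")
)

# LHS := RHS form: add two columns at once
trips[, c("is_dur_gt_1hour", "week_day") := list(duration > 3600, wday(start_date))]

# Functional form: delete one column and update another in a single call
trips[, `:=`(is_dur_gt_1hour = NULL,
             start_station = toupper(start_station))]

print(names(trips))
print(trips$start_station)
```

Setting a column to NULL removes it, so after the second call only week_day remains of the two added columns, and start_station holds the upper-cased names.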