Adding and updating columns by reference

Data Manipulation with data.table in R

Matt Dowle, Arun Srinivasan

Instructors, DataCamp

data.frame internals

Let's say we would like to change the 2nd row of column "y" to 10

df <- data.frame(x = 1:5, y = 6:10)
df
  x  y
1 1  6
2 2  7
3 3  8
4 4  9
5 5 10
df$y[2] <- 10

data.frame internals

In R < v3.1.0, this operation resulted in deep copying the entire data.frame

# what happens internally prior to R v3.1.0 (conceptual sketch)
tmp <- df        # stands for a deep copy of the entire "df"
tmp$y[2] <- 10
df <- tmp
  • What happens if you would like to do the same operation on a 10GB data.frame?

data.frame internals

  • In R v3.1.0, improvements were made so that only the columns being updated are deep copied

  • In this case, just columns a and b are deep copied in the operation performed on df below

df <- data.frame(a = 1:3, b = 4:6, c = 7:9, d = 10:12)
df[1:2] <- lapply(df[1:2], function(x) ifelse(x%%2, x, NA))
df
   a  b c  d
1  1 NA 7 10
2 NA  5 8 11
3  3 NA 9 12
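
One rough way to check which columns are copied is to compare the memory address of each column before and after the update. The sketch below assumes R >= 3.1.0 and borrows address() from data.table purely as an inspection helper; the names before, after and same are illustrative and not part of the slides.

```r
library(data.table)  # used here only for address()

df <- data.frame(a = 1:3, b = 4:6, c = 7:9, d = 10:12)

# memory address of every column before the update
before <- vapply(df, address, character(1))

df[1:2] <- lapply(df[1:2], function(x) ifelse(x %% 2, x, NA))

# memory address of every column after the update
after <- vapply(df, address, character(1))

# TRUE means the column still lives at the same address (it was not copied)
same <- before == after
print(same)
```

Columns a and b receive new vectors, so their addresses change; on R >= 3.1.0 one would expect c and d to keep theirs.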

data.table internals

  • data.table updates columns in place, i.e., by reference

  • This means you don't need to assign the result back to a variable

  • No copy of any column is made while their values are changed

  • data.table uses a new operator := to add/update/delete columns by reference
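
A minimal sketch of the points above, repeating the earlier "change the 2nd row of y to 10" task on a data.table. The object dt and the helper variables are illustrative; address() is data.table's own function for inspecting where an object lives in memory.

```r
library(data.table)

dt <- data.table(x = 1:5, y = 6:10)
addr_before <- address(dt)

# update row 2 of column y in place; note there is no `dt <-` on the left
dt[2, y := 10L]

addr_after <- address(dt)

# the data.table is still the same object, and y has changed
print(identical(addr_before, addr_after))
print(dt$y)
```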


LHS := RHS form

batrips[, c("is_dur_gt_1hour", "week_day") := list(duration > 3600, 
                                                   wday(start_date))]

# When adding a single column, quotes aren't necessary
batrips[, is_dur_gt_1hour := duration > 3600]
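
To try the LHS := RHS form without the course data, the sketch below builds a small stand-in for batrips. The three rows and their values are made up for illustration, and start_date is stored as an IDate here, which may differ from the real dataset.

```r
library(data.table)

# hypothetical stand-in for batrips with only the columns used here
batrips <- data.table(
  duration   = c(1200L, 4000L, 7300L),  # trip length in seconds
  start_date = as.IDate(c("2014-01-01", "2014-01-02", "2014-01-05"))
)

# character vector of names on the LHS, list of the same length on the RHS
batrips[, c("is_dur_gt_1hour", "week_day") := list(duration > 3600,
                                                   wday(start_date))]
print(batrips)
```

Both columns are added to batrips itself; nothing is assigned back with `<-`.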

Functional form

batrips[, `:=`(is_dur_gt_1hour = NULL,                   # delete a column
               start_station = toupper(start_station))]  # update a column
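
The functional form can be tried the same way on a toy table. In this sketch the batrips rows and station names are invented for illustration; only the `:=` call mirrors the slide.

```r
library(data.table)

# hypothetical stand-in for batrips; the station names are made up
batrips <- data.table(
  duration        = c(1200L, 4000L),
  start_station   = c("Market at 4th", "Townsend at 7th"),
  is_dur_gt_1hour = c(FALSE, TRUE)
)

batrips[, `:=`(is_dur_gt_1hour = NULL,                   # delete this column
               start_station = toupper(start_station))]  # overwrite in place

print(names(batrips))
print(batrips$start_station)
```

Assigning NULL removes a column, while any other RHS adds or overwrites the named column, so one call can mix deletions and updates.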

Let's practice!
