Computations in j using .SD

Data Manipulation with data.table in R

Matt Dowle, Arun Srinivasan

Instructors, DataCamp

Subset of Data, .SD

  • .SD is a special symbol which stands for Subset of Data
  • Contains the subset of data for each group, which is itself a data.table
  • By default, the grouping columns are excluded for convenience
x <- data.table(id = c(1, 1, 2, 2, 1, 1),
                val1 = 1:6, val2 = letters[6:1])
id val1 val2
 1    1    f
 1    2    e
 2    3    d
 2    4    c
 1    5    b
 1    6    a
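
The three points above can be checked directly. The following is a minimal sketch, assuming the data.table package is installed; the name `info` and its column names are chosen here for illustration only. For each group it records whether .SD is a data.table, which columns it holds, and how many rows it has.

```r
library(data.table)

x <- data.table(id = c(1, 1, 2, 2, 1, 1),
                val1 = 1:6, val2 = letters[6:1])

# Inspect .SD once per group: its class, its column names, its row count
info <- x[, .(is_dt = is.data.table(.SD),
              cols  = paste(names(.SD), collapse = ","),
              rows  = nrow(.SD)),
          by = id]
print(info)
```

The grouping column id should not appear in `cols`, since .SD leaves it out by default.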

x[, print(.SD), by = id]
val1 val2
   1    f
   2    e
   5    b
   6    a
val1 val2
   3    d
   4    c
Empty data.table (0 rows) of 1 col: id

x[, .SD[1], by = id]
id val1 val2
 1    1    f
 2    3    d

x[, .SD[.N], by = id]
id val1 val2
 1    6    a
 2    4    c
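
Because .SD is an ordinary data.table, it can be indexed with a vector of row numbers as well. The sketch below, assuming the same toy table x and the data.table package, combines the first and last row of each group in one call; `first_last` is a name chosen here for illustration.

```r
library(data.table)

x <- data.table(id = c(1, 1, 2, 2, 1, 1),
                val1 = 1:6, val2 = letters[6:1])

# .N is the number of rows in the current group,
# so c(1, .N) selects the first and the last row of each group
first_last <- x[, .SD[c(1, .N)], by = id]
print(first_last)
```

Note that a group with a single row would return that row twice, since 1 and .N would then refer to the same row.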

.SDcols

.SDcols specifies which columns are included in .SD

batrips[, .SD[1], by = start_station]
           start_station   trip_id   duration            start_date  
 San Francisco City Hall    139545        435   2014-01-01 00:14:00  
  Embarcadero at Sansome    139547       1523   2014-01-01 00:17:00
# .SDcols controls the columns .SD contains
batrips[, .SD[1], by = start_station, .SDcols = c("trip_id", "duration")]
          start_station   trip_id   duration
San Francisco City Hall    139545        435
 Embarcadero at Sansome    139547       1523
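
Since batrips is not reproduced on these slides, the same idea can be tried on the small table x from earlier. This is a sketch under that assumption; `only_val1` is an illustrative name.

```r
library(data.table)

x <- data.table(id = c(1, 1, 2, 2, 1, 1),
                val1 = 1:6, val2 = letters[6:1])

# Keep only val1 in .SD; the grouping column id is still returned
only_val1 <- x[, .SD[1], by = id, .SDcols = "val1"]
print(names(only_val1))
```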

# A leading - excludes the listed columns from .SD
batrips[, .SD[1], by = start_station, .SDcols = -c("trip_id", "duration")]
           start_station             start_date           
 San Francisco City Hall    2014-01-01 00:14:00  
  Embarcadero at Sansome    2014-01-01 00:17:00  
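
The exclusion form can also be tried on the toy table x. The sketch below assumes the data.table package; `dropped` and `dropped_bang` are illustrative names. The data.table documentation describes a leading ! as equivalent to a leading -, which the second call exercises.

```r
library(data.table)

x <- data.table(id = c(1, 1, 2, 2, 1, 1),
                val1 = 1:6, val2 = letters[6:1])

# Drop val1 from .SD, so only val2 remains alongside the grouping column
dropped <- x[, .SD[1], by = id, .SDcols = -c("val1")]

# The same exclusion written with ! instead of -
dropped_bang <- x[, .SD[1], by = id, .SDcols = !c("val1")]

print(dropped)
```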

Let's practice!
