Chaining data.table expressions

Data Manipulation with data.table in R

Matt Dowle, Arun Srinivasan

Instructors, DataCamp

Chaining expressions

data.table expressions can be chained together, i.e., x[...][...][...]

step_1 <- batrips[duration > 3600]
step_2 <- step_1[order(duration)]
step_2[1:3]
# Same as
batrips[duration > 3600][order(duration)][1:3]
trip_id duration
 295912     3601
 347471     3602
 536050     3602
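
Since batrips is not defined on the slide, the following is a minimal sketch on a made-up table (trips, with hypothetical trip_id and duration values) showing that the stepwise and chained forms give the same result. It assumes the data.table package is installed.

```r
library(data.table)

# Hypothetical toy data standing in for batrips (durations in seconds)
trips <- data.table(
  trip_id  = 1:6,
  duration = c(4000, 120, 3700, 3601, 900, 5000)
)

# Step by step, keeping intermediate objects
step_1   <- trips[duration > 3600]      # filter rows
step_2   <- step_1[order(duration)]     # sort ascending by duration
stepwise <- step_2[1:3]                 # keep the first three rows

# The same three operations written as one chain
chained <- trips[duration > 3600][order(duration)][1:3]

# all.equal() has a data.table method; TRUE means both results match
isTRUE(all.equal(stepwise, chained))
```

Each [...] operates on the data.table returned by the one before it, so the chain reads left to right in the same order as the stepwise version and leaves no intermediate objects behind.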

Chaining expressions

# Three start stations with the lowest mean duration
step_1 <- batrips[, .(mn_dur = mean(duration)), by = "start_station"]
step_2 <- step_1[order(mn_dur)]
step_2[1:3]
# Same as, written as a single chain
batrips[, .(mn_dur = mean(duration)),  
        by = "start_station"][order(mn_dur)][1:3]
                                 start_station   mn_dur
                                 2nd at Folsom 551.0807
 Temporary Transbay Terminal (Howard at Beale) 655.8563
                             2nd at South Park 697.7034
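
As a rough illustration of why the ordering step has to come in a later link of the chain, here is a sketch on made-up data (the station names "A" to "D" and their durations are hypothetical). The column mn_dur only exists in the result of the first [...], so order(mn_dur) can only refer to it afterwards.

```r
library(data.table)

# Hypothetical toy data; station names and durations are made up
trips <- data.table(
  start_station = c("A", "A", "B", "B", "C", "C", "D", "D"),
  duration      = c(600, 800, 300, 500, 1000, 1200, 100, 300)
)

# 1st link: compute the mean duration per station (creates mn_dur)
# 2nd link: sort the aggregated rows by the new column
# 3rd link: keep the first three rows
lowest <- trips[, .(mn_dur = mean(duration)),
                by = "start_station"][order(mn_dur)][1:3]
lowest
```

Note that [1:3] returns a row of NAs for each index beyond the number of rows, so on results with fewer than three groups head(..., 3) may be the safer final step.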

uniqueN()

  • uniqueN() is a helper function that returns the number of unique values in the input object as an integer.
  • It accepts vectors as well as data.frames and data.tables; for the latter two it counts unique rows, optionally restricted to the columns given in by.
id <- c(1, 2, 2, 1)
uniqueN(id)
2
x <- data.table(id, val = 1:4)
id val
 1   1
 2   2
 2   3
 1   4
uniqueN(x)
4
uniqueN(x, by = "id")
2
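
One way to read these results is to compare uniqueN() with the longer unique()-based expressions it replaces. The sketch below reuses id and x from the slide; the variable names same_vec, same_rows and same_by are only illustrative.

```r
library(data.table)

id <- c(1, 2, 2, 1)
x  <- data.table(id, val = 1:4)

# On a vector: the count of distinct values
same_vec <- uniqueN(id) == length(unique(id))

# On a data.table: the count of distinct rows across all columns
same_rows <- uniqueN(x) == nrow(unique(x))

# With by: distinct rows judged only on the listed columns
same_by <- uniqueN(x, by = "id") == nrow(unique(x, by = "id"))

c(same_vec, same_rows, same_by)
```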

uniqueN() together with by

Calculate the total number of unique bike ids for every month

ans <- batrips[, uniqueN(bike_id), by = month(start_date)]
head(ans, 3)
 month    V1   ## <~~ auto naming of cols
     1   605
     2   608
     3   631
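
The automatic V1 name can be avoided by naming the expression inside .(). The following is a sketch on made-up data (the trips table, its bike ids and dates, and the names n_bikes and mon are all hypothetical), using month() as exported by data.table.

```r
library(data.table)

# Hypothetical toy data standing in for batrips
trips <- data.table(
  bike_id    = c(10, 11, 10, 12, 12, 12),
  start_date = as.Date(c("2014-01-03", "2014-01-15", "2014-01-20",
                         "2014-02-01", "2014-02-10", "2014-03-05"))
)

# Unnamed expression in j: the result column is auto-named V1
auto <- trips[, uniqueN(bike_id), by = month(start_date)]
names(auto)

# Naming the j expression and the by expression gives explicit column names
named <- trips[, .(n_bikes = uniqueN(bike_id)),
               by = .(mon = month(start_date))]
named
```

Explicit names tend to matter most when further expressions are chained on, because later links then refer to n_bikes rather than V1.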

Let's practice!
