Data Manipulation with data.table in R
Matt Dowle, Arun Srinivasan
Instructors, DataCamp
data.table expressions can be chained together, i.e., x[...][...][...]
step_1 <- batrips[duration > 3600]
step_2 <- step_1[order(duration)]
step_2[1:3]
# Same as step_1 followed by step_2
batrips[duration > 3600][order(duration)]
# Same as step_2[1:3]
batrips[duration > 3600][order(duration)][1:3]
trip_id duration
295912 3601
347471 3602
536050 3602
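A minimal sketch of the same idea on a small made-up table, checking that the chained form matches the step-by-step form. The `trips` table and its values are hypothetical stand-ins, not rows from batrips.

```r
library(data.table)

# Hypothetical toy data standing in for batrips
trips <- data.table(
  trip_id  = 1:6,
  duration = c(4000, 120, 3601, 9000, 3600, 3700)
)

# Step by step: filter, then order, then take the first three rows
step_1   <- trips[duration > 3600]
step_2   <- step_1[order(duration)]
stepwise <- step_2[1:3]

# The same three operations chained into one expression
chained <- trips[duration > 3600][order(duration)][1:3]

# Both approaches should give the same rows in the same order
all.equal(stepwise, chained)
```

Chaining avoids creating the intermediate objects step_1 and step_2 when they are not needed afterwards.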
# Three start stations with the lowest mean duration
step_1 <- batrips[, .(mn_dur = mean(duration)), by = "start_station"]
step_2 <- step_1[order(mn_dur)]
step_2[1:3]
# Three start stations with the lowest mean duration, chained
batrips[, .(mn_dur = mean(duration)),
        by = "start_station"][order(mn_dur)][1:3]
start_station mn_dur
2nd at Folsom 551.0807
Temporary Transbay Terminal (Howard at Beale) 655.8563
2nd at South Park 697.7034
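The aggregate-then-order pattern can be tried without batrips. The sketch below uses a hypothetical `trips` table with invented station names and durations.

```r
library(data.table)

# Hypothetical toy data: station names and durations are made up
trips <- data.table(
  start_station = c("A", "A", "B", "B", "C", "C", "D"),
  duration      = c(100, 300, 50, 70, 900, 1100, 400)
)

# Mean duration per station, ordered ascending, first three rows
ans <- trips[, .(mn_dur = mean(duration)),
             by = "start_station"][order(mn_dur)][1:3]
ans
```

Replacing order(mn_dur) with order(-mn_dur) would return the stations with the highest mean duration instead.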
uniqueN() is a helper function that returns an integer value containing the number of unique values in the input object
It accepts vectors as well as data.frames and data.tables
id <- c(1, 2, 2, 1)
uniqueN(id)
2
x <- data.table(id, val = 1:4)
id val
1 1
2 2
2 3
1 4
uniqueN(x)
4
uniqueN(x, by = "id")
2
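As a cross-check, each uniqueN() call above can be compared with its base-style equivalent built from unique(). The variable names n_vec, n_rows and n_ids are introduced here only for illustration.

```r
library(data.table)

id <- c(1, 2, 2, 1)
x  <- data.table(id, val = 1:4)

# Vector input: same as length(unique(id))
n_vec <- uniqueN(id)
n_vec == length(unique(id))

# data.table input: counts unique rows across all columns
n_rows <- uniqueN(x)
n_rows == nrow(unique(x))

# by = "id": counts unique rows considering only the id column
n_ids <- uniqueN(x, by = "id")
n_ids == nrow(unique(x, by = "id"))
```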
Calculate the total number of unique bike ids for every month
ans <- batrips[, uniqueN(bike_id), by = month(start_date)]
head(ans, 3)
month V1 ## <~~ auto naming of cols
1 605
2 608
3 631
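The auto-generated name V1 can be avoided by naming the computed column inside .(). The sketch below assumes a hypothetical `trips` table with made-up dates and bike ids; the column name n_bikes is chosen here for illustration.

```r
library(data.table)

# Hypothetical toy data: dates and bike ids are made up
trips <- data.table(
  start_date = as.Date(c("2014-01-03", "2014-01-20", "2014-01-25",
                         "2014-02-02", "2014-02-14", "2014-02-20")),
  bike_id    = c(10L, 10L, 11L, 10L, 12L, 13L)
)

# Name the result column in j and the grouping column in by
ans <- trips[, .(n_bikes = uniqueN(bike_id)),
             by = .(month = month(start_date))]
ans
```

Naming the column up front makes later steps in a chain easier to read, for example [order(n_bikes)] rather than [order(V1)].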