Data Manipulation with data.table in R
Matt Dowle, Arun Srinivasan
Instructors, DataCamp
The by argument runs the computation once for each unique value of the grouping column(s) specified in by
# How many trips happened from each start_station?
ans <- batrips[, .N, by = "start_station"]
head(ans, 3)
start_station                N
San Francisco City Hall   2145
Embarcadero at Sansome   12879
Steuart at Market        11579
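A minimal sketch of the same pattern on a small, made-up table. The batrips data is not reproduced here, so the table name trips and its values are hypothetical stand-ins; only the data.table package is assumed to be installed.

```r
library(data.table)

# Hypothetical toy data standing in for batrips
trips <- data.table(
  start_station = c("A", "B", "A", "C", "B", "A"),
  duration      = c(300, 620, 450, 900, 210, 780)
)

# One result row per unique start_station;
# .N is the number of rows in each group
counts <- trips[, .N, by = "start_station"]
print(counts)
```

Groups appear in the result in the order in which they first occur in the data, so station "A" comes first here.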
The by argument accepts either a character vector of column names or a list of variables/expressions
# Same as batrips[, .N, by = "start_station"]
ans <- batrips[, .N, by = .(start_station)]
head(ans, 3)
start_station                N
San Francisco City Hall   2145
Embarcadero at Sansome   12879
Steuart at Market        11579
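The equivalence of the two forms can be checked directly. The sketch below uses a hypothetical toy table (trips, with made-up station labels) and groups by two columns at once, first with a character vector and then with .().

```r
library(data.table)

# Hypothetical toy data; column names mirror batrips, values are made up
trips <- data.table(
  start_station = c("A", "B", "A", "C", "B", "A"),
  end_station   = c("B", "A", "B", "A", "C", "C")
)

# Character-vector form: column names as strings
by_chr <- trips[, .N, by = c("start_station", "end_station")]

# List form: unquoted column names inside .()
by_list <- trips[, .N, by = .(start_station, end_station)]

# Both forms should yield the same grouped result
same <- isTRUE(all.equal(by_chr, by_list))
print(same)
```

The character form is convenient when the column names are held in a variable; the list form is needed as soon as a grouping column has to be renamed or computed.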
The list form in by also allows renaming grouping columns on the fly
ans <- batrips[, .(no_trips = .N), by = .(start = start_station)]
head(ans, 3)
start                     no_trips
San Francisco City Hall       2145
Embarcadero at Sansome       12879
Steuart at Market            11579
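Renaming works the same way for computed columns in j and for grouping columns in by. A small sketch on hypothetical data (the table trips and the column name mean_duration are illustrative, not part of the course material):

```r
library(data.table)

# Hypothetical toy data standing in for batrips
trips <- data.table(
  start_station = c("A", "B", "A", "C", "B", "A"),
  duration      = c(300, 620, 450, 900, 210, 780)
)

# name = expression inside .() sets the output column name,
# both for the results in j and for the grouping column in by
ans <- trips[, .(no_trips = .N, mean_duration = mean(duration)),
             by = .(start = start_station)]
print(names(ans))
print(ans)
```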
The list() or .() expression in by allows grouping variables to be computed on the fly
# Get number of trips for each start_station for each month
ans <- batrips[ , .N, by = .(start_station, mon = month(start_date))]
head(ans, 3)
start_station             mon     N
San Francisco City Hall     1   193
Embarcadero at Sansome      1   985
Steuart at Market           1   813
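Any expression that returns one value per row can serve as a grouping variable. The sketch below, on a hypothetical toy table, groups by a month extracted with data.table's month() and, separately, by a logical condition. The 600-second threshold and the name long_trip are illustrative assumptions.

```r
library(data.table)

# Hypothetical toy data; dates and durations are made up
trips <- data.table(
  start_station = c("A", "A", "B", "A", "B"),
  start_date    = as.Date(c("2014-01-03", "2014-01-20", "2014-01-15",
                            "2014-02-02", "2014-02-10")),
  duration      = c(300, 900, 450, 1200, 200)
)

# Grouping variable computed from a date column
by_month <- trips[, .N, by = .(start_station, mon = month(start_date))]
print(by_month)

# Grouping variable computed from a logical condition
by_length <- trips[, .N, by = .(long_trip = duration > 600)]
print(by_length)
```

The computed grouping column exists only in the result; the original table is left unchanged.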