Computations by groups

Data Manipulation with data.table in R

Matt Dowle, Arun Srinivasan

Instructors, DataCamp

The by argument

The by argument evaluates the expression in j separately for each unique value (or combination of values) of the grouping columns specified in by

# How many trips happened from each start_station?
ans <- batrips[, .N, by = "start_station"]
head(ans, 3)
          start_station        N
San Francisco City Hall     2145
 Embarcadero at Sansome    12879
      Steuart at Market    11579
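
The batrips data is not included with these slides, so the following is a minimal sketch on a hypothetical toy table (the name trips and its values are made up for illustration). It shows that by splits the rows by station and that any j expression, not only .N, is evaluated once per group.

```r
library(data.table)

# Hypothetical toy data standing in for batrips (values are invented)
trips <- data.table(
  start_station = c("A", "B", "A", "C", "B", "A"),
  duration      = c(300, 600, 450, 120, 900, 150)
)

# Count rows per station; groups are returned in order of first appearance
counts <- trips[, .N, by = "start_station"]

# Any j expression is evaluated per group, for example a mean
avg <- trips[, .(mean_duration = mean(duration)), by = "start_station"]

print(counts)
print(avg)
```

Here station A has three trips, B has two and C has one, and the result keeps the stations in the order they first appear in the data rather than sorting them.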

The by argument

The by argument accepts either a character vector of column names or a list of variables/expressions

# Same as batrips[, .N, by = "start_station"]
ans <- batrips[, .N, by = .(start_station)]
head(ans, 3)
          start_station        N
San Francisco City Hall     2145
 Embarcadero at Sansome    12879
      Steuart at Market    11579
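
As a quick check of the equivalence, the sketch below groups a hypothetical toy table (trips, with invented values) by two columns, once with a character vector and once with .(), and compares the results with all.equal().

```r
library(data.table)

# Hypothetical toy data (values are invented for illustration)
trips <- data.table(
  start_station = c("A", "B", "A", "B"),
  end_station   = c("B", "A", "B", "B")
)

# Character vector of column names
by_chr <- trips[, .N, by = c("start_station", "end_station")]

# List of column symbols, written with the .() alias for list()
by_list <- trips[, .N, by = .(start_station, end_station)]

# Both forms should give the same grouped result
same <- isTRUE(all.equal(by_chr, by_list))
print(same)
```

The character form is convenient when column names are stored in a variable; the list form is the one that allows renaming and expressions.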

The by argument

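Naming an element of the .() list in by renames that grouping column on the fly; naming an element of .() in j does the same for the computed column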

ans <- batrips[, .(no_trips = .N), by = .(start = start_station)]
head(ans, 3)
                  start   no_trips
San Francisco City Hall       2145
 Embarcadero at Sansome      12879
      Steuart at Market      11579
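
To see the effect of the names in isolation, this small sketch (again on a hypothetical toy table called trips) compares the column names with and without renaming.

```r
library(data.table)

# Hypothetical toy data (values are invented for illustration)
trips <- data.table(start_station = c("A", "B", "A"))

# Default names: the grouping column keeps its name, the count is called N
default_ans <- trips[, .N, by = .(start_station)]

# Renamed: 'start' for the grouping column, 'no_trips' for the count
renamed_ans <- trips[, .(no_trips = .N), by = .(start = start_station)]

print(names(default_ans))
print(names(renamed_ans))
```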

Expressions in by

The list() or .() expression in by allows for grouping variables to be computed on the fly

# Get number of trips for each start_station for each month
ans <- batrips[ , .N, by = .(start_station, mon = month(start_date))]
head(ans, 3)
          start_station mon    N
San Francisco City Hall   1  193
 Embarcadero at Sansome   1  985
      Steuart at Market   1  813
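
The sketch below reproduces the same pattern on a hypothetical toy table (trips, with invented dates stored as Date values, which may differ from the date-time type used in batrips). The grouping variable mon does not exist in the data; it is computed by month() inside by.

```r
library(data.table)

# Hypothetical toy data (stations and dates are invented for illustration)
trips <- data.table(
  start_station = c("A", "A", "B", "A"),
  start_date    = as.Date(c("2014-01-03", "2014-01-20",
                            "2014-01-05", "2014-02-11"))
)

# Group by station and by a month column computed on the fly
ans <- trips[, .N, by = .(start_station, mon = month(start_date))]

print(ans)
```

Station A contributes two groups here, one for January and one for February, because each distinct combination of start_station and mon forms its own group.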

Let's practice!
