Advanced computations in j

Data Manipulation with data.table in R

Matt Dowle, Arun Srinivasan

Instructors, DataCamp

Compute in j and return a data.table

Recall that you can select (and rename) multiple columns in j using .()

# Recap: Select trip_id and duration columns
ans <- batrips[, .(trip_id, dur = duration)]
head(ans, 2)
trip_id      dur
139545      435
139546      432

You can compute on columns in j and return a data.table the same way, by wrapping the computations in .()

# Get mean and median of duration
batrips[, .(mn_dur = mean(duration), 
            med_dur = median(duration))]
mn_dur    med_dur
1131.967  511
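
As a minimal sketch of why wrapping the computation in .() matters, the example below uses a small made-up table in place of batrips (the name `trips` and all of its values are hypothetical, chosen only for illustration). It assumes the data.table package is installed.

```r
library(data.table)

# Hypothetical toy data standing in for batrips (values are made up)
trips <- data.table(
  trip_id       = 1:6,
  start_station = c("Japantown", "Japantown", "Townsend",
                    "Townsend", "Japantown", "Townsend"),
  duration      = c(300, 900, 420, 480, 1500, 600)
)

# Computations wrapped in .() come back as a one-row data.table
ans <- trips[, .(mn_dur = mean(duration), med_dur = median(duration))]
print(class(ans))

# A single bare computation in j comes back as a plain vector instead
mn <- trips[, mean(duration)]
print(class(mn))
```

The named arguments inside .() become the column names of the result, which is why the output above carries the headers mn_dur and med_dur.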

Question

  • How would you perform this operation the data frame way, using a plain data.frame?
  • Is your code straightforward and clear?
# Get mean and median of duration
batrips[, .(mn_dur = mean(duration), med_dur = median(duration))]
mn_dur     med_dur
1131.967   511
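
One possible answer to the question, sketched in base R on a small made-up data frame (the name `trips_df` and its values are hypothetical and stand in for batrips). This is only one of several ways to write it.

```r
# Hypothetical toy data standing in for batrips (values are made up)
trips_df <- data.frame(
  trip_id  = 1:6,
  duration = c(300, 900, 420, 480, 1500, 600)
)

# Base R: the result has to be assembled by hand, and the
# data frame name is repeated for every column reference
ans_df <- data.frame(
  mn_dur  = mean(trips_df$duration),
  med_dur = median(trips_df$duration)
)
print(ans_df)
```

Compared with the data.table call, the columns cannot be referred to by bare name, so `trips_df$` appears once per computation.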

Combining with i

Combined with a condition in i, the computation in j is applied only to the rows that satisfy that condition

batrips[start_station == "Japantown", .(mn_dur = mean(duration), 
                                        med_dur = median(duration))]
mn_dur    med_dur
2464.331  782

Question

  • How would you perform this operation the data frame way, using a plain data.frame?
  • Is your code straightforward and clear?
batrips[start_station == "Japantown", .(mn_dur = mean(duration), 
                                        med_dur = median(duration))]
mn_dur     med_dur
2464.331   782
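
A rough side-by-side sketch of the filtered computation, again on made-up data (the names `trips_df`, `trips_dt`, and `japantown` and all values are hypothetical). It assumes the data.table package is installed; the base R version shown is one reasonable way to write it, not the only one.

```r
library(data.table)

# Hypothetical toy data standing in for batrips (values are made up)
trips_df <- data.frame(
  start_station = c("Japantown", "Japantown", "Townsend",
                    "Townsend", "Japantown", "Townsend"),
  duration      = c(300, 900, 420, 480, 1500, 600)
)

# Base R: subset the rows first, then compute on the intermediate result
japantown <- trips_df[trips_df$start_station == "Japantown", ]
ans_df <- data.frame(
  mn_dur  = mean(japantown$duration),
  med_dur = median(japantown$duration)
)

# data.table: the filter in i and the computation in j sit in one call
trips_dt <- as.data.table(trips_df)
ans_dt <- trips_dt[start_station == "Japantown",
                   .(mn_dur = mean(duration), med_dur = median(duration))]

print(ans_df)
print(ans_dt)
```

Both versions compute the same two statistics; the data.table form avoids the intermediate object and the repeated `$` references.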

Let's practice!
