data.table syntax

Joining Data with data.table in R

Scott Ritchie

Postdoctoral Researcher in Systems Genomics

Recap of the data.table syntax

General form of data.table syntax

DT[i, j, by]
   |  |  |
   |  |  --> grouped by what?
   |  -----> what to do?
   --------> on which rows?
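
As a minimal sketch of the three slots working together, the example below filters rows (i), computes a summary (j), and groups the result (by). The table DT and its columns group and value are hypothetical toy data, not taken from the course.

```r
library(data.table)

# Hypothetical toy data (an assumption for illustration only)
DT <- data.table(
  group = c("a", "a", "b", "b"),
  value = c(1, 2, 3, 4)
)

# i:  keep only rows where value > 1
# j:  compute the sum of value
# by: return one result row per group
result <- DT[value > 1, .(total = sum(value)), by = group]
print(result)
```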

Joins

General form of data.table syntax joins

DT[i, on]
   |   |
   |   ----> join key columns
   --------> join to which data.table?

Right joins

The default join is a right join

demographics[shipping, on = .(name)]
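
The sketch below illustrates what "right join" means here: every row of the data.table given in i (shipping) is kept, and columns from demographics are filled with NA where there is no match. The contents of both tables are made-up values assumed for illustration; only the table names and the name column come from the slides.

```r
library(data.table)

# Hypothetical contents (assumed for illustration)
demographics <- data.table(
  name = c("Mark", "Matt", "Angela"),
  age  = c(54, 43, 39)
)
shipping <- data.table(
  name    = c("Matt", "Angela", "Michelle"),
  address = c("12 High St", "5 Elm Rd", "9 Oak Ave")
)

# Right join: keeps every row of shipping, in shipping's row order
right_join <- demographics[shipping, on = .(name)]
print(right_join)

# Michelle has no row in demographics, so her age is NA
print(right_join[name == "Michelle"])
```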


The on argument

Variables inside list() or .() are looked up in the column names of both data.tables

shipping[demographics, on = list(name)]
shipping[demographics, on = .(name)]

Character vectors can also be used

join_key <- c("name")
shipping[demographics, on = join_key]
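
To check that the three ways of writing the on argument are interchangeable, the sketch below runs the same join with each form and compares the results. The table contents are hypothetical values assumed for illustration.

```r
library(data.table)

# Hypothetical contents (assumed for illustration)
demographics <- data.table(
  name = c("Mark", "Matt", "Angela"),
  age  = c(54, 43, 39)
)
shipping <- data.table(
  name    = c("Matt", "Angela", "Michelle"),
  address = c("12 High St", "5 Elm Rd", "9 Oak Ave")
)

# The same join key written three ways
by_list <- shipping[demographics, on = list(name)]
by_dot  <- shipping[demographics, on = .(name)]

join_key <- c("name")
by_char <- shipping[demographics, on = join_key]

# All three should give the same result
print(all.equal(by_list, by_dot))
print(all.equal(by_dot, by_char))
```

The character-vector form is convenient when the key columns are only known at run time, for example when they are passed into a function.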

Left joins

Remember, a left join is the same as a right join with the two data.tables swapped. To keep every row of demographics, pass it as i:

shipping[demographics, on = .(name)]
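
A small sketch of the swap, using made-up table contents assumed for illustration: with demographics in i, every demographics row is kept and the shipping columns are NA where no match exists.

```r
library(data.table)

# Hypothetical contents (assumed for illustration)
demographics <- data.table(
  name = c("Mark", "Matt", "Angela"),
  age  = c(54, 43, 39)
)
shipping <- data.table(
  name    = c("Matt", "Angela", "Michelle"),
  address = c("12 High St", "5 Elm Rd", "9 Oak Ave")
)

# Left join of shipping onto demographics:
# keeps every row of demographics, in demographics' row order
left_join <- shipping[demographics, on = .(name)]
print(left_join)

# Mark has no row in shipping, so his address is NA
print(left_join[name == "Mark"])
```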


Inner joins

Set nomatch = 0 to drop rows that have no match, which gives an inner join (recent data.table versions also accept nomatch = NULL):

shipping[demographics, on = .(name), nomatch = 0]
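
The sketch below shows the effect of nomatch = 0 on hypothetical data (the table contents are assumptions for illustration): only names present in both tables survive.

```r
library(data.table)

# Hypothetical contents (assumed for illustration)
demographics <- data.table(
  name = c("Mark", "Matt", "Angela"),
  age  = c(54, 43, 39)
)
shipping <- data.table(
  name    = c("Matt", "Angela", "Michelle"),
  address = c("12 High St", "5 Elm Rd", "9 Oak Ave")
)

# Inner join: rows of demographics without a match in shipping are dropped
inner_join <- shipping[demographics, on = .(name), nomatch = 0]
print(inner_join)
```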


Full joins

Full joins are not possible with the DT[i, on] syntax; use the merge() function instead:

merge(demographics, shipping, by = "name", all = TRUE)
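
As an illustration with made-up table contents (assumed, not from the course), a full join keeps every name that appears in either table and fills the missing side with NA.

```r
library(data.table)

# Hypothetical contents (assumed for illustration)
demographics <- data.table(
  name = c("Mark", "Matt", "Angela"),
  age  = c(54, 43, 39)
)
shipping <- data.table(
  name    = c("Matt", "Angela", "Michelle"),
  address = c("12 High St", "5 Elm Rd", "9 Oak Ave")
)

# Full join: all = TRUE keeps unmatched rows from both tables
full_join <- merge(demographics, shipping, by = "name", all = TRUE)
print(full_join)
```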


Anti-joins

Filter a data.table to rows that have no match in another data.table

demographics[!shipping, on = .(name)]
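
A short sketch of the anti-join on hypothetical data (table contents are assumptions for illustration): the ! in front of shipping keeps only the demographics rows whose name does not appear in shipping, and no columns from shipping are added.

```r
library(data.table)

# Hypothetical contents (assumed for illustration)
demographics <- data.table(
  name = c("Mark", "Matt", "Angela"),
  age  = c(54, 43, 39)
)
shipping <- data.table(
  name    = c("Matt", "Angela", "Michelle"),
  address = c("12 High St", "5 Elm Rd", "9 Oak Ave")
)

# Anti-join: demographics rows with no matching name in shipping
anti_join <- demographics[!shipping, on = .(name)]
print(anti_join)
```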


Let's practice!
