Joining Data with data.table in R
Scott Ritchie
Postdoctoral Researcher in Systems Genomics
General form of data.table syntax
DT[i, j, by]
| | |
| | --> grouped by what?
| -----> what to do?
--------> on which rows?
General form of data.table syntax joins
DT[i, on]
| |
| ----> join key columns
--------> join to which data.table?
The default join is a right join
demographics[shipping, on = .(name)]

Variables inside list() or .() are looked up in the column names of both data.tables
shipping[demographics, on = list(name)]
shipping[demographics, on = .(name)]
Character vectors can also be used
join_key <- c("name")
shipping[demographics, on = join_key]
Remember, a left join is the same as a right join with the order swapped:
shipping[demographics, on = .(name)]

Set nomatch = 0 to perform an inner join:
shipping[demographics, on = .(name), nomatch = 0]

Not possible with the data.table syntax, use the merge() function:
merge(demographics, shipping, by = "name", all = TRUE)

Filter a data.table to rows that have no match in another data.table
demographics[!shipping, on = .(name)]

Joining Data with data.table in R