Joining Data with data.table in R
Scott Ritchie
Postdoctoral Researcher in Systems Genomics
General form of data.table
syntax
DT[i, j, by]
| | |
| | --> grouped by what?
| -----> what to do?
--------> on which rows?
General form of data.table
syntax joins
DT[i, on]
| |
| ----> join key columns
--------> join to which data.table?
The default join is a right join
demographics[shipping, on = .(name)]
Variables inside list()
or .()
are looked up in the column names of both data.tables
shipping[demographics, on = list(name)]
shipping[demographics, on = .(name)]
Character vectors can also be used
join_key <- c("name")
shipping[demographics, on = join_key]
Remember, a left join is the same as a right join with the order swapped:
shipping[demographics, on = .(name)]
Set nomatch = 0
to perform an inner join:
shipping[demographics, on = .(name), nomatch = 0]
Not possible with the data.table
syntax, use the merge()
function:
merge(demographics, shipping, by = "name", all = TRUE)
Filter a data.table
to rows that have no match in another data.table
demographics[!shipping, on = .(name)]
Joining Data with data.table in R