Joining Data with data.table in R
Scott Ritchie
Postdoctoral Researcher in Systems Genomics
Setting keys means you don't need the on argument when performing a join
This is useful when the same data.table is used in many different joins
setkey() sorts the data.table in memory by the key column(s)
Multiple columns can be set and used as keys
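A minimal sketch of the points above, assuming two small made-up tables (the column names id, value, and time are illustrative and not from the slides). It shows the same join written with on and then, once keys are set, without it, and that setkey() reorders the rows in place.

```r
library(data.table)

# Hypothetical example data, not taken from the slides
dt1 <- data.table(id = c(3L, 1L, 2L), value = c("c", "a", "b"))
dt2 <- data.table(id = c(2L, 3L, 4L), time = c(20, 30, 40))

# Without keys, the join column has to be named with `on`
joined_on <- dt1[dt2, on = "id"]

# setkey() sorts each table by reference (no copy is made)
setkey(dt1, id)
setkey(dt2, id)
dt1$id  # rows are now ordered by id

# With keys set on both tables, `on` can be dropped
joined_key <- dt1[dt2]
```

Because setkey() modifies the table by reference, there is no need to assign its result back to dt1 or dt2.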
Key columns are passed as arguments
setkey(DT, ...)
setkey(DT, key1, key2, key3)
setkey(DT, "key1", "key2", "key3")
# To set all columns in DT as keys
setkey(DT)
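As a rough illustration of the calls above, the following sketch sets two key columns on a small made-up table (the data is hypothetical) and then sets every column as a key on a copy.

```r
library(data.table)

# Hypothetical example data
DT <- data.table(
  key1  = c("b", "a", "b", "a"),
  key2  = c(2L, 2L, 1L, 1L),
  value = 1:4
)

# Sort by key1, then by key2 within key1
setkey(DT, key1, key2)
key(DT)

# Calling setkey() with no columns uses all columns as keys;
# copy() keeps DT itself unchanged
DT_all <- copy(DT)
setkey(DT_all)
key(DT_all)
```

The order in which key columns are listed matters: the table is sorted by the first key, with later keys breaking ties.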
Set the keys of both data.tables before a join
setkey(dt1, dt1_key)
setkey(dt2, dt2_key)
Perform an inner, right, and left join:
# Inner join dt1 and dt2
dt1[dt2, nomatch = 0]
# Right join dt1 and dt2
dt1[dt2]
# Left join dt1 and dt2
dt2[dt1]
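The three joins can be tried end to end on toy data. This is a sketch under assumed inputs: the tables reuse the names dt1, dt2, dt1_key, and dt2_key from the slides, but their contents are invented for illustration.

```r
library(data.table)

# Hypothetical contents for dt1 and dt2
dt1 <- data.table(dt1_key = c("a", "b", "c"), value = 1:3)
dt2 <- data.table(dt2_key = c("b", "c", "d"), time = c(10, 20, 30))

setkey(dt1, dt1_key)
setkey(dt2, dt2_key)

# Inner join: only keys present in both tables ("b" and "c")
inner <- dt1[dt2, nomatch = 0]

# Right join: every row of dt2 is kept; value is NA for "d"
right <- dt1[dt2]

# Left join: every row of dt1 is kept; time is NA for "a"
left <- dt2[dt1]
```

In each result the key column takes its name from the table outside the brackets, so right has a dt1_key column while left has a dt2_key column.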
Key columns are provided as a character vector
keys <- c("key1", "key2", "key3")
setkeyv(dt, keys)
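A short sketch of setkeyv() on a made-up table (the data is an assumption; the column names follow the slide). Passing the keys as a character vector is convenient when the key names are stored in a variable or computed at run time.

```r
library(data.table)

# Hypothetical example data
dt <- data.table(
  key1  = c(2L, 1L, 1L),
  key2  = c("x", "y", "x"),
  key3  = c(1L, 0L, 1L),
  value = c(10, 20, 30)
)

# Key names held in a character vector
keys <- c("key1", "key2", "key3")
setkeyv(dt, keys)

key(dt)
dt$value  # rows reordered by key1, then key2, then key3
```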
haskey() checks whether you have set keys
haskey(dt1)
TRUE
key() returns the key columns you have set
key(dt1)
"dt1_key"
When no keys are set
haskey(dt_no_key)
FALSE
key(dt_no_key)
NULL
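One possible use of these checks, sketched on invented data: inspect the keys of two tables, then set a key only where none exists. The table contents and the choice of id as the fallback key are assumptions made for illustration.

```r
library(data.table)

# Hypothetical example data
dt1 <- data.table(dt1_key = c("b", "a"), value = 1:2)
dt_no_key <- data.table(id = 1:5, color = c("red", "blue", "red", "green", "blue"))

setkey(dt1, dt1_key)

# Record the key status of each table
dt1_has_key <- haskey(dt1)
dt1_keys <- key(dt1)
no_key_has_key <- haskey(dt_no_key)
no_key_keys <- key(dt_no_key)

# Set a key only if the table does not have one yet
if (!haskey(dt_no_key)) {
  setkey(dt_no_key, id)
}
```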
tables()
     NAME       NROW  NCOL MB COLS                        KEY
[1,] dt            3     4  1 key1,key2,key3,value        key1,key2,key3
[2,] dt1       1,000     3  1 dt1_key_column,value,group  dt1_key
[3,] dt2       1,000     2  1 dt2_key_column,time         dt2_key
[4,] dt_no_key     5     2  1 id,color
Total: 4MB