Setting and viewing data.table keys

Joining Data with data.table in R

Scott Ritchie

Postdoctoral Researcher in Systems Genomics

Setting `data.table` keys

Setting keys means you don't need the on argument when performing a join

  • Useful if you need to use a data.table in many different joins

Sorts the data.table in memory by the key column(s)

  • Makes filtering and join operations faster

Multiple columns can be set and used as keys
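
The points above can be sketched with two small tables. The table names (`customers`, `orders`) and their contents are hypothetical and only serve to illustrate the sorting and the join without `on`.

```r
library(data.table)

# Hypothetical example tables; names and values are illustrative only
customers <- data.table(id = c(3L, 1L, 2L), name = c("Cara", "Ann", "Bob"))
orders    <- data.table(id = c(2L, 3L, 3L), amount = c(10, 25, 40))

# Setting a key reorders the rows by the key column, by reference
setkey(customers, id)
setkey(orders, id)
customers$id   # rows are now sorted by the key: 1 2 3

# Without keys this join would need on = "id"; with keys it can be omitted
joined <- customers[orders]
```

Because `setkey()` modifies the table by reference, there is no need to assign its result back to `customers` or `orders`.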


The `setkey()` function

Key columns are passed as arguments

setkey(DT, ...)
setkey(DT, key1, key2, key3)

setkey(DT, "key1", "key2", "key3")
# To set all columns in DT as keys
setkey(DT)
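
A runnable sketch of the calls above, assuming a hypothetical table `DT` with columns `region`, `year`, and `sales` (these names are not from the slides):

```r
library(data.table)

# Hypothetical table with two candidate key columns
DT <- data.table(
  region = c("west", "east", "west", "east"),
  year   = c(2021L, 2021L, 2020L, 2020L),
  sales  = c(40, 10, 30, 20)
)

# Key on region first, then year; rows are sorted in that order
setkey(DT, region, year)
key(DT)   # "region" "year"

# Calling setkey() with no columns keys on every column, in column order
setkey(DT)
key(DT)   # "region" "year" "sales"
```

The order of the key columns matters: the table is sorted by the first key, and ties are broken by each later key in turn.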

The `setkey()` function

Set the keys of both data.tables before a join

setkey(dt1, dt1_key)
setkey(dt2, dt2_key)

Perform an inner, right, and left join:

# Inner join dt1 and dt2
dt1[dt2, nomatch = 0] 

# Right join dt1 and dt2
dt1[dt2]

# Left join dt1 and dt2
dt2[dt1]
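
To see what each of the three joins returns, here is a small sketch with made-up data. The columns `x` and `y` and all values are assumptions for illustration; only the key names `dt1_key` and `dt2_key` come from the slide.

```r
library(data.table)

# Hypothetical tables whose key columns have different names
dt1 <- data.table(dt1_key = c("a", "b", "c"), x = 1:3)
dt2 <- data.table(dt2_key = c("b", "c", "d"), y = c(20, 30, 40))

setkey(dt1, dt1_key)
setkey(dt2, dt2_key)

inner <- dt1[dt2, nomatch = 0]  # only keys present in both: b, c
right <- dt1[dt2]               # every row of dt2; x is NA for "d"
left  <- dt2[dt1]               # every row of dt1; y is NA for "a"
```

Note that the key columns do not need matching names: the join pairs the key of the outer table with the key of the table inside the brackets, and the result keeps the outer table's column name.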

Setting keys programmatically

Key columns are provided as a character vector

keys <- c("key1", "key2", "key3")
setkeyv(dt, keys)
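
One place `setkeyv()` is handy is inside a function, where the key columns arrive as a variable. The helper `set_available_keys()` below is hypothetical, not part of data.table, and the table contents are made up.

```r
library(data.table)

# Hypothetical helper: key a table on whichever of the wanted columns it has
set_available_keys <- function(dt, wanted) {
  cols <- intersect(wanted, names(dt))
  setkeyv(dt, cols)   # modifies dt by reference
  invisible(dt)
}

dt <- data.table(key2 = c("b", "a"), key1 = c(2L, 1L), value = c(0.5, 1.5))
set_available_keys(dt, c("key1", "key2", "key3"))
key(dt)   # "key1" "key2"
```

Since `key3` is not a column of this `dt`, it is dropped before the key is set. The caller's table is updated even though nothing is reassigned, because `setkeyv()` works by reference.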

Getting keys

haskey() checks whether you have set keys

haskey(dt1)
TRUE

key() returns the key columns you have set

key(dt1)
"dt1_key"

Getting keys

When no keys are set

haskey(dt_no_key)
FALSE
key(dt_no_key)
NULL
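
The two functions can be combined to handle keyed and unkeyed tables uniformly. The helper `describe_key()` below is a hypothetical convenience function, and the table contents are invented for the sketch.

```r
library(data.table)

dt1 <- data.table(dt1_key = c(2L, 1L), value = c("b", "a"))
dt_no_key <- data.table(id = 1:2, color = c("red", "blue"))

setkey(dt1, dt1_key)

# Hypothetical helper: report the key columns, or a placeholder if none are set
describe_key <- function(dt) {
  if (haskey(dt)) paste(key(dt), collapse = ", ") else "<no key>"
}

describe_key(dt1)        # "dt1_key"
describe_key(dt_no_key)  # "<no key>"
```

Checking `haskey()` first avoids having to handle the `NULL` that `key()` returns for an unkeyed table.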

Viewing all `data.tables` and their keys

tables()
     NAME       NROW NCOL MB COLS                 KEY
[1,] dt            3    4  1 key1,key2,key3,value key1,key2,key3
[2,] dt1       1,000    3  1 dt1_key,value,group  dt1_key
[3,] dt2       1,000    2  1 dt2_key,time         dt2_key
[4,] dt_no_key     5    2  1 id,color
Total: 4MB

Let's practice!
