Complex keys

Joining Data with data.table in R

Scott Ritchie

Postdoctoral Researcher in Systems Genomics

Misspecified joins

What happens when you don't use the correct columns for join keys?

  • An error is thrown
  • The result is a malformed data.table
Joining Data with data.table in R

Column type mismatch

Using join key columns with different types will error

customers[web_visits, on = .(age = name)]
Error in bmerge(i, x, leftcols, rightcols, io, xo, roll, rollends, 
nomatch,  : 
  typeof x.age (double) != typeof i.name (character)

Joining Data with data.table in R

Column type mismatch

customers[web_visits, on = .(id)]
Error in bmerge(i, x, leftcols, rightcols, io, xo, roll, rollends, 
nomatch,  : 
  typeof x.id (integer) != typeof i.id(character)

Joining Data with data.table in R

Malformed full joins - no common key values

merge(customers, web_visits, by.x = "address", by.y = "name", all = TRUE)

Joining Data with data.table in R

Malformed right and left joins - no common key values

customers[web_visits, on = .(address = name)]

Joining Data with data.table in R

Malformed inner joins - no common key values

customers[web_visits, on = .(address = name), nomatch = 0]

Joining Data with data.table in R

Malformed joins - coincidental common key values

customers[web_visits, on = .(age = duration), nomatch = O]

Joining Data with data.table in R

Avoiding misspecified joins

Learning what each column represents before joins will help you avoid errors

Joining Data with data.table in R

Keys with different column names

merge(customers, web_visits, by.x = "name", by.y = "person")

customers[web_visits, on = .(name = person)] customers[web_visits, on = c("name" = "person")] key <- c("name" = "person") customers[web_visits, on = key]
Joining Data with data.table in R

Multi-column keys

Joining Data with data.table in R

Multi-column keys

Joining Data with data.table in R

Specifying multiple keys with merge()

merge(purchases, web_visits, by = c("name", "date"))
merge(purchases, web_visits, 
      by.x = c("name", "date"),  
      by.y = c("person", "date")
Joining Data with data.table in R

Specifying multiple keys with the data.table syntax

purchases[web_visits, on = .(name, date)]
purchases[web_visits, on = c("name", "date")]
purchases[web_visits, on = .(name = person, date)]
purchases[web_visits, on = c("name" = "person", "date")]
Joining Data with data.table in R

Final Slide

Joining Data with data.table in R

Preparing Video For Download...