Complex keys

R'de data.table ile Veri Birleştirme

Scott Ritchie

Postdoctoral Researcher in Systems Genomics

Misspecified joins

What happens when you don't use the correct columns for join keys?

  • An error is thrown
  • The result is a malformed data.table
R'de data.table ile Veri Birleştirme

Column type mismatch

Using join key columns with different types will error

customers[web_visits, on = .(age = name)]
Error in bmerge(i, x, leftcols, rightcols, io, xo, roll, rollends, 
nomatch,  : 
  typeof x.age (double) != typeof i.name (character)

R'de data.table ile Veri Birleştirme

Column type mismatch

customers[web_visits, on = .(id)]
Error in bmerge(i, x, leftcols, rightcols, io, xo, roll, rollends, 
nomatch,  : 
  typeof x.id (integer) != typeof i.id(character)

R'de data.table ile Veri Birleştirme

Malformed full joins - no common key values

merge(customers, web_visits, by.x = "address", by.y = "name", all = TRUE)

R'de data.table ile Veri Birleştirme

Malformed right and left joins - no common key values

customers[web_visits, on = .(address = name)]

R'de data.table ile Veri Birleştirme

Malformed inner joins - no common key values

customers[web_visits, on = .(address = name), nomatch = 0]

R'de data.table ile Veri Birleştirme

Malformed joins - coincidental common key values

customers[web_visits, on = .(age = duration), nomatch = O]

R'de data.table ile Veri Birleştirme

Avoiding misspecified joins

Learning what each column represents before joins will help you avoid errors

R'de data.table ile Veri Birleştirme

Keys with different column names

merge(customers, web_visits, by.x = "name", by.y = "person")

customers[web_visits, on = .(name = person)] customers[web_visits, on = c("name" = "person")] key <- c("name" = "person") customers[web_visits, on = key]
R'de data.table ile Veri Birleştirme

Multi-column keys

R'de data.table ile Veri Birleştirme

Multi-column keys

R'de data.table ile Veri Birleştirme

Specifying multiple keys with merge()

merge(purchases, web_visits, by = c("name", "date"))
merge(purchases, web_visits, 
      by.x = c("name", "date"),  
      by.y = c("person", "date")
R'de data.table ile Veri Birleştirme

Specifying multiple keys with the data.table syntax

purchases[web_visits, on = .(name, date)]
purchases[web_visits, on = c("name", "date")]
purchases[web_visits, on = .(name = person, date)]
purchases[web_visits, on = c("name" = "person", "date")]
R'de data.table ile Veri Birleştirme

Final Slide

R'de data.table ile Veri Birleştirme

Preparing Video For Download...