Joining Data with data.table in R
Scott Ritchie
Postdoctoral Researcher in Systems Genomics
Using the data.table syntax
parents[children, on = .(name = parent)]
    name gender age    i.name i.gender i.age
1: Sarah      F  41    Oliver        M     5
2:   Max      M  43 Sebastian        M     8
3:   Qin      F  36   Kai-lee        F     7
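The slide does not show how the two tables are built. A minimal sketch, assuming `parents` and `children` look the way the printed result suggests (the construction code is an assumption, not part of the slides), makes the join reproducible:

```r
library(data.table)

# Assumed inputs, reconstructed from the printed join result
parents <- data.table(
  name   = c("Sarah", "Max", "Qin"),
  gender = c("F", "M", "F"),
  age    = c(41, 43, 36)
)
children <- data.table(
  parent = c("Sarah", "Max", "Qin"),
  name   = c("Oliver", "Sebastian", "Kai-lee"),
  gender = c("M", "M", "F"),
  age    = c(5, 8, 7)
)

# For each row of children (the i table), look up the matching row of
# parents (the x table). on = .(name = parent) matches parents$name
# against children$parent.
joined <- parents[children, on = .(name = parent)]

# Columns of children whose names clash with parents get an "i." prefix
names(joined)
```

The rows come back in the order of `children`, because the table inside the brackets drives the lookup.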
Using the merge() function
merge(x = children, y = parents, by.x = "parent", by.y = "name")
   parent      name gender.x age.x gender.y age.y
1:    Max Sebastian        M     8        M    43
2:    Qin   Kai-lee        F     7        F    36
3:  Sarah    Oliver        M     5        F    41
The suffixes argument can add useful context:
merge(children, parents, by.x = "parent", by.y = "name",
      suffixes = c(".child", ".parent"))
   parent      name gender.child age.child gender.parent age.parent
1:    Max Sebastian            M         8             M         43
2:    Qin   Kai-lee            F         7             F         36
3:  Sarah    Oliver            M         5             F         41
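To compare the default and custom suffixes side by side, here is a small sketch. The table definitions are assumptions inferred from the printed output; only the `merge()` calls come from the slides.

```r
library(data.table)

# Assumed inputs, reconstructed from the printed merge results
parents <- data.table(
  name   = c("Sarah", "Max", "Qin"),
  gender = c("F", "M", "F"),
  age    = c(41, 43, 36)
)
children <- data.table(
  parent = c("Sarah", "Max", "Qin"),
  name   = c("Oliver", "Sebastian", "Kai-lee"),
  gender = c("M", "M", "F"),
  age    = c(5, 8, 7)
)

# Default suffixes: clashing columns become .x (children) and .y (parents)
default_merge <- merge(children, parents, by.x = "parent", by.y = "name")

# Custom suffixes: the column names now say which table each value came from
labelled_merge <- merge(children, parents, by.x = "parent", by.y = "name",
                        suffixes = c(".child", ".parent"))

names(default_merge)
names(labelled_merge)

# merge() sorts the result by the join column unless sort = FALSE is passed
labelled_merge$parent
```

Only the column names differ between the two results; the rows and values are the same.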
Rename all columns using setnames()
setnames(parents, c("parent", "parent.gender", "parent.age"))
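Rename specific columns using the old and new arguments (an alternative to the call above; after that call, gender and age no longer exist, so running both in sequence raises an error)
setnames(parents, old = c("gender", "age"), new = c("parent.gender", "parent.age"))
Result of renaming all columns:
parents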
   parent parent.gender parent.age
1:  Sarah             F         41
2:    Max             M         43
3:    Qin             F         36
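Because `setnames()` changes a data.table by reference, the two forms are easiest to compare on copies. In this sketch the `parents` table and the names `p_all` and `p_some` are assumptions made for illustration:

```r
library(data.table)

# Assumed starting table, matching the columns used earlier in the slides
parents <- data.table(
  name   = c("Sarah", "Max", "Qin"),
  gender = c("F", "M", "F"),
  age    = c(41, 43, 36)
)

# Form 1: supply a full vector of new names, matched by position
p_all <- copy(parents)
setnames(p_all, c("parent", "parent.gender", "parent.age"))

# Form 2: rename only the listed columns, matched by name
p_some <- copy(parents)
setnames(p_some, old = c("gender", "age"),
         new = c("parent.gender", "parent.age"))

names(p_all)    # all three columns renamed
names(p_some)   # "name" is left as it was
names(parents)  # unchanged, because the renames were applied to copies
```

Working on `copy(parents)` keeps the original table intact; calling `setnames(parents, ...)` directly renames it in place with no assignment needed.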
Join keys for data.frames may be in the rownames
parents
      gender age
Sarah      F  41
Max        M  43
Qin        F  36
parents <- as.data.table(parents, keep.rownames = "parent")
parents
   parent gender age
1:  Sarah      F  41
2:    Max      M  43
3:    Qin      F  36
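Once the row names are an ordinary column, they can serve as a join key. The following sketch assumes a data.frame called `parents_df` and a `children` table shaped like the one in the earlier slides; both are reconstructed for illustration.

```r
library(data.table)

# Assumed data.frame whose join key lives in the row names
parents_df <- data.frame(
  gender = c("F", "M", "F"),
  age    = c(41, 43, 36),
  row.names = c("Sarah", "Max", "Qin")
)

# keep.rownames = "parent" moves the row names into a column named "parent"
parents <- as.data.table(parents_df, keep.rownames = "parent")

# Assumed children table, as in the earlier slides
children <- data.table(
  parent = c("Sarah", "Max", "Qin"),
  name   = c("Oliver", "Sebastian", "Kai-lee"),
  gender = c("M", "M", "F"),
  age    = c(5, 8, 7)
)

# Both tables now share a "parent" column, so it can be used directly in on
joined <- children[parents, on = .(parent)]
names(joined)
```

Without `keep.rownames`, `as.data.table()` drops the row names and the key would be lost.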