Set operations

Joining Data with data.table in R

Scott Ritchie

Postdoctoral Researcher in Systems Genomics

Set operation functions

Given two data.tables with the same columns:

  • fintersect(): which rows do the two data.tables have in common?
  • funion(): what is the unique set of rows across the two data.tables?
  • fsetdiff(): which rows appear only in the first data.table?
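
"The same columns" is worth checking before calling any of these functions: to my understanding they expect identical column names in the same order. A minimal sketch, using made-up tables dt1 and dt2 (not the course data), of checking and aligning the columns first:

```r
library(data.table)

# Hypothetical example tables: same columns, but in a different order
dt1 <- data.table(id = c(1L, 2L), grp = c("a", "b"))
dt2 <- data.table(grp = c("b", "c"), id = c(2L, 3L))

# Compare column names (and their order) before using a set operation
same_cols <- identical(names(dt1), names(dt2))

# Reorder dt2's columns by reference so they match dt1
setcolorder(dt2, names(dt1))

# The tables now line up, so whole rows can be compared
shared <- fintersect(dt1, dt2)
```

Here same_cols is FALSE before the reordering, and shared holds the single row that both tables contain.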

Set operations: `fintersect()`

Extract rows that are present in both data.tables

fintersect(dt1, dt2)

`fintersect()` and duplicate rows

Duplicate rows are ignored by default:

fintersect(dt1, dt2)
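
A small sketch of the default behaviour, assuming two made-up tables in which the row (2, "b") is repeated on both sides. The table contents are illustrative only:

```r
library(data.table)

# Hypothetical tables: (2, "b") appears 3 times in dt1 and 2 times in dt2
dt1 <- data.table(id = c(1L, 2L, 2L, 2L, 3L), grp = c("a", "b", "b", "b", "c"))
dt2 <- data.table(id = c(2L, 2L, 3L, 4L),     grp = c("b", "b", "c", "d"))

# Default (all = FALSE): each shared row is returned once,
# no matter how often it is repeated in either table
common <- fintersect(dt1, dt2)

nrow(common)  # 2 distinct rows: (2, "b") and (3, "c")
```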

`fintersect()` and duplicate rows

all = TRUE: keep each shared row as many times as it appears in both data.tables (the smaller of the two counts):

fintersect(dt1, dt2, all = TRUE)
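
To illustrate how the copies are counted, here is a sketch with the same kind of made-up tables, where (2, "b") occurs three times in dt1 and twice in dt2:

```r
library(data.table)

# Hypothetical tables: (2, "b") appears 3 times in dt1 and 2 times in dt2
dt1 <- data.table(id = c(1L, 2L, 2L, 2L, 3L), grp = c("a", "b", "b", "b", "c"))
dt2 <- data.table(id = c(2L, 2L, 3L, 4L),     grp = c("b", "b", "c", "d"))

# all = TRUE: a shared row is kept as many times as it occurs in BOTH tables
common_all <- fintersect(dt1, dt2, all = TRUE)

# Count the copies of each row in the result
copies <- common_all[, .N, by = .(id, grp)]
```

In this sketch (2, "b") is kept twice, because only two of its copies are matched in dt2, and (3, "c") is kept once.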

Set operations: `fsetdiff()`

Extract rows found exclusively in the first data.table

fsetdiff(dt1, dt2)

`fsetdiff()` and duplicates

Duplicate rows are ignored by default:

fsetdiff(dt1, dt2)
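
A sketch of the default behaviour on made-up tables (illustrative data, not from the course). Note that the argument order matters: the result only ever contains rows from the first data.table.

```r
library(data.table)

# Hypothetical tables: (2, "b") appears 3 times in dt1 and 2 times in dt2
dt1 <- data.table(id = c(1L, 2L, 2L, 2L, 3L), grp = c("a", "b", "b", "b", "c"))
dt2 <- data.table(id = c(2L, 2L, 3L, 4L),     grp = c("b", "b", "c", "d"))

# Default (all = FALSE): distinct rows of dt1 that never occur in dt2
only_in_dt1 <- fsetdiff(dt1, dt2)

# Swapping the arguments answers the opposite question
only_in_dt2 <- fsetdiff(dt2, dt1)
```

Here only_in_dt1 contains just (1, "a") and only_in_dt2 contains just (4, "d"); the extra copy of (2, "b") in dt1 is not reported because that row does exist in dt2.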

`fsetdiff()` and duplicates

all = TRUE: return every copy in the first data.table beyond the number of copies found in the second:

fsetdiff(dt1, dt2, all = TRUE)
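
The counting can be sketched with made-up tables in which (2, "b") occurs three times in dt1 but only twice in dt2, so one copy is left over:

```r
library(data.table)

# Hypothetical tables: (2, "b") appears 3 times in dt1 and 2 times in dt2
dt1 <- data.table(id = c(1L, 2L, 2L, 2L, 3L), grp = c("a", "b", "b", "b", "c"))
dt2 <- data.table(id = c(2L, 2L, 3L, 4L),     grp = c("b", "b", "c", "d"))

# all = TRUE: copies in dt1 are cancelled one-for-one against copies in dt2,
# and whatever is left over in dt1 is returned
extra <- fsetdiff(dt1, dt2, all = TRUE)

nrow(extra)  # 2 rows: (1, "a") and the one surplus copy of (2, "b")
```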

Set operations: `funion()`

Extract all rows found in either data.table:

funion(dt1, dt2)

`funion()` and duplicates

Duplicate rows are ignored by default:

funion(dt1, dt2)
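
A sketch of the default behaviour, again on made-up tables that contain repeated rows both within and across the two inputs:

```r
library(data.table)

# Hypothetical tables with repeated rows within and across the inputs
dt1 <- data.table(id = c(1L, 2L, 2L, 2L, 3L), grp = c("a", "b", "b", "b", "c"))
dt2 <- data.table(id = c(2L, 2L, 3L, 4L),     grp = c("b", "b", "c", "d"))

# Default (all = FALSE): every distinct row from either table, once each
combined_unique <- funion(dt1, dt2)

nrow(combined_unique)  # 4 distinct rows, one per id from 1 to 4
```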

`funion()` and duplicates

all = TRUE: return all rows, including duplicates:

funion(dt1, dt2, all = TRUE) # same rows as rbind(dt1, dt2)
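
A quick sketch, on made-up tables, comparing funion() with all = TRUE against a plain rbind() of the same inputs:

```r
library(data.table)

# Hypothetical tables with repeated rows within and across the inputs
dt1 <- data.table(id = c(1L, 2L, 2L, 2L, 3L), grp = c("a", "b", "b", "b", "c"))
dt2 <- data.table(id = c(2L, 2L, 3L, 4L),     grp = c("b", "b", "c", "d"))

# all = TRUE: nothing is dropped, so the result has nrow(dt1) + nrow(dt2) rows
stacked <- funion(dt1, dt2, all = TRUE)

# Stacking the tables with rbind() gives the same rows
bound <- rbind(dt1, dt2)
```

Both stacked and bound have nine rows here, with dt1's rows followed by dt2's.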

Removing duplicates when combining many data.tables

Two data.tables:

  1. Use funion() to concatenate unique rows

Three or more:

  1. Concatenate all data.tables using rbind() or rbindlist()
  2. Identify and remove duplicates using duplicated() and unique()
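
The two steps for three or more data.tables can be sketched as follows. The tables dt1, dt2 and dt3 are made up for illustration:

```r
library(data.table)

# Hypothetical tables that overlap with one another
dt1 <- data.table(id = c(1L, 2L, 2L, 2L, 3L), grp = c("a", "b", "b", "b", "c"))
dt2 <- data.table(id = c(2L, 2L, 3L, 4L),     grp = c("b", "b", "c", "d"))
dt3 <- data.table(id = c(4L, 5L),             grp = c("d", "e"))

# Step 1: concatenate everything; rbindlist() takes a list of tables
combined <- rbindlist(list(dt1, dt2, dt3))

# Step 2a: duplicated() flags every repeat of a row after its first occurrence
is_dup <- duplicated(combined)

# Step 2b: unique() keeps the first occurrence of each distinct row
deduped <- unique(combined)
```

In this sketch combined has 11 rows, 6 of which are flagged as duplicates, leaving 5 distinct rows in deduped. Subsetting with combined[!is_dup] selects the same rows as unique(combined).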

Let's practice!
