The merge function

Joining Data with data.table in R

Scott Ritchie

Postdoctoral Researcher in Systems Genomics

Joins

  • The concept of a join comes from database query languages (e.g. SQL).

  • Four standard joins:

    • inner
    • full
    • left
    • right
  • All four can be done using merge()
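
Left and right joins are listed above without an example. As a minimal sketch, assuming two small made-up tables (the rows in demographics and shipping below are hypothetical, not the course data), they can be expressed with the all.x and all.y arguments of merge():

```r
library(data.table)

# Hypothetical example data, invented for illustration
demographics <- data.table(name = c("Trey", "Matthew", "Angela"),
                           age  = c(54, 43, 39))
shipping <- data.table(name    = c("Matthew", "Trey", "Aaron"),
                       address = c("7 Mill Road", "12 High Street", "3 Park Lane"))

# Left join: keep every row of x (demographics), matched or not
left <- merge(x = demographics, y = shipping,
              by = "name", all.x = TRUE)

# Right join: keep every row of y (shipping), matched or not
right <- merge(x = demographics, y = shipping,
               by = "name", all.y = TRUE)
```

Rows without a match in the other table are kept, with NA in the columns that come from that other table.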

Inner join

Only keep observations that have information in both data.tables

merge(x = demographics, y = shipping, 
      by.x = "name", by.y = "name")
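
To make the effect of the inner join concrete, here is a small sketch that runs the same call on made-up data. The contents of demographics and shipping are assumptions for illustration only.

```r
library(data.table)

# Hypothetical example data, invented for illustration
demographics <- data.table(name = c("Trey", "Matthew", "Angela"),
                           age  = c(54, 43, 39))
shipping <- data.table(name    = c("Matthew", "Trey", "Aaron"),
                       address = c("7 Mill Road", "12 High Street", "3 Park Lane"))

# Inner join: only names present in BOTH tables survive
inner <- merge(x = demographics, y = shipping,
               by.x = "name", by.y = "name")

# Angela (no shipping record) and Aaron (no demographics record) are dropped
print(inner)
```

Only Matthew and Trey appear in both tables, so the result has two rows, each carrying both age and address.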

The by argument

Use by to avoid typing the same column name twice when the key column has the same name in both data.tables

merge(x = demographics, y = shipping, 
      by = "name")
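
The by.x and by.y arguments are still needed when the key column is named differently in each table. The sketch below assumes a hypothetical shipping table whose key column is called customer. Both that column name and the data are made up for illustration.

```r
library(data.table)

# Hypothetical example data, invented for illustration
demographics <- data.table(name = c("Trey", "Matthew", "Angela"),
                           age  = c(54, 43, 39))
# Here the key column is called "customer" instead of "name" (an assumption)
shipping <- data.table(customer = c("Matthew", "Trey", "Aaron"),
                       address  = c("7 Mill Road", "12 High Street", "3 Park Lane"))

# by = "name" would fail because shipping has no "name" column,
# so the key column is named separately for each table
joined <- merge(x = demographics, y = shipping,
                by.x = "name", by.y = "customer")
```

When both tables share the key name, by = "name" is simply shorthand for passing that name to both by.x and by.y.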

Full join

Keep all observations that are in either data.table

merge(x = demographics, y = shipping, 
      by = "name", all = TRUE)
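
As a rough illustration of what a full join returns, the sketch below applies the same call to made-up data (the table contents are hypothetical) and checks where the missing values land.

```r
library(data.table)

# Hypothetical example data, invented for illustration
demographics <- data.table(name = c("Trey", "Matthew", "Angela"),
                           age  = c(54, 43, 39))
shipping <- data.table(name    = c("Matthew", "Trey", "Aaron"),
                       address = c("7 Mill Road", "12 High Street", "3 Park Lane"))

# Full join: every name from either table appears once in the result
full <- merge(x = demographics, y = shipping,
              by = "name", all = TRUE)

# Unmatched rows are padded with NA in the other table's columns
full[is.na(age) | is.na(address)]
```

Angela has no shipping record, so her address is NA. Aaron has no demographics record, so his age is NA.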

Let's practice!
