Welcome to the course

Joining Data with data.table in R

Scott Ritchie

Postdoctoral Researcher in Systems Genomics

Joining data.tables

  • Combine information from two data.tables into a single data.table

Course overview

  • Chapter 1: Joining data with merge()

  • Chapter 2: Joins in the data.table workflow

  • Chapter 3: Troubleshooting joins

  • Chapter 4: Concatenating and reshaping data.tables

Table keys

Columns that link information across two tables

library(data.table)

demographics <- data.table(name = c("Trey", "Matthew", "Angela"),
                           gender = c(NA, "M", "F"),
                           age = c(54, 43, 39))

shipping <- data.table(name = c("Matthew", "Trey", "Angela"),
                       address = c("7 Mill road", "12 High street",
                                   "33 Pacific boulevard"))
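To make the idea of a key concrete, the sketch below looks for columns shared by both tables and then uses the shared column to link them. It is a minimal illustration rather than the course's own code: the variable names key_candidates and joined are made up for this example, and merge() is only previewed here since Chapter 1 covers it properly.

```r
library(data.table)

demographics <- data.table(name = c("Trey", "Matthew", "Angela"),
                           gender = c(NA, "M", "F"),
                           age = c(54, 43, 39))
shipping <- data.table(name = c("Matthew", "Trey", "Angela"),
                       address = c("7 Mill road", "12 High street",
                                   "33 Pacific boulevard"))

# Columns that appear in both tables are candidate keys
# (key_candidates is a hypothetical name used for illustration)
key_candidates <- intersect(names(demographics), names(shipping))
key_candidates  # "name"

# A usable key should hold matching values in both tables
all(demographics$name %in% shipping$name)

# Link the two tables through the shared key column
joined <- merge(demographics, shipping, by = "name")
joined
```

Note that the rows of the two tables are in a different order. The key column, not row position, is what matches Trey's age to Trey's address.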

Inspecting `data.tables` in your R session

The tables() function will show you all data.tables loaded in your R session

tables()
           NAME NROW NCOL MB            COLS KEY
1: demographics    3    3  0 name,gender,age    
2:     shipping    3    2  0    name,address    
Total: 0MB
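
The summary printed by tables() is also returned as a data.table, so it can be queried like any other table. The sketch below assumes the two example tables from earlier and uses a hypothetical variable name, info, to hold the result.

```r
library(data.table)

demographics <- data.table(name = c("Trey", "Matthew", "Angela"),
                           gender = c(NA, "M", "F"),
                           age = c(54, 43, 39))
shipping <- data.table(name = c("Matthew", "Trey", "Angela"),
                       address = c("7 Mill road", "12 High street",
                                   "33 Pacific boulevard"))

# silent = TRUE suppresses the printed summary; the result is still returned
# (info is a hypothetical name used for illustration)
info <- tables(silent = TRUE)

# Row and column counts for the two example tables
info[NAME %in% c("demographics", "shipping"), .(NAME, NROW, NCOL)]
```

The KEY column is empty in the output above because neither table has had a key set on it yet.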

Inspecting `data.tables` in your R session

The str() function shows you the type of each column in a single data.table

str(demographics)
Classes ‘data.table’ and 'data.frame':    3 obs. of  3 variables:
 $ name  : chr  "Trey" "Matthew" "Angela"
 $ gender: chr  NA "M" "F"
 $ age   : num  54 43 39
 - attr(*, ".internal.selfref")=<externalptr> 
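
Column types matter when joining because key columns are compared across tables. As a rough sketch, the types that str() displays can also be collected programmatically with base R's sapply() and class(). The variable name col_types is made up for this example.

```r
library(data.table)

demographics <- data.table(name = c("Trey", "Matthew", "Angela"),
                           gender = c(NA, "M", "F"),
                           age = c(54, 43, 39))
shipping <- data.table(name = c("Matthew", "Trey", "Angela"),
                       address = c("7 Mill road", "12 High street",
                                   "33 Pacific boulevard"))

# One class per column, as a named character vector
# (col_types is a hypothetical name used for illustration)
col_types <- sapply(demographics, class)
col_types

# Check that the shared name column has the same type in both tables
identical(class(demographics$name), class(shipping$name))
```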

Inspecting `data.tables` in your R session

demographics_all
         name sex age
  1:     Trey  NA  54
  2:  Matthew   M  43
  3:   Angela   F  39
  4: Michelle   F  63
  5:  Mohamed   M  26
 ---                 
102:  Patrick   M  27
103:      Wei   F  41
104:     Adam   M  33
105:  Somchai   M  53
106:     Alma   F  19

Let's practice!
