Joining Data with data.table in R
Scott Ritchie
Postdoctoral Researcher in Systems Genomics
Concatenating data.tables
rbind()
: concatenate rows from data.tables
stored in different variables
rbindlist()
: concatenate rows from a list
of data.tables
Concatenate two or more data.tables
stored as variables
# ... takes any number of arguments
rbind(...)
rbind(sales_2015, sales_2016)
quarter amount
1: 1 3200100
2: 2 2950000
3: 3 2980700
4: 4 3420000
5: 1 3350000
6: 2 3000300
7: 3 3120200
8: 4 3670000
The idcol
argument adds a column indicating the data.table
of origin
rbind("2015" = sales_2015, "2016" = sales_2016, idcol = "year")
year quarter amount
1: 2015 1 3200100
2: 2015 2 2950000
3: 2015 3 2980700
4: 2015 4 3420000
5: 2016 1 3350000
6: 2016 2 3000300
7: 2016 3 3120200
8: 2016 4 3670000
rbind(sales_2015, sales_2016, idcol = "year")
year quarter amount
1: 1 1 3200100
2: 1 2 2950000
3: 1 3 2980700
4: 1 4 3420000
5: 2 1 3350000
6: 2 2 3000300
7: 2 3 3120200
8: 2 4 3670000
rbind(sales_2015, sales_2016, idcol = TRUE)
.id quarter amount
1: 1 1 3200100
2: 1 2 2950000
3: 1 3 2980700
4: 1 4 3420000
5: 2 1 3350000
6: 2 2 3000300
7: 2 3 3120200
8: 2 4 3670000
rbind("2015" = sales_2015, "2016" = sales_2016, idcol = "year",
fill = TRUE)
rbind(sales_2015, sales_2016, idcol = "year")
Error in rbindlist(l, use.names, fill, idcol) :
Item 2 has 3 columns, inconsistent with item 1 which has 2 columns.
If instead you need to fill missing columns, use set argument 'fill'
to TRUE.
Concatenate rows from a list
of data.tables
# Read in a list of data.tables
table_files <- c("sales_2015.csv", "sales_2016.csv")
list_of_tables <- lapply(table_files, fread)
rbindlist(list_of_tables)
quarter amount
1: 1 3200100
2: 2 2950000
3: 3 2980700
4: 4 3420000
5: 1 3350000
6: 2 3000300
7: 3 3120200
8: 4 3670000
The idcol
argument takes names from the input list
names(list_of_tables) <- c("2015", "2016")
rbindlist(list_of_tables, idcol = "year")
year quarter amount
1: 2015 1 3200100
2: 2015 2 2950000
3: 2015 3 2980700
4: 2015 4 3420000
5: 2016 1 3350000
6: 2016 2 3000300
7: 2016 3 3120200
8: 2016 4 3670000
rbind("2015" = sales_2015, "2016" = sales_2016, idcol = "year",
use.names = TRUE)
rbind("2015" = sales_2015, "2016" = sales_2016, idcol = "year",
use.names = FALSE)
rbind("2015" = sales_2015, "2016" = sales_2016, idcol = "year",
use.names = FALSE)
rbind()
is use.names = TRUE
rbindlist()
is use.names = FALSE
unless fill = TRUE
.Joining Data with data.table in R