Welcome to the course!

Data Manipulation with data.table in R

Matt Dowle and Arun Srinivasan

Instructors, DataCamp

What is a data.table?

  • Enhanced data.frame
    • Inherits from and extends data.frame
  • Columnar data structure
  • All columns must have the same length but can be of different types

Why use data.table?

  • Concise and consistent syntax
    • Think in terms of rows, columns and groups
    • Provides a placeholder for each
# General form of data.table syntax
DT[i, j, by]
   |  |  |
   |  |  --> grouped by what?
   |  -----> what to do?
   --------> on which rows?
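
As a rough illustration of the general form above, the sketch below uses a small made-up table (the names DT, id, grp and val are hypothetical, not course data) and fills in i, j and by one at a time.

```r
library(data.table)

# Hypothetical toy data, not from the course
DT <- data.table(id  = 1:6,
                 grp = c("a", "a", "b", "b", "c", "c"),
                 val = c(10, 20, 30, 40, 50, 60))

# i only: on which rows? Keep rows where val exceeds 20
DT[val > 20]

# j only: what to do? Sum val over all rows
DT[, sum(val)]

# i, j and by together: among rows with val > 10,
# compute the total of val within each grp
res <- DT[val > 10, .(total = sum(val)), by = grp]
res
```

Leaving an argument empty, as in DT[, sum(val)], means "all rows"; the comma is still needed so that the expression is read as j rather than i.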

[Figure: benchmark]

Creating a data.table

Three ways of creating data tables:

  • data.table()
  • as.data.table()
  • fread()
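
Of the three, fread() is the one that reads delimited data, usually from a file path. A minimal sketch, assuming in-memory CSV lines passed through the text argument so that no file is needed (the content and the name x_fread are made up for illustration):

```r
library(data.table)

# fread() usually takes a file path; the `text` argument is used here
# only so the sketch runs without a file on disk (hypothetical content)
csv_lines <- c("id,name", "1,a", "2,b")
x_fread <- fread(text = csv_lines)

x_fread
class(x_fread)   # the result is a data.table, which is also a data.frame
```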

Creating a data.table

library(data.table)
x_df <- data.frame(id = 1:2, name = c("a", "b"))
x_df
  id name
1  1    a
2  2    b
x_dt <- data.table(id = 1:2, name = c("a", "b"))
x_dt
   id name
1:  1    a
2:  2    b

Creating a data.table

y <- list(id = 1:2, name = c("a", "b"))
y
$id
1 2
$name
"a" "b"
x <- as.data.table(y)
x
   id name
1:  1    a
2:  2    b
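
as.data.table() is not limited to lists. A short sketch, assuming an existing data.frame as input (the names df and dt are hypothetical):

```r
library(data.table)

# Start from an ordinary data.frame (made-up example data)
df <- data.frame(id = 1:2, name = c("a", "b"))

# as.data.table() returns a new object; df itself is left unchanged
dt <- as.data.table(df)

class(df)
class(dt)
```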

data.tables and data.frames (I)

Since a data.table is a data.frame ...

x <- data.table(id = 1:2, 
                name = c("a", "b"))
x
   id name
1:  1    a
2:  2    b
class(x)
"data.table" "data.frame"

data.tables and data.frames (II)

Functions used to query data.frames also work on data.tables

nrow(x)
2
ncol(x)
2
dim(x)
2 2
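
The same holds for other base R helpers. A brief sketch, re-creating x so the block runs on its own:

```r
library(data.table)

x <- data.table(id = 1:2, name = c("a", "b"))

# Because x inherits from data.frame, these checks succeed
is.data.frame(x)
inherits(x, "data.frame")

# Common data.frame query functions work unchanged
names(x)
head(x, 1)
```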

data.tables and data.frames (III)

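A data.table never automatically converts character columns to factors (data.frame() did so by default before R 4.0.0)

# stringsAsFactors = TRUE was the data.frame() default before R 4.0.0
x_df <- data.frame(id = 1:2, name = c("a", "b"), stringsAsFactors = TRUE)
class(x_df$name)
"factor"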
x_dt <- data.table(id = 1:2, name = c("a", "b"))
class(x_dt$name)
"character"
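
Since the data.frame() default depends on the R version, it can help to set stringsAsFactors explicitly when comparing the two. The sketch below does this; the object names are hypothetical.

```r
library(data.table)

# data.frame(): behaviour depends on stringsAsFactors
# (the default was TRUE before R 4.0.0 and is FALSE since)
df_fct <- data.frame(id = 1:2, name = c("a", "b"), stringsAsFactors = TRUE)
df_chr <- data.frame(id = 1:2, name = c("a", "b"), stringsAsFactors = FALSE)
class(df_fct$name)
class(df_chr$name)

# data.table(): character columns stay character unless asked otherwise
dt_chr <- data.table(id = 1:2, name = c("a", "b"))
dt_fct <- data.table(id = 1:2, name = c("a", "b"), stringsAsFactors = TRUE)
class(dt_chr$name)
class(dt_fct$name)
```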

data.tables and data.frames (IV)

Never sets, needs or uses row names

rownames(x_dt) <- c("R1", "R2")
x_dt
   id name
1:  1    a
2:  2    b
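
When the row names of an existing data.frame carry information, one option is to keep them as an ordinary column during conversion. A minimal sketch, assuming a made-up data.frame df with row names R1 and R2 and a hypothetical column name "label":

```r
library(data.table)

# Hypothetical data.frame with meaningful row names
df <- data.frame(id = 1:2, name = c("a", "b"), row.names = c("R1", "R2"))

# keep.rownames = TRUE stores the row names in a column called "rn"
dt <- as.data.table(df, keep.rownames = TRUE)
dt

# A string can be passed instead to choose the column name
dt_named <- as.data.table(df, keep.rownames = "label")
names(dt_named)
```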

Let's practice!
