Fast data reading with fread()

Data Manipulation with data.table in R

Matt Dowle, Arun Srinivasan

Instructors, DataCamp

Blazing FAST!

  • Fast and parallel file reader
  • Argument nThread controls the number of threads to use
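
A minimal sketch of the nThread argument, assuming a throwaway CSV written to a temporary file (the file and its contents are made up for illustration). The thread count should only affect speed, not the result:

```r
library(data.table)

# Hypothetical example data, written to a temporary CSV file
tmp <- tempfile(fileext = ".csv")
fwrite(data.table(id = 1:1000, value = (1:1000) * 2L), tmp)

# Read the same file with one thread and with two threads
DT_single <- fread(tmp, nThread = 1)
DT_multi  <- fread(tmp, nThread = 2)

# Both reads should return the same table
same_result <- isTRUE(all.equal(DT_single, DT_multi))
same_result

# Number of threads data.table uses when nThread is not supplied
getDTthreads()
```

On a file this small the difference in speed is unlikely to be noticeable; the argument matters for large files.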

User-friendly

  • Can import local files, files from the web, and strings
  • Intelligent defaults - colClasses, sep, nrows etc.
  • Note: data.table versions before 1.13.0 read dates and datetimes as character columns (newer versions detect ISO 8601 formats automatically); character columns can be converted later with the fasttime or anytime packages
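
The defaults can also be overridden explicitly. The sketch below assumes a small semicolon-separated string (made up for illustration), sets sep and colClasses by hand, and forces the date column to character so the conversion step behaves the same regardless of data.table version. It uses as.IDate() from data.table itself rather than fasttime or anytime, purely to keep the example free of extra dependencies:

```r
library(data.table)

# Hypothetical semicolon-separated input
txt <- "id;date\n1;2018-05-01\n2;2018-05-02"

# Override the guessed separator and column classes
DT <- fread(text = txt, sep = ";",
            colClasses = c(id = "integer", date = "character"))

# Convert the character column to a date class afterwards
DT[, date := as.IDate(date)]
sapply(DT, class)
```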

Fast and friendly file reader

# File from URL
DT1 <- fread("https://bit.ly/2RkBXhV")
DT1
a b
1 2
3 4
# Local file
DT2 <- fread("data.csv")
DT2
a b
1 2
3 4
# String
DT3 <- fread("a,b\n1,2\n3,4")
DT3
a b
1 2
3 4
# String without col names
DT4 <- fread("1,2\n3,4")
DT4
V1 V2
1  2
3  4
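
In the examples above, fread() works out from its first argument whether it was given a URL, a file path, or literal data. When that guess should not be left to chance, the file and text arguments state the intent explicitly. A small sketch, using a temporary file created only for this example:

```r
library(data.table)

csv_text <- "a,b\n1,2\n3,4"

# Hypothetical local file with the same content
tmp <- tempfile(fileext = ".csv")
writeLines(csv_text, tmp)

# Explicitly read from a file path
DT_file <- fread(file = tmp)

# Explicitly read from a string
DT_text <- fread(text = csv_text)

# Both sources should produce the same table
same_table <- isTRUE(all.equal(DT_file, DT_text))
same_table
```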

nrows and skip arguments

# Read only the first row (after the header)
fread("a,b\n1,2\n3,4", nrows = 1)
a b
1 2
# Skip first two lines containing metadata
str <- "# Metadata\nTimestamp: 2018-05-01 19:44:28 GMT\na,b\n1,2\n3,4"
fread(str, skip = 2)
a b
1 2
3 4

More on nrows and skip arguments

str <- "# Metadata\nTimestamp: 2018-05-01 19:44:28 GMT\na,b\n1,2\n3,4"
# skip can also be a string: reading starts at the first line containing it
fread(str, skip = "a,b")
a b
1 2
3 4
# Combine with nrows to read only the first row after that header line
fread(str, skip = "a,b", nrows = 1)
a b
1 2
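
As a quick check, the numeric and string forms of skip can be compared on the same input. This sketch reuses the metadata string from the slides under a different variable name (raw_text, chosen here to avoid masking base R's str() function):

```r
library(data.table)

raw_text <- "# Metadata\nTimestamp: 2018-05-01 19:44:28 GMT\na,b\n1,2\n3,4"

# Skip a fixed number of lines
by_count <- fread(text = raw_text, skip = 2)

# Skip until the first line containing "a,b"
by_match <- fread(text = raw_text, skip = "a,b")

# Both approaches should land on the same header line
same_start <- isTRUE(all.equal(by_count, by_match))
same_start

# nrows then limits how many data rows are read
first_row <- fread(text = raw_text, skip = "a,b", nrows = 1)
nrow(first_row)
```

The string form is the more robust choice when the number of metadata lines varies between files.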

select and drop arguments

str <- "a,b,c\n1,2,x\n3,4,y"
fread(str, select = c("a", "c"))

# Same as
fread(str, drop = "b")
a c
1 x
3 y
str <- "1,2,x\n3,4,y"
fread(str, select = c(1, 3))

# Same as 
fread(str, drop = 2)
V1 V3
 1  x
 3  y
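
The equivalence of select and drop shown above can be confirmed directly. A minimal sketch, using the same three-column string under a hypothetical variable name:

```r
library(data.table)

csv_text <- "a,b,c\n1,2,x\n3,4,y"

# Keep columns a and c
by_select <- fread(text = csv_text, select = c("a", "c"))

# Drop column b instead
by_drop <- fread(text = csv_text, drop = "b")

names(by_select)

# Both calls should return the same two-column table
same_columns <- isTRUE(all.equal(by_select, by_drop))
same_columns
```

Which form to use is mostly a matter of which list is shorter: select when only a few columns are needed, drop when only a few are unwanted.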

Let's practice!
