Advanced file reading

Data Manipulation with data.table in R

Matt Dowle, Arun Srinivasan

Instructors, DataCamp

Reading big integers using integer64 type

  • R's built-in integer type is 32-bit, so it can only represent whole numbers up to 2^31 - 1 = 2147483647
  • fread() automatically reads larger integers as integer64 type, provided by the bit64 package
ans <- fread("id,name\n1234567890123,Jane\n5284782381811,John\n")
ans
           id name
1234567890123 Jane
5284782381811 John
class(ans$id)
"integer64"
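
As a small sketch of the alternatives: fread() also takes an integer64 argument that controls how such columns are read. The names csv, as_chr and as_dbl below are illustrative only and do not come from the course.

```r
library(data.table)

# Largest value R's native integer type can hold
.Machine$integer.max

csv <- "id,name\n1234567890123,Jane\n5284782381811,John\n"

# Read large integers as character strings (keeps every digit as text)
as_chr <- fread(csv, integer64 = "character")
class(as_chr$id)

# Read large integers as doubles (exact only up to 2^53)
as_dbl <- fread(csv, integer64 = "double")
class(as_dbl$id)
```

Reading IDs as character can be a reasonable choice when the column is only used as a key and never for arithmetic.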

Specifying column class types with colClasses

str <- "x1,x2,x3,x4,x5\n1,2,1.5,true,cc\n3,4,2.5,false,ff"

ans <- fread(str, colClasses = c(x5 = "factor"))
str(ans)
Classes ‘data.table’ and 'data.frame':    2 obs. of  5 variables:
 $ x1: int  1 3
 $ x2: int  2 4
 $ x3: num  1.5 2.5
 $ x4: logi  TRUE FALSE
 $ x5: Factor w/ 2 levels "cc","ff": 1 2

Specifying column class types with colClasses

ans <- fread(str, colClasses = c("integer", "integer", 
                                 "numeric", "logical", "factor"))
str(ans)
Classes ‘data.table’ and 'data.frame':    2 obs. of  5 variables:
 $ x1: int  1 3
 $ x2: int  2 4
 $ x3: num  1.5 2.5
 $ x4: logi  TRUE FALSE
 $ x5: Factor w/ 2 levels "cc","ff": 1 2

Specifying column class types with colClasses

str <- "x1,x2,x3,x4,x5,x6\n1,2,1.5,2.5,aa,bb\n3,4,5.5,6.5,cc,dd"
ans <- fread(str, colClasses = list(numeric = 1:4, factor = c("x5", "x6")))
str(ans)
Classes ‘data.table’ and 'data.frame':    2 obs. of  6 variables:
 $ x1: num  1 3
 $ x2: num  2 4
 $ x3: num  1.5 5.5
 $ x4: num  2.5 6.5
 $ x5: Factor w/ 2 levels "aa","cc": 1 2
 $ x6: Factor w/ 2 levels "bb","dd": 1 2
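
The named-vector and named-list forms of colClasses can be compared side by side. This is a minimal sketch; csv, by_name, by_list and col_classes are illustrative names, not part of the course material.

```r
library(data.table)

csv <- "x1,x2,x3,x4,x5,x6\n1,2,1.5,2.5,aa,bb\n3,4,5.5,6.5,cc,dd"

# Form 1: named vector, written as column name = class
by_name <- fread(csv, colClasses = c(x1 = "numeric", x2 = "numeric",
                                     x5 = "factor", x6 = "factor"))

# Form 2: named list, written as class = columns (by position or by name)
by_list <- fread(csv, colClasses = list(numeric = 1:2,
                                        factor = c("x5", "x6")))

# Columns not mentioned (x3, x4) keep the type fread() detects for them
col_classes <- sapply(by_list, class)
col_classes

# Both forms should give the same column classes
identical(sapply(by_name, class), col_classes)
```

The list form tends to be shorter when many columns share one class; the vector form reads more naturally when only one or two columns need overriding.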

The fill argument

str <- "1,2\n3,4,a\n5,6\n7,8,b"
fread(str) 
V1 5 6
 7 8 b
Warning message:
In fread(str) :
  Detected 2 column names but the data has 3 columns (i.e. invalid file). 
  Added 1 extra default column name for the first column which is guessed to 
  be row names or an index. 
  Use setnames() afterwards if this guess is not correct, 
  or fix the file write command that created the file to create a valid file.

The fill argument

fread(str, fill = TRUE)
V1 V2 V3
 1  2
 3  4  a
 5  6
 7  8  b
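
To see which rows were padded by fill = TRUE, the result can be inspected programmatically. A hedged sketch follows: the names ragged, filled and padded are illustrative, and the check accepts either a blank string or NA because the exact fill value is treated as an assumption here.

```r
library(data.table)

ragged <- "1,2\n3,4,a\n5,6\n7,8,b"

# Rows with fewer fields are padded instead of being dropped
filled <- fread(ragged, fill = TRUE)
dim(filled)

# A padded cell in V3 is either blank or NA (assumption: one of the two)
padded <- is.na(filled$V3) | filled$V3 == ""
which(padded)
```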

The na.strings argument

Missing values are often encoded with placeholder strings such as "999", "##NA" or "N/A"; na.strings tells fread() which strings to read as NA

str <- "x,y,z\n1,###,3\n2,4,###\n#N/A,7,9"
ans <- fread(str, na.strings = c("###", "#N/A"))
ans
 x  y  z
 1 NA  3
 2  4 NA
NA  7  9
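
The effect of na.strings on column types can be checked by reading the same text with and without it. This is a minimal sketch; csv, raw and clean are illustrative names.

```r
library(data.table)

csv <- "x,y,z\n1,###,3\n2,4,###\n#N/A,7,9"

# Without na.strings the placeholders are ordinary text,
# so the affected columns cannot be read as integers
raw <- fread(csv)
sapply(raw, class)

# With na.strings the placeholders become NA and the columns stay integer
clean <- fread(csv, na.strings = c("###", "#N/A"))
sapply(clean, class)

# Number of missing values per column
colSums(is.na(clean))
```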

Let's practice!
