Advanced file reading

Data Manipulation with data.table in R

Matt Dowle, Arun Srinivasan

Instructors, DataCamp

Reading big integers using integer64 type

  • By default, R integers are 32-bit, so the largest integer R can represent is 2^31 - 1 = 2147483647
  • fread automatically reads larger integers as integer64, a type provided by the bit64 package
ans <- fread("id,name\n1234567890123,Jane\n5284782381811,John\n")
ans
           id name
1234567890123 Jane
5284782381811 John
class(ans$id)
"integer64"
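To make the 32-bit limit concrete, the sketch below first shows base R's integer ceiling and then reads the same ids without relying on bit64. It assumes fread's `integer64` argument, which the slide does not mention, and the names `csv_text`, `ids_chr`, and `ids_dbl` are only illustrative.

```r
library(data.table)

# Base R integers are 32-bit: the largest value is 2^31 - 1.
.Machine$integer.max

# Going one past the limit overflows to NA (R also emits a warning).
overflowed <- suppressWarnings(.Machine$integer.max + 1L)
is.na(overflowed)

csv_text <- "id,name\n1234567890123,Jane\n5284782381811,John\n"

# Assumption: fread's integer64 argument selects the type used for big integers.
# "character" keeps the ids as text; "double" stores them as ordinary numerics.
ids_chr <- fread(csv_text, integer64 = "character")
ids_dbl <- fread(csv_text, integer64 = "double")

class(ids_chr$id)
class(ids_dbl$id)
```

Reading ids as character can be a reasonable choice when they are labels that never take part in arithmetic.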

Specifying column class types with colClasses

str <- "x1,x2,x3,x4,x5\n1,2,1.5,true,cc\n3,4,2.5,false,ff"

ans <- fread(str, colClasses = c(x5 = "factor"))
str(ans)
Classes ‘data.table’ and 'data.frame':    2 obs. of  5 variables:
 $ x1: int  1 3
 $ x2: int  2 4
 $ x3: num  1.5 2.5
 $ x4: logi  TRUE FALSE
 $ x5: Factor w/ 2 levels "cc","ff": 1 2

Specifying column class types with colClasses

ans <- fread(str, colClasses = c("integer", "integer", 
                                 "numeric", "logical", "factor"))
str(ans)
Classes ‘data.table’ and 'data.frame':    2 obs. of  5 variables:
 $ x1: int  1 3
 $ x2: int  2 4
 $ x3: num  1.5 2.5
 $ x4: logi  TRUE FALSE
 $ x5: Factor w/ 2 levels "cc","ff": 1 2

Specifying column class types with colClasses

str <- "x1,x2,x3,x4,x5,x6\n1,2,1.5,2.5,aa,bb\n3,4,5.5,6.5,cc,dd"
ans <- fread(str, colClasses = list(numeric = 1:4, factor = c("x5", "x6")))
str(ans)
Classes ‘data.table’ and 'data.frame': 2 obs. of 6 variables:
 $ x1: num  1 3
 $ x2: num  2 4
 $ x3: num  1.5 5.5
 $ x4: num  2.5 6.5
 $ x5: Factor w/ 2 levels "aa","cc": 1 2
 $ x6: Factor w/ 2 levels "bb","dd": 1 2
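The three ways of passing colClasses (named vector, unnamed vector, named list) can be compared side by side. The following is a minimal sketch using the five-column example from the earlier slide; the variable names are illustrative, and the list form for a single column is an assumption extrapolated from the slide's multi-column list.

```r
library(data.table)

csv_text <- "x1,x2,x3,x4,x5\n1,2,1.5,true,cc\n3,4,2.5,false,ff"

# 1. Named vector: override only x5 and let fread guess the rest.
by_name <- fread(csv_text, colClasses = c(x5 = "factor"))

# 2. Unnamed vector: one class per column, in column order.
by_position <- fread(csv_text, colClasses = c("integer", "integer",
                                              "numeric", "logical", "factor"))

# 3. Named list: class name on the left, the columns it applies to on the right.
by_list <- fread(csv_text, colClasses = list(factor = "x5"))

# Collect the class of every column for each of the three results.
classes <- lapply(list(by_name, by_position, by_list),
                  function(dt) sapply(dt, class))
classes[[1]]
```

For this input all three calls should yield the same column classes; the named forms are shorter when only a few columns need overriding.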

The fill argument

str <- "1,2\n3,4,a\n5,6\n7,8,b"
fread(str) 
V1 5 6
 7 8 b
Warning message:
In fread(str) :
  Detected 2 column names but the data has 3 columns (i.e. invalid file). 
  Added 1 extra default column name for the first column which is guessed to 
  be row names or an index. 
  Use setnames() afterwards if this guess is not correct, 
  or fix the file write command that created the file to create a valid file.

The fill argument

fread(str, fill = TRUE)
V1 V2 V3
 1  2
 3  4  a
 5  6
 7  8  b
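One way to decide whether fill is needed is to count the fields on each line before reading. The sketch below uses base R's `count.fields()` for that check, which is an addition not covered on the slides; `csv_text`, `fields_per_line`, and `is_ragged` are illustrative names.

```r
library(data.table)

csv_text <- "1,2\n3,4,a\n5,6\n7,8,b"

# Base R's count.fields() reports how many fields each line contains.
con <- textConnection(csv_text)
fields_per_line <- count.fields(con, sep = ",")
close(con)
fields_per_line

# Unequal counts mean the rows are ragged, so ask fread to pad the short rows.
is_ragged <- length(unique(fields_per_line)) > 1
ans <- fread(csv_text, fill = is_ragged)
dim(ans)
```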

The na.strings argument

Missing values are often encoded with placeholder strings such as "999", "##NA", or "N/A"; na.strings tells fread which strings to read as NA

str <- "x,y,z\n1,###,3\n2,4,###\n#N/A,7,9"
ans <- fread(str, na.strings = c("###", "#N/A"))
ans
 x  y  z
 1 NA  3
 2  4 NA
NA  7  9
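The effect of na.strings is easiest to see in the column types. The sketch below reads the same text with and without it; `header = TRUE` is added here as an assumption so that both reads treat the first line identically, and the names `raw` and `clean` are illustrative.

```r
library(data.table)

csv_text <- "x,y,z\n1,###,3\n2,4,###\n#N/A,7,9"

# header = TRUE is stated explicitly so the comparison does not depend on
# fread's header auto-detection.
raw   <- fread(csv_text, header = TRUE)
clean <- fread(csv_text, header = TRUE, na.strings = c("###", "#N/A"))

sapply(raw, class)     # placeholders force every column to character
sapply(clean, class)   # placeholders become NA, so the columns stay integer
colSums(is.na(clean))  # one NA in each column
```

Without na.strings the placeholders survive as ordinary text, which is why a single "###" is enough to turn an otherwise numeric column into character.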

Let's practice!
