Scalable Data Processing in R
Simon Urbanek
Member of R-Core, Lead Inventive Scientist, AT&T Labs Research
Repeat 1 to 3 until all data is processed
In the iotools package, the physical loading of data and parsing of input into R objects are separated for better flexibility and performance.
readAsRaw() reads the entire data into a raw vector
read.chunk() reads the data in chunks into a raw vector
mstrsplit() converts raw data into a matrix
dstrsplit() converts raw data into a data frame
read.delim.raw() = readAsRaw() + dstrsplit()
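The split between loading and parsing can be sketched as follows. This is only an illustration: the temporary file, its contents, and the column names are made up for the example, and the column types are stated by hand rather than detected.

```r
library(iotools)

# Hypothetical sample data, written to a temporary file for this sketch only
tmp <- tempfile(fileext = ".csv")
writeLines(c("1,2.5,a", "2,3.5,b", "3,4.5,c"), tmp)

# Step 1: physical loading. The file contents become a raw vector.
raw_data <- readAsRaw(tmp)

# Step 2: parsing. The same raw vector can be turned into different R objects.
m <- mstrsplit(raw_data, sep = ",")               # character matrix
df <- dstrsplit(raw_data,
                col_types = c("integer", "numeric", "character"),
                sep = ",")                        # data frame with typed columns
names(df) <- c("id", "value", "label")            # hypothetical column names

# Both steps in one call; column types are guessed from the data here
df2 <- read.delim.raw(tmp, header = FALSE, sep = ",")

unlink(tmp)
```

Because the raw vector is kept separate, it can be parsed more than once or handed to a different parser without touching the disk again.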
# Open a file connection
fc <- file("data-file.csv", "rb")
# Read the first line if the data has a header
readLines(fc, n = 1)
....
# Code to import and parse the data
....
# Close the file connection
close(fc)
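One possible way to fill in the import-and-parse step is a loop over read.chunk(). The sketch below is an assumption, not the code elided above: the sample file, its two columns, and the running sum are hypothetical, and chunk.reader() is an iotools helper that the slide does not mention.

```r
library(iotools)

# Hypothetical sample file with a header line, created for this sketch only
tmp <- tempfile(fileext = ".csv")
ids <- 1:20000
writeLines(c("id,value", paste(ids, ids * 2L, sep = ",")), tmp)

# Open a file connection and consume the header line
fc <- file(tmp, "rb")
col_names <- strsplit(readLines(fc, n = 1), ",")[[1]]

# Wrap the connection so that chunks always end on a line boundary
reader <- chunk.reader(fc)

total <- 0      # running sum of the "value" column
n_rows <- 0     # number of rows seen so far
n_chunks <- 0   # number of chunks processed

# read.chunk() returns a zero-length raw vector once the input is exhausted
while (length(chunk <- read.chunk(reader, max.size = 100000L)) > 0) {
  df <- dstrsplit(chunk, col_types = c("integer", "numeric"), sep = ",")
  names(df) <- col_names
  total <- total + sum(df$value)
  n_rows <- n_rows + nrow(df)
  n_chunks <- n_chunks + 1
}

# Close the file connection
close(fc)
unlink(tmp)
```

Only one chunk is held in memory at a time, so the same loop applies to files that are larger than the available RAM, provided the per-chunk results can be combined (as the sum is here).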