Scalable Data Processing in R
Michael Kane
Assistant Professor, Yale University
bigmemory
is used to store, manipulate, and process big matrices that may be larger than a computer's RAM
R objects are kept in RAM
When you run out of RAM, the operating system either swaps data to disk, slowing computation dramatically, or R fails with an out-of-memory error.
You are better off moving data into RAM only when the data are needed for processing.
bigmemory implements the big.matrix
data type, which is used to create, store, access, and manipulate matrices stored on disk
Data are kept on disk and moved into RAM implicitly, only when needed
A big.matrix
object:
library(bigmemory)
# Create a new big.matrix object
x <- big.matrix(nrow = 1, ncol = 3, type = "double", init = 0,
                backingfile = "hello_big_matrix.bin",
                descriptorfile = "hello_big_matrix.desc")
# See what's in it
x[,]
0 0 0
x
An object of class "big.matrix"
Slot "address":
<pointer: 0x108e2a9a0>
# Change the value in the first row and column
x[1, 1] <- 3
# Verify the change has been made
x[,]
3 0 0
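Because a big.matrix is file-backed, it does not need to be rebuilt in a new R session; it can be re-attached from its descriptor file with bigmemory's attach.big.matrix(). A minimal sketch, assuming the backing and descriptor files created above are in the working directory:

```r
library(bigmemory)

# Re-attach the file-backed matrix from its descriptor file;
# the data themselves stay on disk in hello_big_matrix.bin
y <- attach.big.matrix("hello_big_matrix.desc")

# The change made earlier persists on disk
y[,]
```

Only the descriptor and backing files are read here; no copy of the full matrix is loaded into RAM up front.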