References vs. Copies

Scalable Data Processing in R

Simon Urbanek

Member of R-Core, Lead Inventive Scientist, AT&T Labs Research

Big matrices and matrices - Similarities

  • Subset
  • Assign

Big matrices and matrices - Differences

  • A file-backed big.matrix is stored on disk
  • Persists across R sessions
  • Can be shared across R sessions

R usually makes copies during assignment

This creates a copy of a and assigns it to b.

a <- 42
b <- a
a
42
b
42
a <- 43
a
43
b
42

R usually makes copies during assignment

a <- 42

foo <- function(a) {a <- 43
                    paste("Inside the function a is", a)}
foo(a)
"Inside the function a is 43"
paste("Outside the function a is still", a)
"Outside the function a is still 42"
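
Because the function only changes its own copy, the usual way to keep a change is to return the new value and reassign it. A minimal sketch of that pattern follows; add_one is a hypothetical helper introduced for illustration, not part of the slides.

```r
# Hypothetical helper: returns the new value instead of
# trying to modify its argument in place.
add_one <- function(value) {
  value <- value + 1   # changes only the local copy
  value                # the last expression is the return value
}

a <- 42
result <- add_one(a)   # a itself is untouched here
a                      # still 42

a <- add_one(a)        # reassign to keep the change
a                      # now 43
```

The explicit reassignment makes the copy visible in the code, which is the behaviour reference objects such as environments do not share.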

Not all R objects are copied

This function does change the value of a$val outside the function, because a is an environment and environments are not copied when passed to a function.

foo <- function(a) {a$val <- 43 
                    paste("Inside the function a is", a$val)}
a <- environment()
a$val <- 42
foo(a)
"Inside the function a is 43"
paste("Outside the function a$val is", a$val)
"Outside the function a$val is 43"
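
The same reference behaviour can be seen with a freshly created environment rather than the current one. This is a small sketch, assuming base R only; counter, increment and alias are hypothetical names chosen for illustration.

```r
# new.env() creates a fresh environment, separate from the global one.
counter <- new.env()
counter$val <- 0

# Hypothetical helper: modifies the environment it is given.
increment <- function(env) {
  env$val <- env$val + 1
  invisible(env)
}

increment(counter)
increment(counter)
counter$val            # 2: both calls changed the same object

# Assignment does not copy an environment either.
alias <- counter
alias$val <- 100
counter$val            # 100: alias and counter are the same environment
```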

deepcopy()

# x is a big matrix
x <- big.matrix(...)

# x_no_copy and x refer to the same object
x_no_copy <- x

# x_copy and x refer to different objects
x_copy <- deepcopy(x)


Reference behaviour

R won't make copies of a big.matrix implicitly

  • Minimize memory usage
  • Reduce execution time
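
One consequence of this is that a function can fill a big.matrix in place without returning it. The sketch below assumes the bigmemory package is installed and uses a small in-memory big.matrix for brevity; fill_with is a hypothetical helper, not a bigmemory function.

```r
library(bigmemory)

# Small in-memory big.matrix, used here only for illustration.
m <- big.matrix(nrow = 2, ncol = 2, type = "double", init = 0)

# Hypothetical helper: writes into the matrix it receives.
# Nothing is returned; the caller's matrix is modified directly.
fill_with <- function(bm, value) {
  bm[, ] <- value
  invisible(NULL)
}

fill_with(m, 5)
m[, ]   # every element of the caller's matrix is now 5
```

With an ordinary R matrix the same function would change only its local copy, so the caller would see no difference.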

Not all R objects are copied

library(bigmemory)

x <- big.matrix(nrow = 1, ncol = 3, type = "double", 
                init = 0, 
                backingfile = "hello-bigmemory.bin", 
                descriptorfile = "hello-bigmemory.desc")
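
The descriptor file is what lets another reference, or another R session, reach the same data. The following is a sketch under stated assumptions: it writes its files to a temporary directory, the file names sketch.bin and sketch.desc are hypothetical, and it assumes a bigmemory version in which attach.big.matrix() accepts the full path to a descriptor file.

```r
library(bigmemory)

# Keep the illustration files out of the working directory.
backing_dir <- tempdir()

y <- big.matrix(nrow = 1, ncol = 3, type = "double", init = 0,
                backingfile = "sketch.bin",
                backingpath = backing_dir,
                descriptorfile = "sketch.desc")

# Attach a second reference to the same backing file via the descriptor.
# (Assumption: a full path to the descriptor file is accepted.)
y_again <- attach.big.matrix(file.path(backing_dir, "sketch.desc"))

# A write through one reference should be visible through the other.
y[, ] <- 7
y_again[, ]
```

In a separate R session, the attach.big.matrix() call alone would be enough to reach the data, provided the backing and descriptor files are still on disk.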

Not all R objects are copied

x_no_copy <- x
x[,]
0 0 0
x_no_copy[,]
0 0 0
x[,] <- 1
x[,]
1 1 1
x_no_copy[,]
1 1 1

Not all R objects are copied

x_copy <- deepcopy(x)
x[,]
1 1 1
x_copy[,]
1 1 1
x[,] <- 2
x[,]
2 2 2
x_copy[,]
1 1 1

Let's practice!
