The parallel package - parApply

Writing Efficient R Code

Colin Gillespie

Jumping Rivers & Newcastle University

The parallel package

  • Part of R since 2011

library("parallel")

  • Cross platform: code works under Windows, Linux, Mac
  • Has parallel versions of standard functions

The apply() function

  • apply() is similar to a for loop
    • We apply a function to each row/column of a matrix
  • A 10 column, 10,000 row matrix:

m <- matrix(rnorm(100000), ncol = 10)

  • apply() is neater than a for loop

res <- apply(m, 1, median)
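To make the comparison with a for loop concrete, here is a minimal sketch that computes the row medians both ways. The fixed seed and the name res_loop are assumptions added for illustration only.

```r
set.seed(1)  # assumption: fixed seed so the run is repeatable
m <- matrix(rnorm(100000), ncol = 10)

# for-loop version: preallocate the result, then fill one row at a time
res_loop <- numeric(nrow(m))
for (i in seq_len(nrow(m))) {
  res_loop[i] <- median(m[i, ])
}

# apply() version: the 1 means "work over rows" (2 would mean columns)
res <- apply(m, 1, median)

# Compare the two results
all.equal(res_loop, res)
```

Both versions do the same work; apply() simply hides the bookkeeping of preallocating and indexing.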

Converting to parallel

  • Load the package
  • Specify the number of cores
  • Create a cluster object
  • Swap to parApply()
  • Stop the cluster
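library("parallel")
# Number of worker R processes to start
copies_of_r <- 7
# Start the workers
cl <- makeCluster(copies_of_r)
# Same arguments as apply(), with the cluster object first
parApply(cl, m, 1, median)
# Shut the workers down
stopCluster(cl)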
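A self-contained version of these steps might look like the sketch below, which also checks that the parallel result matches the serial one. Using 2 workers instead of 7, and the names res_serial and res_parallel, are assumptions chosen so the example runs on most machines.

```r
library("parallel")

m <- matrix(rnorm(100000), ncol = 10)

# Assumption: 2 workers, so the sketch runs on most machines
copies_of_r <- 2
cl <- makeCluster(copies_of_r)

res_serial   <- apply(m, 1, median)
res_parallel <- parApply(cl, m, 1, median)

# Always release the worker processes when finished
stopCluster(cl)

# Compare the serial and parallel results
all.equal(res_serial, res_parallel)
```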

The bad news

As Lewis Carroll said

The hurrier I go, the behinder I get.

  • Sometimes running in parallel is slower, because of the overhead of communicating between the R processes
# Serial version
apply(m, 1, median)
# Parallel version
parApply(cl, m, 1, median)
  • Benchmark both solutions
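One simple way to benchmark both solutions is base R's system.time(), sketched below. The 2-worker cluster and the names t_serial and t_parallel are assumptions for illustration. Timings vary from machine to machine and from run to run, so no expected numbers are shown, and a dedicated benchmarking package that repeats each call would give more stable results.

```r
library("parallel")

m <- matrix(rnorm(100000), ncol = 10)
cl <- makeCluster(2)  # assumption: 2 workers

# Elapsed wall-clock time in seconds for each version
t_serial   <- system.time(apply(m, 1, median))["elapsed"]
t_parallel <- system.time(parApply(cl, m, 1, median))["elapsed"]

stopCluster(cl)

# Side-by-side comparison of the two timings
c(serial = unname(t_serial), parallel = unname(t_parallel))
```

For a task as cheap as a row median on a small matrix, the parallel version may well come out slower, which is the point of the warning above.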

Let's practice!
