Writing Efficient R Code
Colin Gillespie
Jumping Rivers & Newcastle University
There are parallel versions of
apply() - parApply()
sapply() - parSapply()
lapply() - parLapply()
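As a rough illustration of the pairs listed above, the sketch below calls each serial function next to its parallel counterpart. The matrix m, the squaring function, and the choice of 2 worker processes are assumptions made for this example, not part of the original slides.

```r
library("parallel")

# Assumption: 2 worker processes, chosen only to keep the example small
cl <- makeCluster(2)

# Hypothetical toy inputs
m <- matrix(1:6, nrow = 2)
square <- function(i) i^2

# apply() and parApply(): sum over the rows of a matrix
row_sums_serial <- apply(m, 1, sum)
row_sums_par <- parApply(cl, m, 1, sum)

# sapply() and parSapply(): result simplified to a vector
sq_vec_serial <- sapply(1:3, square)
sq_vec_par <- parSapply(cl, 1:3, square)

# lapply() and parLapply(): result kept as a list
sq_list_serial <- lapply(1:3, square)
sq_list_par <- parLapply(cl, 1:3, square)

# Always release the workers when finished
stopCluster(cl)
```

In each pair the only change is the extra cluster argument at the front of the call.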
sapply() is just another way of writing a for loop
The loop
for (i in 1:10)
  x[i] <- simulate(i)
Can be written as
sapply(1:10, simulate)
We are applying a function to each value of a vector
It's the same recipe!
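The equivalence can be checked with a small runnable sketch. The slides do not define simulate(), so the version here is a hypothetical, deterministic stand-in, and x is preallocated before the loop so the code runs on its own.

```r
# Hypothetical stand-in for simulate(); deterministic so both results can be compared
simulate <- function(i) i^2

# The for loop version, with x preallocated
x <- numeric(10)
for (i in 1:10)
  x[i] <- simulate(i)

# The sapply() version
y <- sapply(1:10, simulate)

# Both approaches give the same vector
identical(x, y)
```

A function with random output would need the same seed before each version to give matching results.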
parSapply()
plot(pokemon$Defense, pokemon$Attack)
abline(lm(pokemon$Attack ~ pokemon$Defense), col = 2)
cor(pokemon$Attack, pokemon$Defense)
0.437
In a perfect world, we would resample from the population, but we can't
Instead, we assume the original sample is representative of the population
bootstrap <- function(data_set) {
  # Sample with replacement
  s <- sample(1:nrow(data_set), replace = TRUE)
  new_data <- data_set[s, ]
  # Calculate the correlation
  cor(new_data$Attack, new_data$Defense)
}
# 100 independent bootstrap simulations
sapply(1:100, function(i) bootstrap(pokemon))
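One possible way to use the 100 bootstrap correlations is to summarise their spread. The sketch below is self-contained, so it uses simulated data as a stand-in for the pokemon data set. The data frame fake_pokemon, its size, and the seed are assumptions made for illustration only.

```r
# Assumption: fixed seed so the sketch is repeatable
set.seed(1)

# Hypothetical stand-in for the pokemon data: two positively related columns
n <- 200
defense <- rnorm(n, mean = 70, sd = 30)
fake_pokemon <- data.frame(
  Defense = defense,
  Attack = 0.5 * defense + rnorm(n, sd = 30)
)

bootstrap <- function(data_set) {
  # Sample row indices with replacement
  s <- sample(1:nrow(data_set), replace = TRUE)
  new_data <- data_set[s, ]
  # Correlation in the resampled data
  cor(new_data$Attack, new_data$Defense)
}

# 100 independent bootstrap simulations
boot_cors <- sapply(1:100, function(i) bootstrap(fake_pokemon))

# Spread of the bootstrap correlations
sd(boot_cors)
quantile(boot_cors, c(0.025, 0.975))
```

The quantiles give a rough percentile interval for the correlation. With only 100 resamples the interval is noisy, so more simulations would normally be used.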
parSapply()
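library("parallel")
# Number of worker processes to start
no_of_cores <- 7
cl <- makeCluster(no_of_cores)
# Copy the function and the data to each worker
clusterExport(cl, c("bootstrap", "pokemon"))
# Same call as sapply(), with the cluster as the first argument
parSapply(cl, 1:100, function(i) bootstrap(pokemon))
# Release the workers
stopCluster(cl)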
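The same steps can be run end to end in a self-contained sketch. Simulated data again stands in for the pokemon data set. The name fake_pokemon, the 2 worker processes, and the seed values are assumptions. The call to clusterSetRNGStream() is an addition not shown in the slides, included so that the random numbers drawn on the workers are reproducible.

```r
library("parallel")

# Hypothetical stand-in for the pokemon data
set.seed(1)
n <- 200
defense <- rnorm(n, mean = 70, sd = 30)
fake_pokemon <- data.frame(
  Defense = defense,
  Attack = 0.5 * defense + rnorm(n, sd = 30)
)

bootstrap <- function(data_set) {
  s <- sample(1:nrow(data_set), replace = TRUE)
  new_data <- data_set[s, ]
  cor(new_data$Attack, new_data$Defense)
}

# Assumption: 2 workers; detectCores() reports how many cores the machine has
no_of_cores <- 2
cl <- makeCluster(no_of_cores)

# Workers start with empty workspaces, so copy over what they need
clusterExport(cl, c("bootstrap", "fake_pokemon"))

# Assumption: seed the workers' random number streams for reproducibility
clusterSetRNGStream(cl, iseed = 42)

# 100 bootstrap simulations spread across the workers
par_cors <- parSapply(cl, 1:100, function(i) bootstrap(fake_pokemon))

stopCluster(cl)

length(par_cors)
```

Starting a cluster and exporting data both take time, so for a task this small the parallel version may not be faster than plain sapply().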