The importance of vectorizing your code

Writing Efficient R Code

Colin Gillespie

Jumping Rivers & Newcastle University

General rule

  • Calling an R function eventually leads to C or FORTRAN code
    • This code is very heavily optimized

Goal

  • Access the underlying C or FORTRAN code as quickly as possible; the fewer function calls, the better.
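As a rough illustration of how far a function sits from compiled code, base R's is.primitive() reports whether a function is implemented directly in C. The variable names below are made up for this sketch.

```r
# sum() is a primitive: calling it goes straight to compiled C code.
sum_is_primitive <- is.primitive(sum)

# mean() is an ordinary R function (a closure) that dispatches to a method
# before reaching compiled code, so there are extra R-level calls on the way.
mean_is_primitive <- is.primitive(mean)

print(sum_is_primitive)   # TRUE
print(mean_is_primitive)  # FALSE
```

Both functions end up in compiled code; the difference is only how many R-level calls happen first.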

Vectorized functions

  • Many R functions are vectorized
    • Some take a single number as input but return a vector
rnorm(4)
-0.7247  0.2502  0.3510  0.6919
  • Vector as input
mean(c(36, 48))
42
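A few more vectorized calls, as a small sketch using only base R. The input values and variable names are arbitrary choices for illustration.

```r
x <- c(1, 4, 9, 16)

# sqrt() is applied to every element in one call; no explicit loop is needed.
roots <- sqrt(x)

# Arithmetic operators are vectorized too: this adds 1 to each element.
shifted <- x + 1

# rnorm() takes a single number (how many values to draw) and returns a vector.
draws <- rnorm(4)

print(roots)    # 1 2 3 4
print(shifted)  # 2 5 10 17
print(length(draws))  # 4
```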

Generating random numbers

library(microbenchmark)
n <- 1e6
x <- vector("numeric", n)
microbenchmark(
    x <- rnorm(n),
    {
        for (i in seq_along(x))
            x[i] <- rnorm(1)
    },
    times = 10
)
# Unit: milliseconds
# expr        lq mean    uq  cld
# rnorm(n)    60   70    80  a
# Looping   2600 2700  2800   b

## Output trimmed for presentation

Compare

x <- vector("numeric", n)
for(i in seq_along(x))
    x[i] <- rnorm(1)

to

x <- rnorm(n)

Why is the loop slow?

Looping
x <- vector("numeric", n)
for(i in seq_along(x))
    x[i] <- rnorm(1)
Allocation
x <- vector("numeric", n)
  • Loop: a one-off cost, paid once before the loop starts
  • Vectorized: a comparable one-off cost
Generation
  • Loop: one million calls to rnorm()
  • Vectorized: a single call to rnorm()
Assignment
  • Loop: One million calls to the assignment method
  • Vectorized: a single assignment
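The breakdown above can be checked roughly with base R's system.time(), which needs no extra packages. This is a minimal sketch, not a careful benchmark: the smaller n, the seed, and the variable names are assumptions made here to keep the run short and repeatable.

```r
n <- 1e5  # assumption: smaller than the 1e6 used earlier, to keep the run short

# Loop: allocate once, then n calls to rnorm() and n element assignments.
set.seed(42)
loop_time <- system.time({
    x_loop <- vector("numeric", n)
    for (i in seq_along(x_loop))
        x_loop[i] <- rnorm(1)
})

# Vectorized: a single call to rnorm() and a single assignment.
set.seed(42)
vec_time <- system.time(x_vec <- rnorm(n))

# Elapsed seconds for each approach; exact values depend on the machine.
print(loop_time["elapsed"])
print(vec_time["elapsed"])
```

With the same seed and R's default random number settings, both versions should produce the same values, so the only difference is the cost of getting them.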

R club

The second rule of R club: use a vectorized solution wherever possible.
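As one possible illustration of the rule, here is a sum of squares written first as a loop and then in vectorized form. The data and variable names are invented for this sketch.

```r
x <- c(3, 1, 4, 1, 5, 9, 2, 6)

# Loop version: one R-level arithmetic step and one assignment per element.
total_loop <- 0
for (value in x) {
    total_loop <- total_loop + value^2
}

# Vectorized version: x^2 and sum() each make a single call into compiled code.
total_vec <- sum(x^2)

print(total_loop)  # 173
print(total_vec)   # 173
```

Both give the same answer; the vectorized form is shorter and makes far fewer R-level function calls as the input grows.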

Let's practice!
