The importance of vectorizing your code

Writing Efficient R Code

Colin Gillespie

Jumping Rivers & Newcastle University

General rule

Calling an R function eventually leads to C or FORTRAN code
- This code is very heavily optimized

Goal

Access the underlying C or FORTRAN code as quickly as possible; the fewer function calls, the better.
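A minimal sketch of how to see this rule from the R console, using only base R. The variable names (`is_prim`, `rnorm_body`, `calls_c`) are illustrative and not part of the slides.

```r
# sum() is a primitive: calling it hands control straight to compiled code
is_prim <- is.primitive(sum)

# rnorm() is a thin R wrapper; its body is a single .Call() into C
rnorm_body <- deparse(body(stats::rnorm))
calls_c <- any(grepl(".Call", rnorm_body, fixed = TRUE))

print(is_prim)
print(rnorm_body)
```

Printing a function's name without parentheses at the console (for example `rnorm`) shows the same information interactively.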

Vectorized functions

Many R functions are vectorized
- Some take a single number as input but return a vector

rnorm(4)

-0.7247  0.2502  0.3510  0.6919

- Others take a vector as input

mean(c(36, 48))
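As a small illustration of what "vectorized" buys you, the sketch below applies `sqrt()` to a whole vector in one call and checks it against an explicit element-by-element loop. The input values and variable names are made up for the example.

```r
x <- c(1, 4, 9, 16)

# Vectorized: one call handles every element
roots_vec <- sqrt(x)

# Loop: one call to sqrt() per element
roots_loop <- numeric(length(x))
for (i in seq_along(x)) {
    roots_loop[i] <- sqrt(x[i])
}

# Both approaches give the same answer
same <- identical(roots_vec, roots_loop)
print(roots_vec)
print(same)
```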

Generating random numbers

library(microbenchmark)
n <- 1e6
x <- vector("numeric", n)
microbenchmark(
    x <- rnorm(n),
    {
        for (i in seq_along(x))
            x[i] <- rnorm(1)
    },
    times = 10
)

# Unit: milliseconds
# expr          lq  mean    uq  cld
# rnorm(n)      60    70    80  a
# Looping     2600  2700  2800   b

## Output trimmed for presentation
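If microbenchmark is not installed, a rough version of the same comparison can be sketched with base R's `system.time()`. This is an approximation under stated assumptions: `n` is reduced to 1e5 so it runs quickly, each expression is timed only once, and the exact timings will differ from the slide and from machine to machine.

```r
n <- 1e5  # smaller than the 1e6 used above, to keep the sketch quick

# Vectorized: a single call generates all n values
t_vec <- system.time(x_vec <- rnorm(n))

# Loop: preallocate, then fill one element at a time
x_loop <- vector("numeric", n)
t_loop <- system.time(
    for (i in seq_along(x_loop)) {
        x_loop[i] <- rnorm(1)
    }
)

# Elapsed wall-clock time in seconds for each approach
elapsed <- c(vectorized = t_vec[["elapsed"]], loop = t_loop[["elapsed"]])
print(elapsed)
```

A single `system.time()` run is noisy; microbenchmark repeats each expression (`times = 10` above) and summarizes the distribution, which is why it is preferred for real measurements.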

Compare

x <- vector("numeric", n)
for(i in seq_along(x))
    x[i] <- rnorm(1)

x <- rnorm(n)

Why is the loop slow?

Looping

x <- vector("numeric", n)
for(i in seq_along(x))
    x[i] <- rnorm(1)

Allocation

x <- vector("numeric", n)

Loop: one-off cost of allocating x before the loop starts
Vectorized: comparable, since rnorm(n) allocates its result once

Generation

Loop: one million calls to rnorm()
Vectorized: a single call to rnorm()

Assignment

Loop: One million calls to the assignment method
Vectorized: a single assignment
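The call counts above can be checked directly. The sketch below wraps `rnorm()` in a hypothetical helper, `counted_rnorm()`, that increments a counter on every call; the helper and the counter are assumptions introduced for illustration, and `n` is reduced to 1e4 to keep it quick.

```r
n <- 1e4  # smaller than the 1e6 in the slides
calls <- 0

# Hypothetical wrapper: counts how many times rnorm() is invoked
counted_rnorm <- function(k) {
    calls <<- calls + 1
    rnorm(k)
}

# Loop: one function call (and one element assignment) per value
x <- vector("numeric", n)
for (i in seq_along(x)) {
    x[i] <- counted_rnorm(1)
}
loop_calls <- calls

# Vectorized: one function call and one assignment in total
calls <- 0
y <- counted_rnorm(n)
vectorized_calls <- calls

print(c(loop = loop_calls, vectorized = vectorized_calls))
```

The loop pays R's function-call overhead n times before reaching the compiled code, while the vectorized version pays it once.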

R club

The second rule of R club: use a vectorized solution wherever possible.

Let's practice!
