My code is slow!

Writing Efficient R Code

Colin Gillespie

Jumping Rivers & Newcastle University

Is my code really slow?

  • 1 second?
  • 1 minute?
  • 1 hour?

Benchmarking

  1. We construct a function around the feature we wish to benchmark
  2. We time the function under different scenarios, e.g., with different data sets
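
A minimal sketch of these two steps, assuming a hypothetical feature (sorting a numeric vector) and three made-up data sets of increasing size. The names sort_values and scenarios are illustrative only; timing uses base R's system.time().

```r
# Step 1: wrap the feature we want to benchmark in a function
# (sort_values is a hypothetical example)
sort_values <- function(x) sort(x)

# Step 2: time the function under different scenarios (here, data set sizes)
set.seed(1)
scenarios <- list(
  small  = runif(1e3),
  medium = runif(1e5),
  large  = runif(1e6)
)

# Elapsed (wall-clock) seconds for each scenario
elapsed <- vapply(
  scenarios,
  function(x) system.time(sort_values(x))[["elapsed"]],
  numeric(1)
)
elapsed
```

Timings vary by machine, so only the relative ordering across scenarios is meaningful.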

Example: Sequence of numbers

$$ 1, 2, 3, \ldots, n $$

Option 1
1:n
Option 2
seq(1, n)
Option 3
seq(1, n, by = 1)

Function wrapping

colon <- function(n) 1:n
colon(5)
# [1] 1 2 3 4 5
seq_default <- function(n) seq(1, n)
seq_by <- function(n) seq(1, n, by = 1)
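
Before timing the wrappers, it can help to confirm they produce the same sequence. The following is a small sketch of such a check, using a hypothetical small input (n_small) and all.equal() so that integer and double results with the same values compare as equal.

```r
colon <- function(n) 1:n
seq_default <- function(n) seq(1, n)
seq_by <- function(n) seq(1, n, by = 1)

n_small <- 10  # illustrative value, small enough to inspect by eye

# all.equal() compares values with a tolerance, ignoring integer vs double storage
same_default <- isTRUE(all.equal(colon(n_small), seq_default(n_small)))
same_by <- isTRUE(all.equal(colon(n_small), seq_by(n_small)))

c(seq_default = same_default, seq_by = same_by)
```

Both comparisons should come back TRUE, so any timing difference reflects how the sequence is built rather than what is returned.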

Timing with system.time()

system.time(colon(1e8))
#   user  system elapsed
#  0.032   0.028   0.060
system.time(seq_default(1e8))
#   user  system elapsed
#  0.060   0.028   0.086
system.time(seq_by(1e8))
#   user  system elapsed
#  1.088   0.520   1.600
  • user time is the CPU time charged for the execution of user instructions.
  • system time is the CPU time charged for execution by the system on behalf of the calling process.
  • elapsed time is the wall-clock time taken; for single-threaded, CPU-bound code it is approximately the sum of user and system. This is the number we typically care about.
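
The three components can also be pulled out of the object that system.time() returns. A minimal sketch, assuming a smaller input than the slides use so that it runs quickly:

```r
colon <- function(n) 1:n

# system.time() returns a named numeric vector of class "proc_time"
timing <- system.time(colon(1e6))

user_secs    <- timing[["user.self"]]  # CPU time spent in user code
system_secs  <- timing[["sys.self"]]   # CPU time spent by the system
elapsed_secs <- timing[["elapsed"]]    # wall-clock time

c(user = user_secs, system = system_secs, elapsed = elapsed_secs)
```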

Storing the result

The trouble with

system.time(colon(1e8))

is that we haven't stored the result. We would need to rerun the code to store the result:

res <- colon(1e8)

Inside a function call, the <- operator performs both:

  • Argument passing
  • Object assignment
system.time(res <- colon(1e8))

The = operator performs only one of these at a time:

  • Argument passing
  • Object assignment
# Raises an error: inside a call, res = ... is treated as passing an argument named res
system.time(res = colon(1e8))
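
The difference can be seen in a short sketch. It assumes a smaller input than the slides use, and wraps the failing call in tryCatch() so the script keeps running; the variable names res2 and outcome are illustrative.

```r
colon <- function(n) 1:n

# <- inside the call: the expression is timed AND res is created
timing <- system.time(res <- colon(1e6))
length(res)

# = inside the call: R tries to match an argument named res2, which
# system.time() does not have, so the call fails before anything is timed
outcome <- tryCatch(
  system.time(res2 = colon(1e6)),
  error = function(e) conditionMessage(e)
)
outcome            # the error message, as a character string
exists("res2")     # res2 was never created
```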

Relative time

Method          Absolute time (secs)  Relative time
colon(n)        0.060                 $0.060/0.060 = 1.00$
seq_default(n)  0.086                 $0.086/0.060 = 1.43$
seq_by(n)       1.600                 $1.600/0.060 = 26.7$
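
Relative times are each absolute time divided by the fastest one. A small sketch of that arithmetic, using the elapsed values reported by system.time() earlier (the vector name elapsed is illustrative):

```r
# Elapsed times in seconds, taken from the system.time() output above
elapsed <- c(colon = 0.060, seq_default = 0.086, seq_by = 1.600)

# Divide by the fastest method so that it scores exactly 1
relative <- elapsed / min(elapsed)
round(relative, 2)
#       colon seq_default      seq_by
#        1.00        1.43       26.67
```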

Microbenchmark package

  • Compares functions
  • Each function is run multiple times
library("microbenchmark")

n <- 1e8
# Run each function 10 times
microbenchmark(colon(n), seq_default(n), seq_by(n), times = 10)
# Unit: milliseconds
#           expr  min   lq  mean  median   uq  max neval cld
#       colon(n)   59  130   220     202  341  391    10  a
# seq_default(n)   94  204   290     337  348  383    10  a
#      seq_by(n) 1945 2044  2260    2275 2359 2787    10   b
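
The idea of running each function several times and summarising the results can be approximated in base R. This is only a rough sketch of the principle, not how microbenchmark is implemented, and it assumes a smaller n than the slides so that it runs quickly; n_runs and time_many are hypothetical names.

```r
colon <- function(n) 1:n
seq_by <- function(n) seq(1, n, by = 1)

n <- 1e6       # smaller than the slides' 1e8, purely for speed
n_runs <- 10   # number of repetitions per function

# Time a function n_runs times and return the elapsed seconds of each run
time_many <- function(f) {
  replicate(n_runs, system.time(f(n))[["elapsed"]])
}

colon_times <- time_many(colon)
seq_by_times <- time_many(seq_by)

# Compare the spread of timings rather than a single measurement
rbind(colon = summary(colon_times), seq_by = summary(seq_by_times))
```

system.time() has coarser resolution than microbenchmark, so very fast calls may report zero here.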

Let's practice!
