Welcome to the course

Optimizing R Code with Rcpp

Romain François

Consulting Datactive, ThinkR

R vs C++

R

  • Great, flexible
  • Slow
  • Interpreted

C++

  • Compiled
  • More difficult
  • Fast

Motivation

  • Use Rcpp to make your code faster
  • No need to know all of C++
  • Focus on writing simple C++ functions

Course structure

  • Introduction - basic C++ syntax
  • C++ functions and control flow
  • Vector classes
  • Case studies

Measure performance

A loop-based version of max

slowmax <- function(x) {
   res <- x[1]
   # seq_along(x)[-1] is empty for a length-1 input, unlike 2:length(x)
   for (i in seq_along(x)[-1]) {
       if (x[i] > res) res <- x[i]
   }
   res
}

Comparing performance with the microbenchmark package

library(microbenchmark)

x <- rnorm(1e6)
microbenchmark( slowmax(x), max(x) )

 

Unit: milliseconds
       expr       min       lq      mean    median        uq       max neval
 slowmax(x) 31.649452 34.29454 36.344912 35.435299 37.188249 90.363038   100
     max(x)  1.563559  1.74036  1.939367  1.847045  2.014684  3.340052   100
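
As a rough sketch of where the course is heading, the same loop can be written in plain C++. The name loop_max and the use of std::vector<double> are assumptions made for illustration; the slides do not show this function, and an Rcpp version would typically work on Rcpp's own vector classes instead.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the slides): largest element of a
// non-empty vector, using the same loop structure as slowmax().
double loop_max(const std::vector<double>& x) {
    assert(!x.empty());  // precondition: x must not be empty
    double res = x[0];   // C++ indexes from 0, R from 1
    for (std::size_t i = 1; i < x.size(); ++i) {
        if (x[i] > res) res = x[i];
    }
    return res;
}
```

Calling loop_max on a vector holding 1.5, -2.0, 7.25 and 3.0 returns 7.25. No timing is claimed for this sketch; it only shows the shape of the code.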

Evaluating simple C++ expressions with evalCpp

library(Rcpp)
evalCpp( "40 + 2" )
42
evalCpp( "exp(1.0)" )
2.718282
evalCpp( "sqrt(4.0)" )
2
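
For comparison, the three expressions above can also be written as ordinary C++ functions. This is only a sketch: the function names are hypothetical, and how evalCpp compiles an expression internally is not shown here.

```cpp
#include <cmath>

// Hypothetical wrappers (names are not from the slides), one per expression.
int    forty_plus_two() { return 40 + 2; }          // int + int stays int
double exp_one()        { return std::exp(1.0); }   // e, about 2.718282
double sqrt_four()      { return std::sqrt(4.0); }  // square root of 4.0
```

The return type of each function matches the type of the expression: int for 40 + 2, double for the two math calls.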

Using std::numeric_limits<int>::max() to get the largest representable 32-bit signed integer (int)

evalCpp(
   "std::numeric_limits<int>::max()"
   )
2147483647 

($2147483647 = 2^{31} - 1$)
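
The same limit can be checked at compile time in plain C++. The sketch below assumes a platform where int is 32 bits wide, which is common but not guaranteed by the C++ standard; the helper name largest_int is hypothetical.

```cpp
#include <climits>
#include <limits>

// Hypothetical helper: the largest value an int can hold on this platform.
int largest_int() { return std::numeric_limits<int>::max(); }

// Assumption: int is 32 bits wide. On such a platform the maximum is 2^31 - 1.
static_assert(sizeof(int) * CHAR_BIT == 32, "this sketch assumes a 32-bit int");
static_assert(std::numeric_limits<int>::max() == 2147483647, "expected 2^31 - 1");
```

If int had a different width, the first static_assert would stop compilation rather than let the second claim pass silently.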


Basic number types

C++ has a rich set of number types

  • Integer numbers: int
  • Floating point numbers: double
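
In C++ the type of a numeric literal follows from how it is written. A minimal sketch, with hypothetical helper names, that checks this at compile time:

```cpp
#include <type_traits>

// Compile-time checks: a literal without a decimal point is an int,
// a literal with a decimal point is a double.
static_assert(std::is_same<decltype(42), int>::value, "42 is an int");
static_assert(std::is_same<decltype(42.0), double>::value, "42.0 is a double");

// Hypothetical helpers returning each literal with its natural type.
int    int_literal()    { return 42; }
double double_literal() { return 42.0; }
```

The static_assert lines cost nothing at run time; they fail the build if the stated type is wrong.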

R

# Literal numbers are double
x <- 42
storage.mode(x)
"double"
# Integers need the L suffix
y <- 42L
storage.mode(y)
"integer"

z <- as.integer(42)
storage.mode(z)
"integer"

C++

library(Rcpp)
# Literal integers are int
x <- evalCpp( "42" )
storage.mode(x)
"integer"
# Suffix .0 forces a double
y <- evalCpp( "42.0" )
storage.mode(y)
"double"

Casting

Explicit casting with (double)

# Explicit cast
y <- evalCpp("(double)(40 + 2)")
storage.mode(y)
"double"

Beware of integer division

# Integer division
evalCpp( "13 / 4" )
3

 

# Explicit cast, and hence use 
# of double division
evalCpp( "(double)13 / 4" )
3.25

 

# Automatic conversion in R
13L / 4L
3.25
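
In (double)13 / 4 the cast applies to 13 before the division happens, which is why the result is 3.25. A minimal sketch of why the position of the cast matters; the function names are hypothetical, and static_cast is the named C++ cast, which the slides do not use.

```cpp
// Hypothetical helpers (names are not from the slides) showing that the
// position of the cast decides which kind of division runs.
int    int_division() { return 13 / 4; }                       // int / int truncates to 3
double cast_operand() { return (double)13 / 4; }               // 13.0 / 4 gives 3.25
double cast_result()  { return (double)(13 / 4); }             // 13 / 4 is already 3, so 3.0
double named_cast()   { return static_cast<double>(13) / 4; }  // same as cast_operand()
```

Casting the result of the division, as in cast_result(), comes too late: the fractional part is already gone by the time the value becomes a double.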

Let's practice!
