How do I find the bottleneck?

Writing Efficient R Code

Colin Gillespie

Jumping Rivers & Newcastle University


Code profiling

The general idea is to:

  • Run the code
  • Every few milliseconds, record what is currently being executed
  • Rprof() comes with R and does exactly this
    • Tricky to use
  • Use profvis instead
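
As a rough illustration of the sampling idea above, here is a minimal sketch using base R's Rprof() and summaryRprof(). The workload (sorting random numbers) and the names prof_file and prof are made up for this example; the timings it reports will differ between machines.

```r
# Minimal sketch of sampling-based profiling with base R's Rprof().
# The workload below is a hypothetical stand-in, not part of the course.
prof_file <- tempfile(fileext = ".out")

Rprof(prof_file, interval = 0.01)  # sample the call stack every 10 ms
for (i in 1:20) {
  x <- sort(runif(1e6))            # something slow enough to be sampled
}
Rprof(NULL)                        # stop profiling

# Summarise the samples: time attributed to each function
prof <- summaryRprof(prof_file)
print(head(prof$by.self))
```

The raw output is a table of function names and sample counts, which is part of why Rprof() is considered tricky to use compared with a visual tool such as profvis.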

IMDB data set

  • From the ggplot2movies package
data(movies, package = "ggplot2movies")
dim(movies)
58788    24
  • Data frame: around 60,000 rows and 24 columns
  • Each row corresponds to a particular movie

Braveheart

braveheart = movies[7288,]
Year Length Rating
1995 177 8.3


Example: Braveheart

# Load data
data(movies,
     package = "ggplot2movies")
braveheart <- movies[7288,]
movies <- movies[movies$Action==1,]

plot(movies$year, movies$rating, xlab = "Year", ylab = "Rating")
# local regression line
model <- loess(rating ~ year, data = movies)
j <- order(movies$year)
lines(movies$year[j], model$fitted[j], col = "forestgreen")
points(braveheart$year, braveheart$rating, pch = 21, bg = "steelblue")


Profvis

  • RStudio has integrated support for profiling with profvis
    • Highlight the code you want to profile
    • Profile -> Profile Selected Line(s)


Command line

library("profvis")

profvis({
  data(movies, package = "ggplot2movies") # Load data
  braveheart <- movies[7288,]
  movies <- movies[movies$Action == 1,]
  plot(movies$year, movies$rating, xlab = "Year", ylab = "Rating")
  model <- loess(rating ~ year, data = movies) # loess regression line
  j <- order(movies$year)
  lines(movies$year[j], model$fitted[j], col = "forestgreen", lwd = 2)
  points(braveheart$year, braveheart$rating,
         pch = 21, bg = "steelblue", cex = 3)
})

Which line do you think will be the slowest?
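
One way to test a guess without a profiler is to time each step by hand with system.time(). The sketch below is only an illustration: it uses a synthetic data frame called sim (an assumption, standing in for the Action subset of movies so the code runs without ggplot2movies), and it draws to a null PDF device so no plot window opens.

```r
# Hand-timing each step with system.time(); a crude alternative to profvis.
# `sim` is a made-up stand-in for the Action movies, not the real data.
set.seed(1)
n <- 5000
sim <- data.frame(year = sample(1900:2005, n, replace = TRUE))
sim$rating <- 6 + rnorm(n)

pdf(NULL)  # null device: nothing is shown or written to disk

t_plot <- system.time(
  plot(sim$year, sim$rating, xlab = "Year", ylab = "Rating")
)
t_loess <- system.time(
  model <- loess(rating ~ year, data = sim)
)
t_lines <- system.time({
  j <- order(sim$year)
  lines(sim$year[j], model$fitted[j], col = "forestgreen")
})

invisible(dev.off())

# Elapsed seconds per step; the values depend on the machine
timings <- c(plot  = t_plot[["elapsed"]],
             loess = t_loess[["elapsed"]],
             lines = t_lines[["elapsed"]])
print(timings)
```

Timings on synthetic data and a null device need not match what profvis reports for the real example, so treat the numbers as a sanity check rather than an answer.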


Let's practice!
