Congratulations!

Scalable Data Processing in R

Michael J. Kane and Simon Urbanek

Instructors, DataCamp

Split-Apply-Combine

  • Break the data into parts
  • Compute on the parts
  • Combine the results

Split-Apply-Combine: Advantages

  • Manageable parts don't overwhelm your computer
  • Approach is easy to parallelize
  • Process the parts sequentially
  • Process the parts on several machines in a cluster
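
As a rough illustration of why the approach parallelizes easily, the sketch below runs the same per-part computation first sequentially with lapply() and then on worker processes with parLapply() from the base parallel package. The toy vector, the four-way split, and the two local workers are assumptions made for the example; on a real cluster the workers would be separate machines.

```r
library(parallel)

# Toy data (made up for illustration): the numbers 1 to 100 in four parts.
x <- 1:100
parts <- split(x, rep(1:4, each = 25))

# Sequential: compute on each part in turn, then combine.
seq_result <- Reduce(`+`, lapply(parts, sum))

# Parallel: the same computation on two local worker processes.
cl <- makeCluster(2)
par_result <- Reduce(`+`, parLapply(cl, parts, sum))
stopCluster(cl)
```

Because each part is processed independently, switching from lapply() to parLapply() leaves the per-part function and the combine step unchanged.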

Split-Apply-Combine: R

  • split() partitions a set of row numbers or a data.frame

  • Map() computes on parts

  • Reduce() combines results
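
A minimal sketch of the three functions working together, using only base R. The data frame, its column names, and the choice to carry a running total and count through the combine step are assumptions made for illustration.

```r
# Toy data (made up for illustration): a grouping column and a numeric value.
df <- data.frame(
  group = c("a", "b", "a", "c", "b", "a"),
  value = c(1, 2, 3, 4, 5, 6)
)

# Split: partition the row numbers by group.
parts <- split(seq_len(nrow(df)), df$group)

# Apply: compute a partial result (total and count) on each part.
partials <- Map(
  function(rows) c(total = sum(df$value[rows]), n = length(rows)),
  parts
)

# Combine: add the partial results together, then derive the overall mean.
combined <- Reduce(`+`, partials)
overall_mean <- combined[["total"]] / combined[["n"]]
```

Carrying the total and the count separately, rather than averaging per-part means, keeps the combined mean correct when the parts have different sizes.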

bigmemory

  • Good for larger data sets that can be represented as dense matrices and might be too big for RAM
  • Looks like a regular R matrix
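
The sketch below shows the matrix-like interface on a tiny file-backed big.matrix. It assumes the bigmemory package is installed; the dimensions, the file names, and the use of tempdir() as the backing location are placeholders chosen for the example.

```r
library(bigmemory)

# Create a small file-backed matrix; the data lives on disk, not in RAM.
# File names and location are placeholders for illustration.
x <- big.matrix(
  nrow = 3, ncol = 2, type = "double", init = 0,
  backingfile = "example.bin",
  descriptorfile = "example.desc",
  backingpath = tempdir()
)

# Read and write with the usual matrix indexing.
x[, 1] <- c(1, 2, 3)
x[2, 2] <- 10

# x[, ] copies the contents into an ordinary R matrix, so only do this
# for pieces that fit comfortably in memory.
col_sums <- colSums(x[, ])
```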

iotools

  • Good for much larger data that can be processed in sequential chunks
  • Supports data.frame and matrix
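
A hedged sketch of chunk-wise processing with iotools, assuming the package is installed. The temporary CSV file, its two columns, and the small chunk size are made up for the example; chunk.apply() reads the file in sequential chunks, dstrsplit() parses each raw chunk into a data.frame, and the partial sums are combined at the end.

```r
library(iotools)

# Write a small headerless CSV to a temporary file (toy data for illustration).
path <- tempfile(fileext = ".csv")
write.table(
  data.frame(id = 1:1000, value = rep(c(1, 2), 500)),
  path, sep = ",", row.names = FALSE, col.names = FALSE, quote = FALSE
)

# Process the file chunk by chunk: parse each chunk, sum the second column.
partial_sums <- chunk.apply(
  path,
  function(chunk) {
    d <- dstrsplit(chunk, col_types = c("integer", "numeric"), sep = ",")
    sum(d[[2]])
  },
  CH.MERGE = c,        # collect the per-chunk sums into one vector
  CH.MAX.SIZE = 2048   # small chunk size, in bytes, chosen only for the demo
)

# Combine the per-chunk results.
total <- sum(partial_sums)
```

For matrix-shaped data, mstrsplit() plays the role that dstrsplit() plays here.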

Good luck!
