Parallel computing

Understanding Data Engineering

Hadrien Lacroix

Content Developer at DataCamp

Parallel computing

  • Basis of modern data processing tools
  • Necessary:
    • Mainly because of memory
    • Also for processing power
  • How it works:
    • Split tasks up into several smaller subtasks
    • Distribute these subtasks over several computers
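
The two "how it works" steps can be sketched in a few lines of Python. This is only a minimal illustration, not the course's own code: the helper names (`split_into_batches`, `process_batch`, `parallel_sum_of_squares`) are hypothetical, and a thread pool on one machine stands in for "several computers". Swapping in `ProcessPoolExecutor` would run the subtasks in separate processes instead.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_batches(items, n_batches):
    """Split a list into n_batches contiguous, near-equal batches."""
    size, extra = divmod(len(items), n_batches)
    batches, start = [], 0
    for i in range(n_batches):
        # The first `extra` batches take one additional item each.
        end = start + size + (1 if i < extra else 0)
        batches.append(items[start:end])
        start = end
    return batches


def process_batch(batch):
    """Hypothetical subtask: sum the squares of one batch."""
    return sum(x * x for x in batch)


def parallel_sum_of_squares(numbers, n_workers=4):
    """Split the task, distribute the subtasks, then combine the results."""
    batches = split_into_batches(numbers, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Each worker receives one batch (the "distribute" step).
        partial_sums = list(pool.map(process_batch, batches))
    # Gather the partial results into a single answer.
    return sum(partial_sums)


total = parallel_sum_of_squares(list(range(1000)), n_workers=4)
```

Each worker only ever holds its own batch, which is the sense in which splitting the work also reduces the memory needed per processing unit.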

1000 t-shirts

senior sales assistant

junior sales assistant

1 Emojis by Mohamed Hassan

one sales assistant at a time

batching t-shirts

junior sales assistants finishing in one hour and fifteen minutes

senior sales assistants finishing in two hours and thirteen minutes

Benefits and risks of parallel computing

  • Employees = processing units
  • Advantages
    • Extra processing power
    • Reduced memory footprint
  • Disadvantages
    • Moving data incurs a cost
    • Communication time

comparing junior and senior sales assistant performance

it takes ten minutes to distribute the one thousand t-shirts to the four junior assistants

it takes five minutes to gather the t-shirts from the four junior assistants into one pile
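
The effect of these overheads can be worked through with the timings quoted on the slides. This is a rough sketch under stated assumptions: it reads the 1 h 15 min figure as four junior assistants folding in parallel (excluding distribution and gathering) and the 2 h 13 min figure as one senior assistant folding all the t-shirts alone. The variable and function names are hypothetical.

```python
# Timings quoted on the slides, converted to minutes.
SENIOR_ALONE = 2 * 60 + 13      # assumed: one senior assistant, no splitting
JUNIORS_PARALLEL = 1 * 60 + 15  # assumed: four juniors folding in parallel
DISTRIBUTE = 10                 # handing the t-shirts out to the four juniors
GATHER = 5                      # collecting the folded t-shirts into one pile


def parallel_total(work, distribute, gather):
    """Total elapsed time once communication overhead is included."""
    return distribute + work + gather


juniors_total = parallel_total(JUNIORS_PARALLEL, DISTRIBUTE, GATHER)

# Share of the parallel run spent moving t-shirts rather than folding them.
overhead_share = (DISTRIBUTE + GATHER) / juniors_total

print(f"Juniors, including overhead: {juniors_total} min")
print(f"Senior alone: {SENIOR_ALONE} min")
print(f"Overhead share of the parallel run: {overhead_share:.0%}")
```

Under these assumptions the parallel approach still finishes first, but part of its elapsed time goes to moving t-shirts around rather than folding them, which is the communication cost listed among the disadvantages.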

data pipeline

Summary

  • Benefits and risks
  • How it's implemented at Spotflix

Let's practice!
