Parallel computing

Understanding Data Engineering

Hadrien Lacroix

Content Developer at DataCamp

Parallel computing

  • Basis of modern data processing tools
  • Why it is necessary:
    • Mainly because of memory
    • Also for processing power
  • How it works:
    • Split tasks up into several smaller subtasks
    • Distribute these subtasks over several computers
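
The split-and-distribute idea above can be sketched in a few lines of Python. This is only an illustration: the function names (`split_into_batches`, `handle_batch`, `process_in_parallel`) are hypothetical, and a local thread pool stands in for the "several computers" on the slide. A real data processing tool would spread the batches over separate machines or processes.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_batches(items, n_batches):
    """Split items into n_batches contiguous batches of near-equal size."""
    size, remainder = divmod(len(items), n_batches)
    batches, start = [], 0
    for i in range(n_batches):
        # The first `remainder` batches each take one extra item.
        end = start + size + (1 if i < remainder else 0)
        batches.append(items[start:end])
        start = end
    return batches


def handle_batch(batch):
    """Hypothetical subtask: process every t-shirt in one batch."""
    return [f"done-{shirt}" for shirt in batch]


def process_in_parallel(shirts, n_workers=4):
    """Split the task, distribute the subtasks, then gather the results."""
    batches = split_into_batches(shirts, n_workers)
    # Assumption: threads stand in for separate computers here.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(handle_batch, batches))
    # Gather: flatten the per-worker results back into one pile.
    return [shirt for batch in results for shirt in batch]


processed = process_in_parallel(list(range(1000)))
```

`pool.map` returns results in the same order as the input batches, so the gathered pile keeps the original order of the t-shirts.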

1,000 t-shirts

senior sales assistant

junior sales assistant

1 Emojis by Mohamed Hassan

one sales assistant at a time

batching t-shirts

junior sales assistants finishing in one hour and fifteen minutes

senior sales assistants finishing in two hours and thirteen minutes

Benefits and risks of parallel computing

  • Employees = processing units
  • Advantages
    • Extra processing power
    • Reduced memory footprint
  • Disadvantages
    • Moving data incurs a cost
    • Communication time

comparing junior and senior sales assistant performance

it takes ten minutes to distribute the one thousand t-shirts to the four junior assistants

it takes five minutes to gather the t-shirts from the four junior assistants into one pile
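
The cost of moving work around can be made concrete with a small calculation. The sketch below uses the ten-minute distribution time and five-minute gathering time from the slides. The 240 minutes of work for a single junior assistant is an assumption that is not stated in the slides, and the function name `parallel_minutes` is hypothetical.

```python
def parallel_minutes(total_work_minutes, n_workers, distribute_minutes, gather_minutes):
    """Simple cost model: overhead plus an even share of the work per worker."""
    return distribute_minutes + total_work_minutes / n_workers + gather_minutes


# ASSUMPTION: one junior assistant alone needs 240 minutes for all 1,000 t-shirts.
serial = 240

# From the slides: 10 minutes to distribute, 5 minutes to gather, 4 assistants.
parallel = parallel_minutes(serial, n_workers=4, distribute_minutes=10, gather_minutes=5)

speedup = serial / parallel

print(f"{parallel:.0f} minutes in parallel, speedup {speedup:.1f}x")
```

Under that assumption the four assistants need 75 minutes, which matches the one hour and fifteen minutes shown earlier, and the speedup is 3.2x instead of the ideal 4x because of the fifteen minutes of communication overhead.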

data pipeline

Summary

  • Benefits and risks
  • How it's implemented at Spotflix

Let's practice!
