Horizontally scaling streaming systems

Streaming Concepts

Mike Metzger

Data Engineer

Horizontal scaling refresher

  • Instead of scaling "up", scale "out"
  • Typically means adding processing capability by adding more machines, rather than faster / better ones
  • Works best in embarrassingly parallel situations
    • Tasks that can be split up easily
    • E.g., processing a large set of independent images
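A minimal sketch of the embarrassingly parallel case, assuming each image is represented as a plain list of grayscale pixel values and using a hypothetical `mean_brightness` task. A thread pool on one machine stands in for a group of separate workers; in a real scale-out deployment each worker would be its own process or machine (CPU-bound Python threads share the GIL and would not actually run in parallel).

```python
from concurrent.futures import ThreadPoolExecutor


def mean_brightness(pixels):
    """Hypothetical per-image task: average of grayscale pixel values."""
    return sum(pixels) / len(pixels)


def process_all(images, workers):
    """Fan independent images out across a pool of workers.

    No image depends on another, so adding workers adds capacity
    without any coordination between them.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the inputs
        return list(pool.map(mean_brightness, images))


images = [[0, 255], [10, 20, 30], [100]]
print(process_all(images, workers=2))  # [127.5, 20.0, 100.0]
print(process_all(images, workers=8))  # same result, more workers
```

Because the tasks share nothing, the only change needed to add capacity is the `workers` count; the results do not depend on it.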

Horizontal scaling with streaming

  • Streaming data processing typically requires minimal delays (low latency)
  • That requirement can make transferring data between workers tricky
  • Best to process a full stream within a single pipeline
  • Scale by creating copies of that pipeline
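One common way to keep a full stream inside a single pipeline while still running several copies is to route by key: every event carries a stream identifier, and a stable hash of that identifier picks the copy. The sketch below assumes events are `(stream_id, value)` tuples; `pipeline_for` is a hypothetical helper, not part of any library.

```python
import zlib


def pipeline_for(stream_id: str, num_pipelines: int) -> int:
    """Pick which pipeline copy owns a stream (hypothetical helper).

    A stable hash (CRC32) sends every event from the same stream to the
    same copy, so no per-stream state has to move between workers.
    """
    return zlib.crc32(stream_id.encode("utf-8")) % num_pipelines


events = [("sensor-a", 1), ("sensor-b", 7), ("sensor-a", 2), ("sensor-c", 3)]

# Group events by the pipeline copy that will process them
assignments = {}
for stream_id, value in events:
    idx = pipeline_for(stream_id, num_pipelines=3)
    assignments.setdefault(idx, []).append((stream_id, value))

print(assignments)
```

`zlib.crc32` is used instead of Python's built-in `hash()` because string hashing is randomized per process, so separate workers would disagree on the mapping. Changing `num_pipelines` remaps most streams; schemes such as consistent hashing reduce that churn.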

Pipeline copies

  • As events occur, each one initially enters a single pipeline
  • All tasks related to processing that event stay within the pipeline until completion
  • Scale by adding more pipelines
  • Can still scale vertically within a pipeline
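The pipeline-copy model can be sketched as a small class, assuming each pipeline is an ordered list of stage functions that take and return an event dictionary. `Pipeline`, `parse`, `enrich`, and `make_pipelines` are hypothetical names for illustration; a real system would run each copy on its own worker.

```python
from typing import Callable, List

Stage = Callable[[dict], dict]


class Pipeline:
    """One self-contained copy: every stage for an event runs here."""

    def __init__(self, name: str, stages: List[Stage]):
        self.name = name
        self.stages = stages
        self.processed = 0

    def handle(self, event: dict) -> dict:
        # Run the event through every stage in order; it does not leave
        # this pipeline until processing is complete.
        for stage in self.stages:
            event = stage(event)
        self.processed += 1
        return event


def parse(event: dict) -> dict:  # hypothetical stage
    return {**event, "value": float(event["raw"])}


def enrich(event: dict) -> dict:  # hypothetical stage
    return {**event, "doubled": event["value"] * 2}


def make_pipelines(n: int) -> List[Pipeline]:
    # Scaling out = building more identical copies of the same stages
    return [Pipeline(f"pipeline-{i}", [parse, enrich]) for i in range(n)]


pipelines = make_pipelines(2)
result = pipelines[0].handle({"raw": "21"})
print(result)  # {'raw': '21', 'value': 21.0, 'doubled': 42.0}
```

Scaling out means calling `make_pipelines` with a larger `n`. Scaling vertically within a pipeline corresponds to making an individual stage faster (or giving that copy a bigger machine) without changing how many copies exist.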

Additional considerations

  • Other components may be required
  • Load balancer / director to assign events to pipelines
    • Card dealer (round-robin: each pipeline in turn)
    • Least busy node
  • Eventually you hit other bottlenecks
    • E.g., disk write performance
  • Consider shortening the streaming pipeline
    • Remove the need to process all data immediately
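The two director strategies above can be sketched as follows, assuming pipelines are identified by name. The card-dealer approach is commonly called round-robin; the least-busy approach needs each pipeline to report back when it finishes an event. Both class names are hypothetical and stand in for a real load balancer.

```python
import itertools
from typing import Dict, List


class RoundRobinDispatcher:
    """'Card dealer': hand each event to the next pipeline in turn."""

    def __init__(self, pipelines: List[str]):
        self._cycle = itertools.cycle(pipelines)

    def pick(self) -> str:
        return next(self._cycle)


class LeastBusyDispatcher:
    """Send each event to the pipeline with the fewest in-flight events."""

    def __init__(self, pipelines: List[str]):
        self.in_flight: Dict[str, int] = {p: 0 for p in pipelines}

    def pick(self) -> str:
        # min() returns the first pipeline among ties, in insertion order
        target = min(self.in_flight, key=lambda p: self.in_flight[p])
        self.in_flight[target] += 1
        return target

    def done(self, pipeline: str) -> None:
        # Called when a pipeline finishes an event, so its load drops again
        self.in_flight[pipeline] -= 1


rr = RoundRobinDispatcher(["p0", "p1", "p2"])
print([rr.pick() for _ in range(5)])  # ['p0', 'p1', 'p2', 'p0', 'p1']

lb = LeastBusyDispatcher(["p0", "p1"])
first = lb.pick()   # 'p0' (tie, first in order)
second = lb.pick()  # 'p1'
lb.done(first)      # p0 finishes its event
print(lb.pick())    # p0 (least busy again)
```

Round-robin needs no feedback and is cheap, but it ignores how loaded each pipeline is; least-busy balances uneven event costs at the price of tracking in-flight work.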
