Horizontally scaling streaming systems
Streaming Concepts
Mike Metzger
Data Engineer
Horizontal scaling refresher
- Instead of scaling "up", scale "out"
- Typically means adding processing capability by adding more machines, rather than faster / better ones
- Works best with embarrassingly parallel situations
- Tasks that can be split easily
- E.g. processing a large group of non-interdependent images
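A minimal sketch of the image example above, assuming each "image" is just a small list of pixel rows and that `invert` is a hypothetical per-image task. Threads stand in here for separate workers; in a real deployment the workers would more likely be separate processes or machines.

```python
from concurrent.futures import ThreadPoolExecutor


def invert(image):
    """Invert one 8-bit grayscale image given as a list of pixel rows."""
    return [[255 - pixel for pixel in row] for row in image]


# Three tiny stand-in "images"; none depends on any other,
# so they can be handed to workers in any order.
images = [
    [[0, 255], [10, 20]],
    [[100, 100], [100, 100]],
    [[255, 0], [0, 255]],
]

# Scaling out means raising max_workers (or adding machines);
# the per-image work itself does not change.
with ThreadPoolExecutor(max_workers=3) as pool:
    inverted = list(pool.map(invert, images))

print(inverted[0])  # [[255, 0], [245, 235]]
```

Because no task needs another task's result, the output is the same whether one worker or many do the work.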
Horizontal scaling with streaming
- Streaming data processing typically tolerates only minimal delays
- This makes transferring data between workers tricky
- Best to process a full stream within a single pipeline
- To scale, create copies of the pipeline
Pipeline copies
- As events occur, each one enters a single pipeline
- All tasks related to that event stay within that pipeline until completion
- Scale by adding more pipelines
- Can still vertically scale within a pipeline
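The pipeline-copy idea can be sketched as follows. The `Pipeline` class, the `parse` and `add_tax` stages, and the `"user:amount"` event format are all hypothetical, chosen only for illustration. How events are assigned to a copy is a placeholder here (simple alternation).

```python
class Pipeline:
    """A self-contained chain of stages; an event never leaves it mid-flight."""

    def __init__(self, name, stages):
        self.name = name
        self.stages = stages
        self.completed = []

    def process(self, event):
        result = event
        for stage in self.stages:  # every stage runs inside this pipeline
            result = stage(result)
        self.completed.append(result)
        return result


def parse(event):
    # Hypothetical stage: split "user:amount" into a record.
    user, amount = event.split(":")
    return {"user": user, "amount": float(amount)}


def add_tax(record):
    # Hypothetical stage: add a flat 10% on top of the amount.
    return {**record, "total": round(record["amount"] * 1.1, 2)}


STAGES = [parse, add_tax]


def make_pipelines(count):
    """Scale out by creating identical copies of the same pipeline."""
    return [Pipeline(f"pipeline-{i}", STAGES) for i in range(count)]


pipelines = make_pipelines(2)
events = ["ann:10", "bob:20", "cy:30"]

# Placeholder assignment: alternate between the copies.
for i, event in enumerate(events):
    pipelines[i % len(pipelines)].process(event)

for p in pipelines:
    print(p.name, p.completed)
```

Adding capacity is a matter of calling `make_pipelines` with a larger count. No data moves between copies, which avoids the worker-to-worker transfer problem.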
Additional considerations
- Other components may be required
- Load balancer / director
- Card dealer: hand events to each pipeline in turn
- Least busy node: send each event to the pipeline with the least work waiting
- Scaling out eventually hits bottlenecks
- Consider shortening the streaming pipeline
- Remove need to immediately process data
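The two load balancer strategies named above can be sketched as below. This assumes "card dealer" means dealing events out in a fixed rotation and "least busy" means shortest queue. The `Worker` class and both chooser functions are hypothetical stand-ins, not part of any particular product.

```python
import itertools


class Worker:
    """Hypothetical stand-in for one pipeline copy; tracks queued events."""

    def __init__(self, name):
        self.name = name
        self.queue = []

    def submit(self, event):
        self.queue.append(event)


def card_dealer(workers):
    """Return a chooser that hands out workers in a fixed rotation."""
    rotation = itertools.cycle(workers)
    return lambda: next(rotation)


def least_busy(workers):
    """Return a chooser that picks the worker with the shortest queue."""
    return lambda: min(workers, key=lambda w: len(w.queue))


# Card dealer: x, y, x, ... regardless of how busy each worker is.
dealt = [Worker("x"), Worker("y")]
deal = card_dealer(dealt)
for event in ["e1", "e2", "e3"]:
    deal().submit(event)

# Least busy: worker "a" starts with a backlog, so new events go elsewhere first.
workers = [Worker("a"), Worker("b"), Worker("c")]
workers[0].queue.extend(["backlog-1", "backlog-2"])
choose = least_busy(workers)
for event in ["e1", "e2", "e3", "e4"]:
    choose().submit(event)

print({w.name: w.queue for w in dealt})
print({w.name: len(w.queue) for w in workers})
```

The card dealer needs no knowledge of worker state, which keeps the director simple. Least busy needs a load signal from every worker, but it copes better when events vary in cost.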
Let's practice!