Streaming roadblocks

Streaming Concepts

Mike Metzger

Data Engineer

Scaling review

Vertical scaling - compute resources

  • CPU
  • RAM
  • Disk (capacity and IO)
  • Network

Horizontal scaling - more nodes

  • Add machines as nodes / workers

Initial concerns

  • Compute resources
    • Inadequate or slow resources
  • More nodes
    • Requires more connectivity
    • Some form of shared resources
    • Added complexity
    • Usually some form of cluster management

Communication issues

Types of messaging problems:

  • Missing messages
  • Delayed messages
  • Out of order messages
  • Repeat messages

Missing messages

  • Represent events that never appear
  • Can be difficult to detect
  • Sometimes handled with a sequence identifier
  • Requesting the missing messages can delay further responses
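A minimal sketch of gap detection with a sequence identifier, assuming each message carries an integer sequence number that increases by one. The function name `find_missing` and the sample data are hypothetical, chosen only for illustration.

```python
def find_missing(sequence_ids):
    """Return the sequence numbers absent between the lowest and highest id seen."""
    seen = set(sequence_ids)
    if not seen:
        return []
    # Any number in the observed range that never arrived is a gap.
    return [n for n in range(min(seen), max(seen) + 1) if n not in seen]


# Hypothetical stream in which messages 4, 7 and 8 never arrived.
received = [1, 2, 3, 5, 6, 9]
print(find_missing(received))  # [4, 7, 8]
```

A check like this only finds gaps inside the observed range. A message lost at the very end of the stream stays invisible until a later one arrives, which is part of why missing messages can be difficult to detect.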

Delayed messages

  • Similar to missing messages, except that the message eventually arrives
  • May cause issues with the processing pipeline due to delays
  • Often related to system resource issues
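One way to spot delayed messages, sketched under the assumption that each message records the time its event occurred, is to compare that time with the arrival time. The function `is_delayed` and the 5-second threshold are hypothetical choices for illustration.

```python
from datetime import datetime, timedelta


def is_delayed(event_time, arrival_time, max_delay=timedelta(seconds=5)):
    """Return True when the message arrived more than max_delay after its event."""
    return arrival_time - event_time > max_delay


# Hypothetical timestamps: the event occurred at 12:00:00.
event = datetime(2024, 1, 1, 12, 0, 0)
print(is_delayed(event, datetime(2024, 1, 1, 12, 0, 2)))   # False
print(is_delayed(event, datetime(2024, 1, 1, 12, 0, 30)))  # True
```

The right threshold depends on the pipeline. A dashboard may tolerate seconds of lag, while other processes may need much tighter bounds.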

Out of order messages

  • Combination of missing / delayed messages
  • Results when an older message appears after newer ones
  • Requires some measure of sequence or state to detect
  • Handling these issues depends on the type of data process being run
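As one possible way to handle out of order messages, the sketch below holds early arrivals in a buffer and releases them only once every earlier sequence number has been seen. It assumes integer sequence numbers starting at 1. The class name `ReorderBuffer` is hypothetical, and this is only one of several strategies (others drop late messages or process them as they come).

```python
class ReorderBuffer:
    """Release messages in sequence order, holding back any that arrive early."""

    def __init__(self, first_seq=1):
        self.next_seq = first_seq  # next sequence number we are allowed to release
        self.pending = {}          # early arrivals, keyed by sequence number

    def push(self, seq, payload):
        """Accept one message and return the payloads that are now ready, in order."""
        if seq < self.next_seq:
            return []  # already released earlier: a late repeat, so ignore it
        self.pending[seq] = payload
        ready = []
        # Drain every consecutive message starting from the expected number.
        while self.next_seq in self.pending:
            ready.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return ready


buf = ReorderBuffer()
print(buf.push(1, "a"))  # ['a']
print(buf.push(3, "c"))  # [] (message 2 has not arrived yet)
print(buf.push(2, "b"))  # ['b', 'c']
```

The trade-off is latency: a single missing message stalls everything behind it, so a real pipeline would likely pair this with a timeout or a gap request.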

Repeat messages

  • Occur when the same message is sent multiple times, or is resent because of system issues
  • Avoiding them completely requires sequence handling, but they may be safe to ignore
  • Sometimes not an issue (consider a temperature measurement, where a repeated reading usually changes nothing)
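A minimal sketch of sequence handling for repeat messages, assuming each message is a `(sequence_id, payload)` pair. The function name `deduplicate` and the sample readings are hypothetical.

```python
def deduplicate(messages):
    """Keep the first occurrence of each sequence id and drop later repeats."""
    seen = set()
    unique = []
    for seq, payload in messages:
        if seq in seen:
            continue  # repeat of a message we already processed
        seen.add(seq)
        unique.append((seq, payload))
    return unique


# Hypothetical temperature readings in which message 2 was sent twice.
stream = [(1, 20.5), (2, 20.7), (2, 20.7), (3, 21.0)]
print(deduplicate(stream))  # [(1, 20.5), (2, 20.7), (3, 21.0)]
```

Here the set of seen ids grows without bound. A long-running stream would need to cap it, for example by remembering only a recent window of sequence numbers.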

Let's practice!
