Batching vs. streaming

Streaming Concepts

Mike Metzger

Data Engineer

Quick review

  • Batch processes handle data in groups, or batches
  • The most important details about batch processing are the batch size and the batch frequency
  • Queues store and process data in order of insertion
  • Queues are batches with a batch size of one!
  • Streams handle data without pausing along the way
  • Streams don't have a defined end
  • Streams maintain order!
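To make the review concrete, here is a minimal Python sketch, assuming the data is an in-memory list of records. The helper names `make_batches` and `drain_queue` are hypothetical. The first groups items by a batch size. The second uses the standard library's `collections.deque` to show first-in, first-out order. Setting the batch size to one gives the same one-at-a-time order a queue produces.

```python
from collections import deque
from typing import Deque, Iterable, List, TypeVar

T = TypeVar("T")


def make_batches(items: Iterable[T], batch_size: int) -> List[List[T]]:
    """Group items into lists of at most batch_size (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batches: List[List[T]] = []
    current: List[T] = []
    for item in items:
        current.append(item)
        if len(current) == batch_size:
            batches.append(current)
            current = []
    if current:  # keep the final, possibly smaller, batch
        batches.append(current)
    return batches


def drain_queue(items: Iterable[T]) -> List[T]:
    """Enqueue every item, then dequeue them in insertion order (FIFO)."""
    queue: Deque[T] = deque()
    for item in items:
        queue.append(item)  # enqueue at the back
    processed: List[T] = []
    while queue:
        processed.append(queue.popleft())  # dequeue from the front
    return processed


records = ["a", "b", "c", "d", "e"]
print(make_batches(records, 2))  # [['a', 'b'], ['c', 'd'], ['e']]
print(make_batches(records, 1))  # [['a'], ['b'], ['c'], ['d'], ['e']]
print(drain_queue(records))      # ['a', 'b', 'c', 'd', 'e']
```

The final batch can be smaller than the batch size when the data doesn't divide evenly. A stream, by contrast, would never reach a "final" batch at all.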

Fire!

  • Bucket brigade
    • Batch size (how large the bucket is)
    • Batch frequency (how quickly each bucket is passed along)

Bucket brigade
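The bucket brigade maps directly onto the two batch settings. The sketch below is a hypothetical illustration, assuming the work is an in-memory list. It passes one "bucket" of `batch_size` items every `interval_seconds` to a `handle` callback. The function and parameter names are ours, not from any library.

```python
import time
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def bucket_brigade(
    items: Sequence[T],
    batch_size: int,
    interval_seconds: float,
    handle: Callable[[List[T]], None],
) -> int:
    """Pass items along in fixed-size buckets at a fixed pace; return the bucket count."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    buckets_passed = 0
    for start in range(0, len(items), batch_size):
        bucket = list(items[start:start + batch_size])  # batch size: how large the bucket is
        handle(bucket)
        buckets_passed += 1
        if start + batch_size < len(items):
            time.sleep(interval_seconds)  # batch frequency: wait before passing the next bucket
    return buckets_passed


received: List[List[int]] = []
count = bucket_brigade(list(range(7)), batch_size=3, interval_seconds=0.01, handle=received.append)
print(count, received)  # 3 [[0, 1, 2], [3, 4, 5], [6]]
```

Ignoring the time spent in `handle`, throughput is roughly `batch_size / interval_seconds` items per second. Larger buckets or faster passing both move more water, but the work still arrives in discrete chunks.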

  • Firehose
    • Continuous flow of data
    • Total amount of water (data) is unknown

Firehose

1 Albert B. Kinne, Public domain, via Wikimedia Commons
2 Commander, U.S. Naval Forces Europe-Africa/U.S. 6th Fleet, Public domain, via Wikimedia Commons
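A firehose has no natural end, so a streaming consumer processes each item as it arrives instead of waiting to collect a group. In the minimal sketch below, a Python generator stands in for the data source. `sensor_readings` is a hypothetical name for a source that yields values forever. The consumer keeps a running total, in order, and never needs to know how much data is coming. `itertools.islice` is used only so the demo stops.

```python
import itertools
from typing import Iterator


def sensor_readings() -> Iterator[int]:
    """Hypothetical unbounded source: yields readings with no defined end."""
    for n in itertools.count(1):
        yield n % 5  # stand-in for a real measurement


def running_totals(stream: Iterator[int]) -> Iterator[int]:
    """Process each reading as it arrives, in order, emitting a running total."""
    total = 0
    for reading in stream:
        total += reading
        yield total  # each result is available immediately; no need to wait for an "end"


# The stream itself is infinite; take only the first six results for the demo.
first_six = list(itertools.islice(running_totals(sensor_readings()), 6))
print(first_six)  # [1, 3, 6, 10, 10, 11]
```

Because both functions are generators, nothing is buffered into a batch. Each reading flows straight through, like water through the hose.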

How do we determine the best approach?

  • It depends on the requirements
  • If we can process data in groups, batching is often best due to its simplicity
  • If we need order but it's okay to pause, use a queue
  • If we need continuous data, or we don't know how much data to expect, try streaming
  • If we can't stop until the data is processed, use streaming
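These guidelines can be encoded as a small decision function. This is a hypothetical sketch. The `Requirements` fields and the order in which the rules are checked are our assumptions, since the bullets don't say which rule wins when several apply. Streaming conditions are checked first, because batching or pausing isn't an option when the data never ends or processing can't stop.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirements:
    """Hypothetical description of a workload's needs."""
    can_process_in_groups: bool
    needs_order: bool
    can_pause: bool
    continuous_or_unknown_size: bool


def choose_approach(req: Requirements) -> str:
    """Map requirements to 'streaming', 'batch', or 'queue' (assumed rule order)."""
    # Streaming: the data is continuous or of unknown size, or processing can't stop.
    if req.continuous_or_unknown_size or not req.can_pause:
        return "streaming"
    # Batching: the simplest choice when data can be grouped and strict order isn't required.
    if req.can_process_in_groups and not req.needs_order:
        return "batch"
    # Queue: order matters (or grouping isn't possible) and pausing is acceptable;
    # a queue is effectively a batch with a batch size of one.
    return "queue"


print(choose_approach(Requirements(True, False, True, False)))   # batch
print(choose_approach(Requirements(False, True, True, False)))   # queue
print(choose_approach(Requirements(True, True, True, True)))     # streaming
print(choose_approach(Requirements(True, False, False, False)))  # streaming
```

Real systems often blur these lines (for example, processing a stream in small batches). Treat this as a starting checklist rather than a fixed rule.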

Let's practice!
