Batch issues

Streaming Concepts

Mike Metzger

Data Engineer

Delays

  • Time until data is ready to process
    • Is all data available?
  • Time until process begins
    • When does the next interval start?
  • Time to process data
    • How long until completion?
  • Time until processed data is available for use
    • How long until users can use the data?

Example #1

Waiting on the source data

  • Machines sending log files at times of low utilization
  • Works fine under normal utilization
  • Sustained high utilization limits the machines' ability to send logs, potentially hiding issues

Example #2

Waiting on the process

  • 100GB of log files per day
  • Currently takes 23 hrs to process
  • Approximately 4.35GB/hr
  • Grows at 5% per month
  • Next month would be 105GB and take ~24 hrs
  • Following month would be ~110GB and take ~25 hrs
  • Takes longer than a day to process one day's worth of data!
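The arithmetic in this example can be checked with a short sketch. It uses the figures from the slide (100GB per day, 23 hrs, 5% monthly growth) and assumes that throughput stays constant and that growth compounds month over month; the names `hours_needed` and `THROUGHPUT_GB_PER_HR` are illustrative only.

```python
# Figures taken from the slide.
DAILY_GB = 100.0          # log volume per day
HOURS_TO_PROCESS = 23.0   # current processing time for one day's logs
MONTHLY_GROWTH = 0.05     # 5% growth per month

# Assumption: processing throughput stays fixed at today's rate (~4.35 GB/hr).
THROUGHPUT_GB_PER_HR = DAILY_GB / HOURS_TO_PROCESS


def hours_needed(months_from_now: int) -> float:
    """Hours to process one day's logs, assuming compound monthly growth."""
    volume_gb = DAILY_GB * (1 + MONTHLY_GROWTH) ** months_from_now
    return volume_gb / THROUGHPUT_GB_PER_HR


for month in range(3):
    hours = hours_needed(month)
    status = "exceeds 24 hrs" if hours > 24 else "fits within a day"
    print(f"month {month}: {hours:.2f} hrs ({status})")
```

Under these assumptions the batch job crosses the 24-hour mark after a single month of growth (about 24.15 hrs), so the backlog starts to accumulate almost immediately.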

Example #3

Waiting on the data to be available

  • How long until analytics are available?
  • Sales report cannot be generated until all information is available
  • Sum of delays is minimum time to generate new report
    • Amount of time to collect / prepare data: 1 day
    • Time required to process data: 7 hrs
    • Time to update systems: 5 hrs
    • Time to generate report: 2 min
  • Total time for each report: 1.5 days
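The total above can be reproduced by adding up the individual delays. This is a minimal sketch that assumes the four stages run strictly one after another with no overlap; the dictionary keys are just labels copied from the slide.

```python
from datetime import timedelta

# Delays from the slide; assumed to run sequentially with no overlap.
delays = {
    "collect / prepare data": timedelta(days=1),
    "process data": timedelta(hours=7),
    "update systems": timedelta(hours=5),
    "generate report": timedelta(minutes=2),
}

# The minimum time to a new report is the sum of all stages.
total = sum(delays.values(), timedelta())
total_days = total.total_seconds() / 86400  # 86400 seconds per day

print(total)                      # 1 day, 12:02:00
print(f"{total_days:.1f} days")   # 1.5 days
```

The report generation itself is only 2 minutes of the total; nearly all of the wait comes from collecting, processing, and loading the data.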

Let's practice!
