Popular streaming systems
Streaming Concepts
Mike Metzger
Data engineer
Streaming tools
- Various tools available depending on needs
- Allows designers to specify the best tool for the job
- Common systems include:
  - Celery
  - Kafka
  - Spark Streaming
Celery
- Distributed task queue, roughly first in, first out (FIFO)
- Used primarily as a job or task queue
- Often used for asynchronous tasks
  - Sending password reset emails
  - Fulfilling digital orders
  - Resizing images
- Allows for real-time processing of a significant quantity of messages
- Provides functionality for management and scaling (vertical & horizontal)
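The task-queue pattern described above can be sketched with the Python standard library alone. This is a conceptual stand-in, not Celery's API: the in-process `queue.Queue` plays the role of the broker, and the names `resize_image`, `worker`, and `run_tasks` are hypothetical and chosen only for illustration.

```python
import queue
import threading

# Hypothetical stand-in for a message broker: an in-process FIFO queue.
task_queue = queue.Queue()
results = []
results_lock = threading.Lock()


def resize_image(name, width):
    """Pretend task: a real worker would resize the image here."""
    return f"{name} resized to {width}px"


def worker():
    """Pull tasks off the queue until a None sentinel arrives."""
    while True:
        item = task_queue.get()
        if item is None:
            break
        func, args = item
        outcome = func(*args)
        with results_lock:
            results.append(outcome)


def run_tasks(tasks, num_workers=2):
    """Enqueue tasks, process them with num_workers threads, return sorted results."""
    results.clear()
    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for task in tasks:
        task_queue.put(task)
    # One sentinel per worker, queued after the real tasks, tells each worker to stop.
    for _ in threads:
        task_queue.put(None)
    for thread in threads:
        thread.join()
    return sorted(results)
```

Raising `num_workers` is a small-scale analogue of horizontal scaling: the queue stays the same while more workers drain it. A real deployment would replace the in-process queue with an external broker so that workers can run on separate machines.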
Apache Kafka
- Distributed event streaming system
- Designed to send events between producers and consumers
- Producers create events on a topic
  - A topic is a named stream of related events
- Consumers receive new events
  - Different consumers can handle events as needed (logging, transformations, relaying, etc.)
- Stores events according to the configured retention settings
- Extremely powerful, but can be tricky to set up
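The producer, topic, and consumer roles above can be illustrated with a toy in-memory model. This is a sketch under loud assumptions: `MiniBroker`, `produce`, and `consume` are hypothetical names, not the Kafka client API, and nothing here is distributed or persisted. It only shows that events are appended to a per-topic log and that each consumer tracks its own position in that log.

```python
from collections import defaultdict


class MiniBroker:
    """Toy in-memory stand-in for an event broker (hypothetical, not the Kafka API)."""

    def __init__(self):
        self.topics = defaultdict(list)   # topic name -> append-only event log
        self.offsets = defaultdict(int)   # (consumer, topic) -> next index to read

    def produce(self, topic, event):
        """Append an event to the topic's log and return its offset."""
        self.topics[topic].append(event)
        return len(self.topics[topic]) - 1

    def consume(self, consumer, topic):
        """Return the events this consumer has not seen yet and advance its offset."""
        log = self.topics[topic]
        start = self.offsets[(consumer, topic)]
        self.offsets[(consumer, topic)] = len(log)
        return log[start:]


broker = MiniBroker()
broker.produce("orders", {"id": 1})
broker.produce("orders", {"id": 2})

# Two independent consumers each receive every event on the topic.
for_logging = broker.consume("logger", "orders")
for_transform = broker.consume("transformer", "orders")
```

Because events stay in the log after being read, a consumer added later can still read from the beginning, which is what makes the logging, transformation, and relaying consumers independent of one another.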

Kafka applications
- Best used for passing data between multiple systems
- Single source of truth
- Change data capture
- Data backups
- Data system migrations
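Change data capture can be pictured as replaying an ordered log of row changes to rebuild a copy of a table elsewhere. The sketch below assumes a hypothetical event shape (`op`, `key`, `row`) that is not taken from any particular tool, and `apply_changes` is an illustrative helper name.

```python
def apply_changes(change_log):
    """Rebuild a table (dict keyed by primary key) by replaying change events in order."""
    table = {}
    for change in change_log:
        op, key = change["op"], change["key"]
        if op in ("insert", "update"):
            table[key] = change["row"]
        elif op == "delete":
            table.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op}")
    return table


# Hypothetical change events, in the order they happened on the source system.
changes = [
    {"op": "insert", "key": 1, "row": {"name": "Ann"}},
    {"op": "insert", "key": 2, "row": {"name": "Bob"}},
    {"op": "update", "key": 1, "row": {"name": "Anne"}},
    {"op": "delete", "key": 2},
]
replica = apply_changes(changes)
```

The same replay idea underlies the backup and migration uses listed above: any system that can read the full log can reconstruct the current state.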
Spark Streaming
- Part of Apache Spark
- Designed to process streaming data
- Builds on Spark's ability to process data in Scala, Python, SQL, and other supported languages
- Useful for processing large amounts of data and in machine learning scenarios
- Able to transition from batch to stream processing fairly easily
- Not designed to store or log events, but primarily to process or modify the data
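One way to see why moving from batch to stream processing can be straightforward is to reuse a single transformation for both. The sketch below is plain Python, not the Spark API: `count_words`, `micro_batches`, and `streaming_word_count` are hypothetical names, and the small fixed-size batches only mimic the micro-batch style of processing in a very loose way.

```python
from collections import Counter
from itertools import islice


def count_words(lines):
    """The shared transformation: count words in an iterable of text lines."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def micro_batches(stream, size):
    """Group a (possibly unbounded) iterator into lists of at most `size` items."""
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def streaming_word_count(stream, batch_size=2):
    """Yield a snapshot of the running word count after each micro-batch."""
    running = Counter()
    for batch in micro_batches(stream, batch_size):
        running += count_words(batch)
        yield dict(running)


lines = ["spark streams data", "spark batches data", "kafka streams events"]

batch_result = count_words(lines)                    # batch: all data at once
snapshots = list(streaming_word_count(lines, 2))     # stream: incremental results
```

Note that the streaming version keeps only the running counts and does not retain the input lines, which mirrors the point above about processing data rather than storing or logging events.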
Let's practice!