Kafka architecture
Introduction to Apache Kafka
Mike Metzger
Data Engineering Consultant
Kafka components
Kafka server
A cluster of one or more computers
Stores data
Manages communications
Can also integrate with other systems (databases, logs, etc.)
Kafka clients
Read via Kafka consumer
Write via Kafka producer
Process data as required
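To make the producer/server/consumer split concrete, here is a toy, in-memory sketch of the flow. It is an illustration under simplifying assumptions, not the Kafka client API: the class and method names (ToyBroker, ToyProducer, ToyConsumer, send, poll) are hypothetical. Real Kafka clients talk to brokers over the network. The one idea carried over is that a consumer tracks its own read position (offset) in an append-only log.

```python
# A toy, in-memory model of the producer -> server -> consumer flow.
# All names here are hypothetical; this is NOT the Kafka client API.

class ToyBroker:
    """Stores messages for each topic in an append-only list (a simplified log)."""

    def __init__(self):
        self.logs = {}  # topic name -> list of messages

    def append(self, topic, message):
        log = self.logs.setdefault(topic, [])
        log.append(message)
        return len(log) - 1  # offset of the newly written message

    def read(self, topic, offset):
        # Return every message at or after the given offset
        return self.logs.get(topic, [])[offset:]


class ToyProducer:
    """Writes messages to a topic on the broker."""

    def __init__(self, broker):
        self.broker = broker

    def send(self, topic, message):
        return self.broker.append(topic, message)


class ToyConsumer:
    """Reads messages from one topic, remembering how far it has read."""

    def __init__(self, broker, topic):
        self.broker = broker
        self.topic = topic
        self.offset = 0  # next position to read

    def poll(self):
        messages = self.broker.read(self.topic, self.offset)
        self.offset += len(messages)
        return messages


broker = ToyBroker()
producer = ToyProducer(broker)
consumer = ToyConsumer(broker, "orders")

producer.send("orders", "order-1")
producer.send("orders", "order-2")
print(consumer.poll())  # ['order-1', 'order-2']
print(consumer.poll())  # [] (nothing new since the last poll)
```

Because the consumer only moves its own offset, reading does not remove data from the log, so several consumers could read the same topic independently.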
Kafka server
Storage layer: the servers that store data are known as Kafka brokers
Data written by producers is stored and organized into topics
Topics are partitioned, or split into separate pieces (partitions)
Messages / events are stored in a given partition based on the event key (e.g., a customer ID)
Within a partition, messages are retrieved in the same order as written
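The sketch below illustrates key-based partitioning and per-partition ordering with plain Python lists. It assumes 3 partitions and uses zlib.crc32 as a stand-in hash; Kafka's default Java partitioner hashes the key bytes with murmur2 instead, so actual partition numbers will differ. The property shown still holds: the same key always lands in the same partition, so that key's events come back in write order.

```python
import zlib

NUM_PARTITIONS = 3  # assumed partition count for this example

# Each partition is an append-only list of (key, value) events
partitions = [[] for _ in range(NUM_PARTITIONS)]


def choose_partition(key: str) -> int:
    # Same key -> same hash -> same partition.
    # (Kafka's default partitioner uses murmur2; crc32 is a stand-in here.)
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def write(key: str, value: str) -> int:
    p = choose_partition(key)
    partitions[p].append((key, value))
    return p


events = [
    ("customer-A", "login"),
    ("customer-B", "login"),
    ("customer-A", "add to cart"),
    ("customer-A", "checkout"),
    ("customer-B", "logout"),
]
for key, value in events:
    write(key, value)

# Reading a partition front to back returns events in write order,
# so all of customer-A's events come back in the order they happened.
p = choose_partition("customer-A")
print([v for k, v in partitions[p] if k == "customer-A"])
# ['login', 'add to cart', 'checkout']
```

Ordering is only guaranteed inside one partition; events for different keys in different partitions can be read in any relative order.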
Image sourced from https://kafka.apache.org/intro
Partitions & replication
Kafka is fault-tolerant
If one system goes offline, others can provide the requested data
Max number of tolerated failures == replication factor - 1
Max replication factor == number of servers
Replication is handled by copying partitions across brokers
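The two rules above can be written as a small helper. The function name is hypothetical. It only counts how many brokers can fail while every partition still has at least one copy; it ignores write-side settings such as min.insync.replicas, which can make a topic stop accepting writes before all copies are gone.

```python
def max_tolerated_failures(replication_factor: int, num_brokers: int) -> int:
    """How many brokers can fail while every partition keeps at least one copy."""
    # The replication factor cannot exceed the number of brokers
    if not 1 <= replication_factor <= num_brokers:
        raise ValueError("replication factor must be between 1 and the number of brokers")
    return replication_factor - 1


print(max_tolerated_failures(2, 3))  # 1
print(max_tolerated_failures(3, 3))  # 2
```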
Example
Kafka cluster with 3 brokers
3 topics defined
2x replication == can lose 1 system
Broker 1 has copies of Topic 1 & Topic 2
Broker 2 has copies of Topic 2 & Topic 3
Broker 3 has copies of Topic 1 & Topic 3
Each topic is replicated across 2 brokers within the cluster
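The example layout can be written as a mapping from broker to the topics it holds, and counting copies per topic confirms the 2x replication. Like the slide, this simplifies things by treating whole topics as the unit of replication; in Kafka, replicas are actually assigned per partition.

```python
from collections import Counter

# Topic placement from the example: broker -> topics it holds a copy of
placement = {
    "Broker 1": {"Topic 1", "Topic 2"},
    "Broker 2": {"Topic 2", "Topic 3"},
    "Broker 3": {"Topic 1", "Topic 3"},
}

# Count how many brokers hold a copy of each topic
copies = Counter(topic for topics in placement.values() for topic in topics)
print(sorted(copies.items()))
# [('Topic 1', 2), ('Topic 2', 2), ('Topic 3', 2)]
```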
Example with 1 failure
Broker 2 fails
Replication factor 2x
2 - 1 == Can handle 1 failed system
Each topic still has at least one copy on the cluster
Example with 2 failures
2 failed brokers (Brokers 1 and 2, the two holding Topic 2)
Replication factor 2x
More failures than supported (1 system)
Topic 2 is no longer available in the cluster
Topics 1 and 3 are still available, so not a complete failure
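Both failure examples can be checked with a short simulation: a topic stays available as long as at least one surviving broker holds a copy. For the two-failure case, this sketch assumes the failed brokers are Brokers 1 and 2, the only pair whose loss removes both copies of Topic 2. The function name available_topics is hypothetical.

```python
# Same placement as in the example: broker -> topics it holds a copy of
placement = {
    "Broker 1": {"Topic 1", "Topic 2"},
    "Broker 2": {"Topic 2", "Topic 3"},
    "Broker 3": {"Topic 1", "Topic 3"},
}
ALL_TOPICS = {"Topic 1", "Topic 2", "Topic 3"}


def available_topics(failed_brokers):
    """Topics that still have at least one copy on a surviving broker."""
    available = set()
    for broker, topics in placement.items():
        if broker not in failed_brokers:
            available |= topics
    return available


# One failure (Broker 2): every topic is still available
print(sorted(available_topics({"Broker 2"})))
# ['Topic 1', 'Topic 2', 'Topic 3']

# Two failures (Brokers 1 and 2): only Broker 3 survives
print(sorted(available_topics({"Broker 1", "Broker 2"})))
# ['Topic 1', 'Topic 3']
print(sorted(ALL_TOPICS - available_topics({"Broker 1", "Broker 2"})))
# ['Topic 2']
```

With a replication factor of 2, any single broker can fail without losing a topic, but some pair of failures always removes at least one topic, which matches the "replication factor - 1" rule.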
Let's practice!