Creating and managing Kafka clusters

Introduction to Apache Kafka

Mike Metzger

Data Engineering Consultant

What is ZooKeeper?

  • ZooKeeper is a framework that manages information and provides the services necessary for running distributed systems
  • Primarily used by developers to create distributed applications
    • End users rarely interact with ZooKeeper directly
  • Example applications
    • Kafka
    • Hadoop
    • Neo4j

What does ZooKeeper do?

  • ZooKeeper provides services necessary to run distributed applications
    • Handling configuration
    • System naming
    • Synchronization across systems
    • Services required by a group of systems
  • Designed as a shared framework so that individual distributed applications do not each have to implement their own versions of these services
    • Like a common connector, such as a power plug or hose nozzle

ZooKeeper and Kafka

  • Kafka uses ZooKeeper for cluster management
    • Newer versions of Kafka can use KRaft instead of ZooKeeper
  • Two configuration files are used
    • config/zookeeper.properties
    • config/server.properties
# zookeeper.properties

# The directory where the 
# snapshot is stored.
dataDir=/tmp/zookeeper

# The port at which the clients 
# will connect
clientPort=2181

...
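
To check which value a setting has before starting ZooKeeper, a small shell helper can read it straight from the properties file. This is a minimal sketch, not part of Kafka: get_prop is a hypothetical helper name, and it assumes simple KEY=VALUE lines with no spaces around the key.

```shell
# get_prop KEY FILE
# Hypothetical helper: prints the value of the last "KEY=VALUE" line in FILE.
# Comment lines (starting with "#") never match because their first field
# is not exactly KEY.
get_prop() {
  awk -F= -v k="$1" '
    $1 == k { sub(/^[^=]*=/, ""); val = $0 }   # strip "KEY=" and keep the rest
    END     { if (val != "") print val }
  ' "$2"
}

# Example usage (run from the Kafka installation directory):
#   get_prop clientPort config/zookeeper.properties
#   get_prop dataDir    config/zookeeper.properties
```

With the zookeeper.properties file shown above, the first call would print the client port and the second the snapshot directory.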

config/server.properties

  • Handles Kafka-specific configuration
    • Broker details
    • Network configuration
    • Event storage location
    • Basic topic configurations, including replication
...
# A comma separated list of directories under which to store log files
log.dirs=/tmp/kafka-logs
# The default number of log partitions per topic.
num.partitions=1
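
One way to change a default such as num.partitions without touching the original file is to write a modified copy and start the broker with that copy. The sketch below assumes simple KEY=VALUE lines; set_prop and the file name server-custom.properties are hypothetical names used for illustration only.

```shell
# set_prop KEY VALUE INPUT_FILE OUTPUT_FILE
# Hypothetical helper: copies INPUT_FILE to OUTPUT_FILE, replacing the
# value of KEY, or appending "KEY=VALUE" if KEY is not present.
set_prop() {
  awk -F= -v k="$1" -v v="$2" '
    $1 == k { print k "=" v; found = 1; next }   # replace the existing line
            { print }                            # keep everything else as-is
    END     { if (!found) print k "=" v }        # append when KEY was missing
  ' "$3" > "$4"
}

# Example usage (run from the Kafka installation directory):
#   set_prop num.partitions 3 config/server.properties config/server-custom.properties
#   bin/kafka-server-start.sh config/server-custom.properties
```

Writing to a separate file keeps the shipped server.properties intact, so it is easy to compare the two or fall back to the defaults.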

Starting a Kafka cluster

  • Kafka clusters are started in two steps: ZooKeeper first, then the Kafka broker
$ bin/zookeeper-server-start.sh config/zookeeper.properties
...
INFO   ______                  _                                           (org.apache.zookeeper.server.ZooKeeperServer)
INFO  |___  /                 | |                                          (org.apache.zookeeper.server.ZooKeeperServer)
INFO     / /    ___     ___   | | __   ___    ___   _ __     ___   _ __    (org.apache.zookeeper.server.ZooKeeperServer)
INFO    / /    / _ \   / _ \  | |/ /  / _ \  / _ \ | '_ \   / _ \ | '__| (org.apache.zookeeper.server.ZooKeeperServer)
INFO   / /__  | (_) | | (_) | |   <  |  __/ |  __/ | |_) | |  __/ | |     (org.apache.zookeeper.server.ZooKeeperServer)
INFO  /_____|  \___/   \___/  |_|\_\  \___|  \___| | .__/   \___| |_| (org.apache.zookeeper.server.ZooKeeperServer)
INFO                                               | |                      (org.apache.zookeeper.server.ZooKeeperServer)
INFO                                               |_|                      (org.apache.zookeeper.server.ZooKeeperServer)
INFO  (org.apache.zookeeper.server.ZooKeeperServer)
...

Starting a Kafka cluster (continued)

$ bin/kafka-server-start.sh config/server.properties
INFO Awaiting socket connections on 0.0.0.0:9092. (kafka.network.DataPlaneAcceptor)
INFO Kafka version: 3.7.0 (org.apache.kafka.common.utils.AppInfoParser)
INFO Kafka commitId: 2ae524ed625438c5 (org.apache.kafka.common.utils.AppInfoParser)
INFO Kafka startTimeMs: 1717502877829 (org.apache.kafka.common.utils.AppInfoParser)
INFO [KafkaServer id=0] started (kafka.server.KafkaServer)
INFO [zk-broker-0-to-controller-forwarding-channel-manager]: Recorded new controller, 
     from now on will use node 815f25786085:9092 (id: 0 ra

Stopping a Kafka cluster

  • bin/kafka-server-stop.sh
  • bin/zookeeper-server-stop.sh
  • Shut down in the reverse order of startup: stop the Kafka broker first, then ZooKeeper
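
The startup and shutdown order can be captured in a pair of small wrapper functions. This is a sketch under stated assumptions: KAFKA_HOME, DRY_RUN, run, start_cluster and stop_cluster are hypothetical names introduced here; only the four bin/ scripts and the two config files come from the slides. The -daemon flag is used so that each start script returns instead of staying in the foreground.

```shell
# Assumption: KAFKA_HOME points at the Kafka installation directory.
# It defaults to the current directory, matching the relative paths above.
KAFKA_HOME="${KAFKA_HOME:-.}"

# run: executes a command, or only prints it when DRY_RUN=1 (hypothetical switch).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Start order: ZooKeeper first, then the Kafka broker.
start_cluster() {
  run "$KAFKA_HOME/bin/zookeeper-server-start.sh" -daemon "$KAFKA_HOME/config/zookeeper.properties"
  run "$KAFKA_HOME/bin/kafka-server-start.sh" -daemon "$KAFKA_HOME/config/server.properties"
}

# Stop order: the reverse, Kafka broker first, then ZooKeeper.
stop_cluster() {
  run "$KAFKA_HOME/bin/kafka-server-stop.sh"
  run "$KAFKA_HOME/bin/zookeeper-server-stop.sh"
}

# Example usage:
#   DRY_RUN=1; start_cluster   # print the commands without running them
#   start_cluster              # actually start ZooKeeper, then Kafka
#   stop_cluster               # stop Kafka, then ZooKeeper
```

The sketch does not wait for ZooKeeper to be ready before starting the broker, or for the broker to exit before stopping ZooKeeper; a real script may need a pause or a readiness check between the two steps.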

Let's practice!
