Apache Kafka Installation on Linux
Apache Kafka is a distributed event streaming platform designed for high-throughput, low-latency message delivery. It uses a publish-subscribe model with persistent storage, making it ideal for building real-time data pipelines and streaming applications. This guide covers installation, configuration, and basic operations for running Kafka on Linux.
Table of Contents
- Prerequisites
- Installing Java and Dependencies
- Installing Apache Kafka
- ZooKeeper vs KRaft Mode
- Configuring Kafka Brokers
- Creating and Managing Topics
- Producers and Consumers
- Message Retention and Cleanup
- Monitoring Kafka
- Troubleshooting
- Conclusion
Prerequisites
Before installing Kafka, ensure you have:
- Linux system (Ubuntu 20.04+, CentOS 8+, or Debian 11+)
- Root or sudo access
- At least 4GB RAM available
- At least 50GB disk space for message storage
- Java 8 or higher installed
- Basic networking knowledge
Installing Java and Dependencies
Kafka requires Java to run. Install the OpenJDK runtime:
sudo apt-get update
sudo apt-get install -y openjdk-11-jdk
Verify the Java installation:
java -version
The output should show Java version 11 or higher.
For CentOS/RHEL:
sudo yum install -y java-11-openjdk java-11-openjdk-devel
Installing Apache Kafka
Create a dedicated system user for Kafka first (the -m flag creates its home directory, which the profile step below needs):
sudo useradd -r -m -s /bin/bash kafka
Download a Kafka release from the Apache archive. Check for current versions at https://kafka.apache.org/downloads:
cd /opt
sudo wget https://archive.apache.org/dist/kafka/3.6.1/kafka_2.13-3.6.1.tgz
sudo tar -xzf kafka_2.13-3.6.1.tgz
sudo mv kafka_2.13-3.6.1 kafka
sudo chown -R kafka:kafka /opt/kafka
Set environment variables in the Kafka user's profile:
sudo tee /home/kafka/.bashrc <<'EOF'
export KAFKA_HOME=/opt/kafka
export PATH=$KAFKA_HOME/bin:$PATH
EOF
Quoting the EOF delimiter prevents the current shell from expanding $KAFKA_HOME and $PATH before they are written to the file.
For system-wide access, create a symlink:
sudo ln -s /opt/kafka/bin/kafka-* /usr/local/bin/
ZooKeeper vs KRaft Mode
Kafka traditionally requires Apache ZooKeeper for cluster coordination. However, KRaft (Kafka Raft) mode, production-ready since Kafka 3.3, eliminates the ZooKeeper dependency.
Using ZooKeeper (Traditional Method)
Install ZooKeeper (on Ubuntu/Debian, the zookeeperd package provides the daemon and service unit):
sudo apt-get install -y zookeeperd
Start ZooKeeper:
sudo systemctl start zookeeper
sudo systemctl enable zookeeper
Verify ZooKeeper is running:
echo ruok | nc localhost 2181
A response of imok indicates ZooKeeper is operational. Note that recent ZooKeeper releases only answer four-letter-word commands listed in the 4lw.commands.whitelist setting.
Using KRaft Mode (Modern Method)
KRaft mode simplifies deployment by removing the external coordinator. Configure Kafka in KRaft mode by editing the server properties file.
First, generate a cluster ID:
CLUSTER_ID=$(kafka-storage.sh random-uuid)
echo $CLUSTER_ID
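Before formatting storage with this ID, a quick sanity check helps catch an empty or truncated variable. A minimal sketch, using a hypothetical sample ID in place of real kafka-storage.sh output:

```shell
# SAMPLE_ID is a hypothetical stand-in for the output of
# `kafka-storage.sh random-uuid`, which is a 22-character base64-encoded UUID.
SAMPLE_ID="MkU3OEVBNTcwNTJENDM2Qk"

# Refuse to proceed if the ID is empty or has the wrong length.
if [ -z "$SAMPLE_ID" ] || [ "${#SAMPLE_ID}" -ne 22 ]; then
  echo "invalid cluster id" >&2
  exit 1
fi
echo "cluster id ok: $SAMPLE_ID"
```

Formatting storage with an empty ID fails with a confusing error, so a guard like this is cheap insurance in provisioning scripts.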
Configuring Kafka Brokers
Create the Kafka configuration directory:
sudo mkdir -p /etc/kafka
sudo chown kafka:kafka /etc/kafka
For ZooKeeper mode, edit the server configuration:
sudo cp /opt/kafka/config/server.properties /etc/kafka/
sudo chown kafka:kafka /etc/kafka/server.properties
Edit /etc/kafka/server.properties:
sudo nano /etc/kafka/server.properties
Key configuration parameters:
# Broker identification (broker.id is the ZooKeeper-mode property; node.id applies only to KRaft mode)
broker.id=1
# Listeners and advertised addresses
listeners=PLAINTEXT://0.0.0.0:9092
advertised.listeners=PLAINTEXT://kafka-broker-1.example.com:9092
listener.security.protocol.map=PLAINTEXT:PLAINTEXT
# ZooKeeper connection
zookeeper.connect=localhost:2181/kafka
# Message storage directories (log.dirs, plural, is the property Kafka reads;
# this directory holds binary message data, not application logs)
log.dirs=/var/kafka-logs
# Default replication settings
default.replication.factor=3
min.insync.replicas=2
# Retention settings (retention.bytes applies per partition, not per topic)
log.retention.hours=168
log.retention.bytes=1073741824
log.segment.bytes=1073741824
# Cleanup policy
log.cleanup.policy=delete
# Performance settings
num.network.threads=8
num.io.threads=8
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
# Group coordinator settings
group.initial.rebalance.delay.ms=3000
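A typo in a property key fails silently (Kafka ignores unknown keys), so it can help to check that the essential settings are present before starting the broker. A minimal sketch; the temporary file below is a stand-in for /etc/kafka/server.properties:

```shell
# Write a stand-in properties file for illustration; on a real broker,
# point the loop at /etc/kafka/server.properties instead.
cat > /tmp/server.properties <<'EOF'
broker.id=1
listeners=PLAINTEXT://0.0.0.0:9092
log.dirs=/var/kafka-logs
zookeeper.connect=localhost:2181/kafka
EOF

# Report each required key as present or missing.
for key in broker.id listeners log.dirs zookeeper.connect; do
  if grep -q "^${key}=" /tmp/server.properties; then
    echo "ok: $key"
  else
    echo "missing: $key"
  fi
done
```

The required-key list is an assumption for illustration; adjust it to whatever your deployment treats as mandatory (for KRaft mode, for example, check node.id and controller.quorum.voters instead of zookeeper.connect).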
For KRaft mode, create a new configuration:
sudo nano /etc/kafka/kraft-server.properties
Add these settings:
# KRaft configuration
process.roles=broker,controller
node.id=1
controller.quorum.voters=1@localhost:9093
controller.listener.names=CONTROLLER
listeners=PLAINTEXT://0.0.0.0:9092,CONTROLLER://0.0.0.0:9093
advertised.listeners=PLAINTEXT://kafka-broker-1.example.com:9092
listener.security.protocol.map=PLAINTEXT:PLAINTEXT,CONTROLLER:PLAINTEXT
inter.broker.listener.name=PLAINTEXT
# Log and storage
log.dirs=/var/kafka-logs
# Other settings
log.retention.hours=168
num.network.threads=8
num.io.threads=8
Format storage for KRaft:
sudo mkdir -p /var/kafka-logs
sudo chown kafka:kafka /var/kafka-logs
sudo -u kafka /opt/kafka/bin/kafka-storage.sh format -t "$CLUSTER_ID" -c /etc/kafka/kraft-server.properties
Create a systemd service file for Kafka:
sudo tee /etc/systemd/system/kafka.service <<EOF
[Unit]
Description=Apache Kafka
Requires=network-online.target
After=network-online.target
[Service]
Type=simple
User=kafka
ExecStart=/opt/kafka/bin/kafka-server-start.sh /etc/kafka/server.properties
ExecStop=/opt/kafka/bin/kafka-server-stop.sh
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target
EOF
For KRaft mode, point ExecStart at /etc/kafka/kraft-server.properties instead; kafka-server-start.sh takes the properties file as a positional argument.
Start and enable Kafka:
sudo systemctl daemon-reload
sudo systemctl start kafka
sudo systemctl enable kafka
Verify Kafka is running:
sudo systemctl status kafka
ps aux | grep kafka
Creating and Managing Topics
Use the Kafka topic management tool to create topics. The replication factor cannot exceed the number of brokers, so a single-node setup must use 1:
kafka-topics.sh --create \
--topic orders \
--partitions 3 \
--replication-factor 1 \
--bootstrap-server localhost:9092
List all topics:
kafka-topics.sh --list --bootstrap-server localhost:9092
Describe a specific topic:
kafka-topics.sh --describe \
--topic orders \
--bootstrap-server localhost:9092
Increase partitions on an existing topic (partition counts can only be increased, and adding partitions changes which partition a given key maps to):
kafka-topics.sh --alter \
--topic orders \
--partitions 5 \
--bootstrap-server localhost:9092
Delete a topic:
kafka-topics.sh --delete \
--topic orders \
--bootstrap-server localhost:9092
Producers and Consumers
Test message production with the console producer:
kafka-console-producer.sh --topic orders --bootstrap-server localhost:9092
This opens an interactive prompt. Type messages and press Enter:
{"order_id": "12345", "customer": "John Doe", "amount": 99.99}
{"order_id": "12346", "customer": "Jane Smith", "amount": 149.50}
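Interactive typing doesn't scale for testing; you can instead pipe a file of JSON lines into the console producer. A sketch (the final line assumes a broker at localhost:9092, so it is left commented out):

```shell
# Build a file of newline-delimited JSON messages (sample data).
cat > /tmp/orders.jsonl <<'EOF'
{"order_id": "12347", "customer": "Alice Brown", "amount": 20.00}
{"order_id": "12348", "customer": "Bob Lee", "amount": 35.75}
EOF
wc -l < /tmp/orders.jsonl

# With a broker running, send the whole file in one shot:
# kafka-console-producer.sh --topic orders --bootstrap-server localhost:9092 < /tmp/orders.jsonl
```

The console producer treats each line of stdin as one message, so any newline-delimited file works as a batch input.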
In another terminal, start a console consumer:
kafka-console-consumer.sh \
--topic orders \
--from-beginning \
--bootstrap-server localhost:9092
The --from-beginning flag reads all messages from the start of the topic. Omit it to see only new messages.
Create a consumer group to track message consumption:
kafka-console-consumer.sh \
--topic orders \
--group order-processor \
--bootstrap-server localhost:9092
List consumer groups:
kafka-consumer-groups.sh --list --bootstrap-server localhost:9092
Describe a consumer group:
kafka-consumer-groups.sh --describe \
--group order-processor \
--bootstrap-server localhost:9092
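The LAG column in the describe output is the gap between each partition's log end offset and the group's committed offset; summing it gives a rough backlog figure. A sketch using illustrative sample output (the numbers are made up):

```shell
# Sample kafka-consumer-groups.sh --describe output, saved for parsing.
cat > /tmp/describe.txt <<'EOF'
GROUP           TOPIC  PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG
order-processor orders 0          120             150             30
order-processor orders 1          200             215             15
order-processor orders 2          90              95              5
EOF

# Skip the header row and sum column 6 (LAG) across all partitions.
awk 'NR > 1 {lag += $6} END {print "total lag:", lag}' /tmp/describe.txt
```

In practice you would pipe the live command output into awk instead of a saved file; rising total lag over time means consumers are not keeping up with producers.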
Reset the consumer group offset to the beginning (replace --execute with --dry-run to preview the change first; the group must have no active members):
kafka-consumer-groups.sh --reset-offsets \
--group order-processor \
--topic orders \
--to-earliest \
--execute \
--bootstrap-server localhost:9092
Message Retention and Cleanup
Control how long Kafka retains messages by modifying topic configurations.
Set retention time to 7 days:
kafka-configs.sh --bootstrap-server localhost:9092 \
--entity-type topics \
--entity-name orders \
--alter \
--add-config retention.ms=604800000
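Millisecond retention values are easy to get wrong by a factor of 1000, so it helps to derive retention.ms from a day count rather than typing the raw number. A minimal sketch:

```shell
# 7 days expressed in milliseconds: days * hours * minutes * seconds * ms.
DAYS=7
RETENTION_MS=$(( DAYS * 24 * 60 * 60 * 1000 ))
echo "$RETENTION_MS"  # prints 604800000, matching the value used above
```

The computed value can then be substituted into the kafka-configs.sh command, e.g. --add-config "retention.ms=$RETENTION_MS".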
Set retention size to 1GB:
kafka-configs.sh --bootstrap-server localhost:9092 \
--entity-type topics \
--entity-name orders \
--alter \
--add-config retention.bytes=1073741824
Configure cleanup policy to compact (keeps latest value per key):
kafka-configs.sh --bootstrap-server localhost:9092 \
--entity-type topics \
--entity-name user-profiles \
--alter \
--add-config cleanup.policy=compact
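Compaction keeps only the most recent value for each key, which makes a topic behave like a changelog of a key-value table. The awk pipeline below simulates that outcome on key=value records; this is an illustration only, since Kafka compacts closed log segments in the background rather than rewriting the log eagerly:

```shell
# Three records, two keys; user1 is written twice, so only its
# latest value survives "compaction".
printf '%s\n' "user1=alice" "user2=bob" "user1=alicia" \
  | awk -F= '{latest[$1] = $2} END {for (k in latest) print k "=" latest[k]}' \
  | sort
# prints:
#   user1=alicia
#   user2=bob
```

This is why compacted topics suit use cases like the user-profiles example above: consumers replaying the topic from the start still see the current state of every key.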
View current topic configuration:
kafka-configs.sh --bootstrap-server localhost:9092 \
--entity-type topics \
--entity-name orders \
--describe
Monitoring Kafka
Monitor broker metrics using JMX (Java Management Extensions). Configure JMX for Kafka by editing startup scripts:
export KAFKA_JMX_OPTS="-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Djava.rmi.server.hostname=localhost -Dcom.sun.management.jmxremote.rmi.port=9999"
Check broker status and metrics:
kafka-broker-api-versions.sh --bootstrap-server localhost:9092
Monitor under-replicated partitions:
kafka-topics.sh --describe --under-replicated-partitions --bootstrap-server localhost:9092
Inspect cluster metadata (KRaft mode only):
kafka-metadata-shell.sh --snapshot /var/kafka-logs/__cluster_metadata-0/00000000000000000000.log
Troubleshooting
Check Kafka's application logs for errors (by default these live under the installation's logs/ directory; /var/kafka-logs holds binary message data, not readable logs):
sudo tail -f /opt/kafka/logs/server.log
Verify broker connectivity:
kafka-broker-api-versions.sh --bootstrap-server localhost:9092
Check if ZooKeeper is properly connected:
echo dump | nc localhost 2181 | grep brokers
Reset broker state (destructive: this deletes all message data, and in KRaft mode you must re-run kafka-storage.sh format before restarting):
sudo systemctl stop kafka
sudo rm -rf /var/kafka-logs/*
sudo systemctl start kafka
Verify network connectivity between brokers:
nc -zv kafka-broker-2.example.com 9092
nc -zv kafka-broker-3.example.com 9092
Conclusion
Apache Kafka provides a scalable, fault-tolerant platform for event streaming and real-time data processing. This guide covered installation, both ZooKeeper and KRaft configuration modes, topic management, producer-consumer basics, and retention policies. For production deployments, implement security with TLS/SSL, configure authentication, set up monitoring with Prometheus and Grafana, establish backup procedures, and deploy across multiple broker nodes for high availability. Consider using managed Kafka services for enterprises requiring professional support and simplified operations.


