Apache Kafka Installation on Linux

Apache Kafka is a distributed event streaming platform designed for high-throughput, low-latency message delivery. It uses a publish-subscribe model with persistent storage, making it ideal for building real-time data pipelines and streaming applications. This guide covers installation, configuration, and basic operations for running Kafka on Linux.

Prerequisites

Before installing Kafka, ensure you have:

  • Linux system (Ubuntu 20.04+, CentOS 8+, or Debian 11+)
  • Root or sudo access
  • At least 4GB RAM available
  • At least 50GB disk space for message storage
  • Java 8 or higher installed
  • Basic networking knowledge
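
The prerequisites above can be verified with a short script. This is a hypothetical sketch (the `meets_min` helper name is my own; the 4GB and 50GB thresholds come from the list above):

```shell
#!/usr/bin/env bash
# Hypothetical preflight check for the prerequisites listed above.

# meets_min CURRENT REQUIRED -> succeeds if CURRENT >= REQUIRED
meets_min() { [ "$1" -ge "$2" ]; }

mem_mb=$(awk '/MemTotal/ {print int($2/1024)}' /proc/meminfo)
disk_gb=$(df --output=avail -BG / | tail -1 | tr -dc '0-9')

meets_min "$mem_mb" 4096 || echo "WARNING: less than 4GB RAM available"
meets_min "$disk_gb" 50  || echo "WARNING: less than 50GB free disk space"
command -v java >/dev/null || echo "WARNING: Java not found on PATH"
```

The script only warns rather than aborting, so it can be run early and the missing pieces (such as Java, installed in the next section) fixed afterwards.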

Installing Java and Dependencies

Kafka requires Java to run. Install the OpenJDK runtime:

sudo apt-get update
sudo apt-get install -y openjdk-11-jdk

Verify the Java installation:

java -version

The output should show Java version 11 or higher.
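
If you need to check the version from a script, the major number can be extracted from the `java -version` banner. A small sketch (the `java_major` helper is hypothetical; it handles both the modern `11.0.x` and the legacy `1.8.0_x` version schemes):

```shell
# Hypothetical helper: extract the Java major version from a version banner line.
java_major() {
  local v=${1#*\"}      # drop everything up to the first quote
  v=${v%%\"*}           # drop everything from the closing quote onward
  case $v in
    1.*) v=${v#1.}; echo "${v%%[._]*}" ;;  # legacy scheme: 1.8.0_392 -> 8
    *)   echo "${v%%.*}" ;;                # modern scheme: 11.0.21 -> 11
  esac
}

# java -version prints its banner on stderr, hence the redirect
java_major "$(java -version 2>&1 | head -1)"
```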

For CentOS/RHEL:

sudo yum install -y java-11-openjdk java-11-openjdk-devel

Installing Apache Kafka

Create a dedicated system user for Kafka first, so the installation files can be owned by it:

sudo useradd -r -m -s /bin/bash kafka

Download the latest Kafka release from the Apache repository. Check for current versions at https://kafka.apache.org/downloads:

cd /opt
sudo wget https://archive.apache.org/dist/kafka/3.6.1/kafka_2.13-3.6.1.tgz
sudo tar -xzf kafka_2.13-3.6.1.tgz
sudo mv kafka_2.13-3.6.1 kafka
sudo chown -R kafka:kafka /opt/kafka

Set environment variables in the Kafka user's profile:

sudo tee /home/kafka/.bashrc <<'EOF'
export KAFKA_HOME=/opt/kafka
export PATH=$KAFKA_HOME/bin:$PATH
EOF

Quoting the EOF delimiter prevents the current shell from expanding $KAFKA_HOME and $PATH when the file is written; the variables are resolved later, when the Kafka user logs in.

For system-wide access, create a symlink:

sudo ln -s /opt/kafka/bin/kafka-* /usr/local/bin/

ZooKeeper vs KRaft Mode

Kafka traditionally requires Apache ZooKeeper for cluster coordination. However, Kafka 3.3+ supports KRaft (Kafka Raft) mode, eliminating the ZooKeeper dependency.

Using ZooKeeper (Traditional Method)

Install ZooKeeper (on Debian/Ubuntu the zookeeperd package supplies the daemon and service unit):

sudo apt-get install -y zookeeperd

Start ZooKeeper:

sudo systemctl start zookeeper
sudo systemctl enable zookeeper

Verify ZooKeeper is running:

echo ruok | nc localhost 2181

A response of imok indicates ZooKeeper is operational. ZooKeeper 3.5 and later restrict four-letter commands; if you get no response, add 4lw.commands.whitelist=ruok to the ZooKeeper configuration and restart the service.

Using KRaft Mode (Modern Method)

KRaft mode simplifies deployment by removing the external coordinator. Configure Kafka in KRaft mode by editing the server properties file.

First, generate a cluster ID:

CLUSTER_ID=$(kafka-storage.sh random-uuid)
echo $CLUSTER_ID
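
kafka-storage.sh random-uuid prints a 22-character, base64url-encoded UUID. A hedged sanity check before formatting storage can catch an empty or mangled variable; the `looks_like_cluster_id` helper below is my own, not a Kafka tool:

```shell
# Hypothetical sanity check: a KRaft cluster ID is 22 base64url characters.
looks_like_cluster_id() {
  printf '%s' "$1" | grep -Eq '^[A-Za-z0-9_-]{22}$'
}

looks_like_cluster_id "$CLUSTER_ID" || echo "CLUSTER_ID does not look valid"
```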

Configuring Kafka Brokers

Create the Kafka configuration directory:

sudo mkdir -p /etc/kafka
sudo chown kafka:kafka /etc/kafka

For ZooKeeper mode, edit the server configuration:

sudo cp /opt/kafka/config/server.properties /etc/kafka/
sudo chown kafka:kafka /etc/kafka/server.properties

Edit /etc/kafka/server.properties:

sudo nano /etc/kafka/server.properties

Key configuration parameters:

# Broker identification (must be unique per broker; node.id is the
# KRaft equivalent and is not needed in ZooKeeper mode)
broker.id=1

# Listeners and advertised addresses
listeners=PLAINTEXT://0.0.0.0:9092
advertised.listeners=PLAINTEXT://kafka-broker-1.example.com:9092
listener.security.protocol.map=PLAINTEXT:PLAINTEXT

# ZooKeeper connection
zookeeper.connect=localhost:2181/kafka

# Message storage directory (log.dirs accepts a comma-separated list
# and takes precedence over the older log.dir setting)
log.dirs=/var/kafka-logs

# Default replication settings (these require at least 3 brokers;
# use default.replication.factor=1 on a single-broker setup)
default.replication.factor=3
min.insync.replicas=2

# Retention settings (168 hours = 7 days; the retention and segment
# sizes are 1 GiB, applied per partition)
log.retention.hours=168
log.retention.bytes=1073741824
log.segment.bytes=1073741824

# Cleanup policy
log.cleanup.policy=delete

# Performance settings
num.network.threads=8
num.io.threads=8
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600

# Group coordinator settings
group.initial.rebalance.delay.ms=3000
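
The numeric constants in the retention settings are easier to audit when derived explicitly. A quick sketch of the arithmetic behind the values above:

```shell
# 7 days expressed in hours, for log.retention.hours
echo $(( 7 * 24 ))                  # 168

# 1 GiB expressed in bytes, for log.retention.bytes and log.segment.bytes
echo $(( 1024 * 1024 * 1024 ))      # 1073741824
```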

For KRaft mode, create a new configuration:

sudo nano /etc/kafka/kraft-server.properties

Add these settings:

# KRaft configuration
process.roles=broker,controller
node.id=1
[email protected]:9093
controller.listener.names=CONTROLLER
listeners=PLAINTEXT://0.0.0.0:9092,CONTROLLER://0.0.0.0:9093
advertised.listeners=PLAINTEXT://kafka-broker-1.example.com:9092
listener.security.protocol.map=PLAINTEXT:PLAINTEXT,CONTROLLER:PLAINTEXT
inter.broker.listener.name=PLAINTEXT

# Log and storage
log.dirs=/var/kafka-logs

# Other settings
log.retention.hours=168
num.network.threads=8
num.io.threads=8

Format storage for KRaft (run this in the same shell where CLUSTER_ID was generated above):

sudo mkdir -p /var/kafka-logs
sudo chown kafka:kafka /var/kafka-logs
sudo -u kafka /opt/kafka/bin/kafka-storage.sh format -t "$CLUSTER_ID" -c /etc/kafka/kraft-server.properties

Create a systemd service file for Kafka:

sudo tee /etc/systemd/system/kafka.service <<EOF
[Unit]
Description=Apache Kafka
Wants=network-online.target
After=network-online.target

[Service]
Type=simple
User=kafka
ExecStart=/opt/kafka/bin/kafka-server-start.sh /etc/kafka/server.properties
ExecStop=/opt/kafka/bin/kafka-server-stop.sh
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
EOF

For KRaft mode, replace /etc/kafka/server.properties in the ExecStart line with /etc/kafka/kraft-server.properties (the start script takes the configuration file as a positional argument). For ZooKeeper mode, also add Requires=zookeeper.service and After=zookeeper.service to the [Unit] section so ZooKeeper starts first.

Start and enable Kafka:

sudo systemctl daemon-reload
sudo systemctl start kafka
sudo systemctl enable kafka

Verify Kafka is running:

sudo systemctl status kafka
ps aux | grep kafka

Creating and Managing Topics

Use the Kafka topic management tool to create topics. The replication factor cannot exceed the number of live brokers, so use --replication-factor 1 on a single-broker setup:

kafka-topics.sh --create \
  --topic orders \
  --partitions 3 \
  --replication-factor 2 \
  --bootstrap-server localhost:9092

List all topics:

kafka-topics.sh --list --bootstrap-server localhost:9092

Describe a specific topic:

kafka-topics.sh --describe \
  --topic orders \
  --bootstrap-server localhost:9092

Increase partitions on an existing topic. The partition count can only grow, never shrink, and adding partitions changes which partition a given message key maps to:

kafka-topics.sh --alter \
  --topic orders \
  --partitions 5 \
  --bootstrap-server localhost:9092

Delete a topic:

kafka-topics.sh --delete \
  --topic orders \
  --bootstrap-server localhost:9092

Producers and Consumers

Test message production with the console producer:

kafka-console-producer.sh --topic orders --bootstrap-server localhost:9092

This opens an interactive prompt. Type messages and press Enter:

{"order_id": "12345", "customer": "John Doe", "amount": 99.99}
{"order_id": "12346", "customer": "Jane Smith", "amount": 149.50}

In another terminal, start a console consumer:

kafka-console-consumer.sh \
  --topic orders \
  --from-beginning \
  --bootstrap-server localhost:9092

The --from-beginning flag reads all messages from the start of the topic. Omit it to see only new messages.

Create a consumer group to track message consumption:

kafka-console-consumer.sh \
  --topic orders \
  --group order-processor \
  --bootstrap-server localhost:9092

List consumer groups:

kafka-consumer-groups.sh --list --bootstrap-server localhost:9092

Describe a consumer group:

kafka-consumer-groups.sh --describe \
  --group order-processor \
  --bootstrap-server localhost:9092

Reset consumer group offset to the beginning:

kafka-consumer-groups.sh --reset-offsets \
  --group order-processor \
  --topic orders \
  --to-earliest \
  --execute \
  --bootstrap-server localhost:9092

Message Retention and Cleanup

Control how long Kafka retains messages by modifying topic configurations.

Set retention time to 7 days:

kafka-configs.sh --bootstrap-server localhost:9092 \
  --entity-type topics \
  --entity-name orders \
  --alter \
  --add-config retention.ms=604800000
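
The retention.ms value above is 7 days expressed in milliseconds; deriving it inline avoids copy-paste mistakes:

```shell
# 7 days -> milliseconds, for retention.ms
retention_ms=$(( 7 * 24 * 60 * 60 * 1000 ))
echo "$retention_ms"    # 604800000
```

You could then pass --add-config retention.ms=$retention_ms to kafka-configs.sh instead of a hand-typed constant.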

Set the retention size to 1GB (note that retention.bytes applies per partition, not per topic):

kafka-configs.sh --bootstrap-server localhost:9092 \
  --entity-type topics \
  --entity-name orders \
  --alter \
  --add-config retention.bytes=1073741824

Configure cleanup policy to compact (keeps latest value per key):

kafka-configs.sh --bootstrap-server localhost:9092 \
  --entity-type topics \
  --entity-name user-profiles \
  --alter \
  --add-config cleanup.policy=compact

View current topic configuration:

kafka-configs.sh --bootstrap-server localhost:9092 \
  --entity-type topics \
  --entity-name orders \
  --describe

Monitoring Kafka

Monitor broker metrics using JMX (Java Management Extensions). Kafka's startup scripts enable remote JMX when the JMX_PORT environment variable is set; KAFKA_JMX_OPTS supplies any additional JMX system properties:

export JMX_PORT=9999
export KAFKA_JMX_OPTS="-Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Djava.rmi.server.hostname=localhost"

Check broker status and metrics:

kafka-broker-api-versions.sh --bootstrap-server localhost:9092

Monitor under-replicated partitions:

kafka-topics.sh --describe --under-replicated-partitions --bootstrap-server localhost:9092
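
For alerting, the output of the under-replicated-partitions check can be reduced to a count; anything above zero means replication is lagging. A hypothetical wrapper (the `count_lines` helper is my own and reads the command's output on stdin):

```shell
# Hypothetical alerting helper: count non-empty lines of output on stdin.
count_lines() { awk 'NF { n++ } END { print n + 0 }'; }

urp=$(kafka-topics.sh --describe --under-replicated-partitions \
        --bootstrap-server localhost:9092 2>/dev/null | count_lines)
[ "${urp:-0}" -eq 0 ] || echo "ALERT: $urp under-replicated partition(s)"
```

A snippet like this could run from cron and feed whatever notification channel you already use.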

Check cluster metadata (KRaft mode) by decoding the metadata log with the dump-log tool:

kafka-dump-log.sh --cluster-metadata-decoder --files /var/kafka-logs/__cluster_metadata-0/00000000000000000000.log

Troubleshooting

Check Kafka's application logs for errors. By default the startup scripts write them to the logs/ directory under the installation; the files under the data directory (/var/kafka-logs) are binary message segments, not readable logs:

sudo tail -f /opt/kafka/logs/server.log

Verify broker connectivity:

kafka-broker-api-versions.sh --bootstrap-server localhost:9092

Check if ZooKeeper is properly connected:

echo dump | nc localhost 2181 | grep brokers

Reset broker state (destructive: this deletes all message data, and in KRaft mode also the cluster metadata, so the kafka-storage.sh format step must be repeated before restarting):

sudo systemctl stop kafka
sudo rm -rf /var/kafka-logs/*
sudo systemctl start kafka

Verify network connectivity between brokers:

nc -zv kafka-broker-2.example.com 9092
nc -zv kafka-broker-3.example.com 9092
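
Transient failures are common right after broker restarts, so it can help to wrap the connectivity check in a retry loop. A hypothetical sketch (`retry` is my own helper, not a Kafka tool; the broker hostname is the example one used above):

```shell
# Hypothetical helper: retry a command up to N times with a 1s pause.
retry() {
  local attempts=$1; shift
  local i
  for (( i = 1; i <= attempts; i++ )); do
    "$@" && return 0
    sleep 1
  done
  return 1
}

retry 3 nc -zv kafka-broker-2.example.com 9092 || echo "kafka-broker-2 is unreachable"
```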

Conclusion

Apache Kafka provides a scalable, fault-tolerant platform for event streaming and real-time data processing. This guide covered installation, both ZooKeeper and KRaft configuration modes, topic management, producer-consumer basics, and retention policies. For production deployments, implement security with TLS/SSL, configure authentication, set up monitoring with Prometheus and Grafana, establish backup procedures, and deploy across multiple broker nodes for high availability. Consider using managed Kafka services for enterprises requiring professional support and simplified operations.