Apache ZooKeeper Installation and Configuration

Apache ZooKeeper is a distributed coordination service used by systems like Kafka, HBase, and Hadoop for leader election, configuration management, and distributed locking. This guide covers deploying a ZooKeeper ensemble on Linux, configuring performance parameters, setting ACLs, and integrating with Kafka.

Prerequisites

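  • 3 or 5 Linux servers running Ubuntu 22.04, Debian 12, or a RHEL 9 derivative such as Rocky Linux 9 (an odd count, because the ensemble needs a majority to stay available: 3 nodes tolerate 1 failure, 5 tolerate 2)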
  • Minimum 2 GB RAM per node; 4+ GB recommended for production
  • Java 11 or 17 (OpenJDK)
  • Ports 2888 (peer) and 3888 (leader election) open between nodes; port 2181 (client) open to the nodes and to every client host
  • Synchronized clocks (NTP)
  • Dedicated disk for data directory (avoid shared storage)
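
The odd-number requirement follows from majority quorum: an ensemble of N voting servers stays available only while more than half of them are up. A minimal sketch of that arithmetic follows; the function name quorum_info is illustrative and not part of ZooKeeper.

```shell
# Print the majority quorum and the number of tolerated failures
# for an ensemble of N voting servers.
quorum_info() {
  n=$1
  quorum=$(( n / 2 + 1 ))      # smallest majority of n
  tolerated=$(( n - quorum ))  # servers that may fail while a majority remains
  echo "servers=$n quorum=$quorum tolerated_failures=$tolerated"
}

for n in 3 4 5; do
  quorum_info "$n"
done
```

A 4-node ensemble tolerates one failure, the same as a 3-node ensemble, which is why even sizes add cost without adding fault tolerance.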

Install Java and ZooKeeper

Install Java and ZooKeeper on all nodes:

# Install Java 17
# Ubuntu/Debian
sudo apt update && sudo apt install -y openjdk-17-jdk-headless

# CentOS/Rocky
sudo dnf install -y java-17-openjdk-headless

# Verify Java
java -version

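# Download ZooKeeper (releases no longer on downloads.apache.org are kept at
# https://archive.apache.org/dist/zookeeper/); -f makes curl fail on HTTP errors
ZK_VERSION=3.9.2
curl -fL https://downloads.apache.org/zookeeper/zookeeper-${ZK_VERSION}/apache-zookeeper-${ZK_VERSION}-bin.tar.gz \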
  -o /tmp/zookeeper.tar.gz

sudo tar xvf /tmp/zookeeper.tar.gz -C /opt/
sudo ln -s /opt/apache-zookeeper-${ZK_VERSION}-bin /opt/zookeeper

# Create system user
sudo useradd -r -s /sbin/nologin -d /opt/zookeeper zookeeper
sudo chown -h zookeeper:zookeeper /opt/zookeeper    # -h: change the symlink itself
sudo chown -R zookeeper:zookeeper /opt/apache-zookeeper-${ZK_VERSION}-bin

# Create data and log directories
sudo mkdir -p /var/lib/zookeeper /var/log/zookeeper
sudo chown zookeeper:zookeeper /var/lib/zookeeper /var/log/zookeeper

Configure the ZooKeeper Ensemble

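Each node needs a unique server ID, stored in the myid file inside the data directory, and the same zoo.cfg. The ID must match that node's server.N entry in zoo.cfg: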

# Set the myid file - DIFFERENT on each node (1, 2, or 3)
# On node 1:
echo "1" | sudo tee /var/lib/zookeeper/myid

# On node 2:
echo "2" | sudo tee /var/lib/zookeeper/myid

# On node 3:
echo "3" | sudo tee /var/lib/zookeeper/myid

Create the main configuration on all nodes:

sudo tee /opt/zookeeper/conf/zoo.cfg << 'EOF'
# Tick time (milliseconds) - basic unit for timeouts
tickTime=2000

# Maximum ticks a follower may take to connect and sync to the leader at startup
# (10 ticks x 2000 ms = 20 s)
initLimit=10

# Maximum ticks a follower may lag behind the leader during normal operation
# before it is dropped (5 ticks x 2000 ms = 10 s)
syncLimit=5

# Data directory
dataDir=/var/lib/zookeeper
dataLogDir=/var/log/zookeeper

# Client port
clientPort=2181

# Maximum client connections per server
maxClientCnxns=200

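# Autopurge: keep the 3 most recent snapshots (3 is the minimum allowed) and
# run the purge task every 24 hours. Comments must sit on their own line;
# a trailing comment would be read as part of the property value.
autopurge.snapRetainCount=3
autopurge.purgeInterval=24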

# Enable admin server on port 8080
admin.enableServer=true
admin.serverPort=8080

# JVM heap is not configured in this file;
# set it through JVMFLAGS in conf/java.env

# Ensemble members (server.ID=host:peer-port:leader-election-port)
server.1=192.168.1.10:2888:3888
server.2=192.168.1.11:2888:3888
server.3=192.168.1.12:2888:3888

# 4-letter words command whitelist
4lw.commands.whitelist=mntr,conf,ruok,stat,srvr,cons,dump,envi,dirs,crst,wchs

# Bounds on the session timeout a client may negotiate (milliseconds).
# These values equal the defaults of 2 x tickTime and 20 x tickTime.
minSessionTimeout=4000
maxSessionTimeout=40000
EOF
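
Because initLimit and syncLimit are counted in ticks, the real timeouts depend on tickTime. As a rough sketch, the helper below converts them to milliseconds. The name zk_timeouts is hypothetical, and the function assumes plain key=value lines without surrounding spaces, as in the file above.

```shell
# Print the effective timeouts implied by tickTime, initLimit, and syncLimit
# in the zoo.cfg file given as the first argument.
zk_timeouts() {
  awk -F= '
    $1 == "tickTime"  { tick = $2 }
    $1 == "initLimit" { init_ticks = $2 }
    $1 == "syncLimit" { sync_ticks = $2 }
    END {
      printf "initial sync timeout: %d ms\n", tick * init_ticks
      printf "follower sync timeout: %d ms\n", tick * sync_ticks
    }' "$1"
}

# Example (path used in this guide):
#   zk_timeouts /opt/zookeeper/conf/zoo.cfg
```

With tickTime=2000, initLimit=10, and syncLimit=5 this reports 20000 ms and 10000 ms.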

Set JVM memory options:

sudo tee /opt/zookeeper/conf/java.env << 'EOF'
export JVMFLAGS="-Xmx2g -Xms2g -XX:+UseG1GC -XX:MaxGCPauseMillis=200"
export ZOO_LOG_DIR=/var/log/zookeeper
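# ZOO_LOG4J_PROP is honored only by ZooKeeper 3.7 and earlier (log4j 1.x);
# 3.8 and later log through Logback, configured in conf/logback.xml
export ZOO_LOG4J_PROP="INFO,ROLLINGFILE"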
EOF

Start ZooKeeper and Verify the Cluster

Create a systemd service:

sudo tee /etc/systemd/system/zookeeper.service << 'EOF'
[Unit]
Description=Apache ZooKeeper
Documentation=https://zookeeper.apache.org
After=network.target

[Service]
User=zookeeper
Group=zookeeper
Type=forking
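# Debian/Ubuntu path; on other distributions adjust it to the local JDK
# location (find it with: readlink -f "$(which java)")
Environment="JAVA_HOME=/usr/lib/jvm/java-17-openjdk-amd64"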
ExecStart=/opt/zookeeper/bin/zkServer.sh start
ExecStop=/opt/zookeeper/bin/zkServer.sh stop
ExecReload=/opt/zookeeper/bin/zkServer.sh restart
PIDFile=/var/lib/zookeeper/zookeeper_server.pid
Restart=on-failure
RestartSec=10

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable --now zookeeper

# Check status on all nodes
/opt/zookeeper/bin/zkServer.sh status

Verify the ensemble formed correctly:

# Use 4-letter commands (requires whitelist above)
echo ruok | nc localhost 2181    # Should return "imok"
echo stat | nc localhost 2181    # Shows mode (leader/follower) and stats
echo mntr | nc localhost 2181    # Detailed monitoring data
echo srvr | nc localhost 2181    # Server summary

# Connect with CLI
/opt/zookeeper/bin/zkCli.sh -server 192.168.1.10:2181
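
A healthy ensemble has exactly one leader, with the remaining nodes as followers. The sketch below extracts the Mode line from srvr (or stat) output. zk_mode is a hypothetical helper name, and it reads the command output from standard input so it can be tried without a live server.

```shell
# Print the server mode (leader, follower, standalone) found in srvr/stat
# output on stdin, or "unknown" when no Mode line is present.
zk_mode() {
  awk -F': ' '/^Mode:/ { print $2; found = 1 } END { if (!found) print "unknown" }'
}

# Example against the ensemble in this guide:
#   for ip in 192.168.1.10 192.168.1.11 192.168.1.12; do
#     echo "$ip: $(echo srvr | nc "$ip" 2181 | zk_mode)"
#   done
```

A node that reports "unknown" is either unreachable or not yet serving requests.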

ZooKeeper CLI and Data Operations

ZooKeeper stores data as a hierarchical tree of znodes:

# Connect to the ensemble
/opt/zookeeper/bin/zkCli.sh -server 192.168.1.10:2181,192.168.1.11:2181,192.168.1.12:2181

# Within the CLI:
# Create a persistent znode (the default type)
create /myapp "initial-data"

# Create a child znode holding configuration data
create /myapp/config "host=db.example.com port=5432"

# Create ephemeral znode (deleted when session ends); the parent must exist first
create /myapp/lock
create -e /myapp/lock/node-01 "locked"

# Create sequential znode (a 10-digit counter is appended to the name)
create /myapp/queue
create -s /myapp/queue/item- "task-data"

# Read data
get /myapp/config

# Update data
set /myapp/config "host=db2.example.com port=5432"

# List children
ls /myapp

# Get stats
stat /myapp/config

# Set a one-time watch (prints an event the next time the data changes)
get -w /myapp/config

# Delete a znode
delete /myapp/config

# Recursively delete a znode and children
deleteall /myapp
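
Sequential znodes are the basis of queue and lock recipes: each client creates a sequential child, lists the children, and the child with the lowest sequence number goes first. Below is a minimal sketch of that selection step, assuming child names end in the 10-digit counter that ZooKeeper appends. The name lowest_seq is illustrative.

```shell
# Read child znode names on stdin (one per line) and print the one whose
# trailing 10-digit sequence number is lowest.
lowest_seq() {
  awk '{ print substr($0, length($0) - 9), $0 }' | sort -n | head -n 1 | cut -d' ' -f2-
}

# Example with three children of /myapp/queue:
printf 'item-0000000012\nitem-0000000003\nitem-0000000007\n' | lowest_seq
```

Sorting on the numeric suffix rather than the whole name keeps the order correct even when children use different name prefixes.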

Access Control Lists (ACLs)

ZooKeeper supports ACLs to restrict access to znodes:

# Within zkCli.sh:

# Restrict a znode to read-only for everyone. No regular user keeps the
# admin (a) permission afterwards, so the ACL can no longer be changed.
create /public "data"
setAcl /public world:anyone:r

# Create a password-protected znode (digest auth)
# First, in a regular shell (not zkCli), generate the digest: base64(sha1(user:password))
echo -n "admin:secretpass" | openssl dgst -sha1 -binary | openssl enc -base64

# Back in zkCli: authenticate the session, create the znode, then set the
# digest ACL (replace HASH_FROM_ABOVE with the base64 output)
addauth digest admin:secretpass
create /secure "sensitive-data"
setAcl /secure digest:admin:HASH_FROM_ABOVE:cdrwa

# ACL permissions: c=create, d=delete, r=read, w=write, a=admin
# Check current ACL
getAcl /secure
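
To avoid pasting the hash by hand, the digest step can be wrapped in a small function that prints the complete ACL string for setAcl. This is only a sketch: zk_digest_acl is a hypothetical name, it relies on the same openssl commands shown above, and passing a password as an argument leaves it in the shell history.

```shell
# Print a digest ACL entry of the form digest:user:base64(sha1(user:password)):perms
zk_digest_acl() {
  user=$1
  pass=$2
  perms=${3:-cdrwa}   # default: all permissions
  hash=$(printf '%s' "$user:$pass" | openssl dgst -sha1 -binary | openssl enc -base64)
  echo "digest:$user:$hash:$perms"
}

# Example:
#   zk_digest_acl admin secretpass cdrwa
# then, in zkCli.sh:
#   setAcl /secure <printed value>
```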

For IP-based access:

# Grant full access to 192.168.1.10 and read-only access to 192.168.1.11
# (assumes the znode /app/secrets already exists)
setAcl /app/secrets ip:192.168.1.10:cdrwa,ip:192.168.1.11:r

Monitoring and JMX

Enable JMX for monitoring tools like Prometheus with jmx_exporter:

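# Add to java.env
# JMXAUTH=false and JMXSSL=false expose JMX without authentication or
# encryption, so restrict port 9999 to trusted monitoring hosts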
export JMXPORT=9999
export JMXAUTH=false
export JMXSSL=false

# Use 4-letter commands for quick monitoring
watch -n 5 'echo mntr | nc localhost 2181 | grep -E "outstanding|latency|znodes|connections"'

# Key metrics to watch:
echo mntr | nc localhost 2181 | grep -E \
  "zk_avg_latency|zk_outstanding_requests|zk_znode_count|zk_watch_count|zk_num_alive_connections"
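
For alerting, the same mntr output can be reduced to a single pass/fail check. The sketch below reads mntr output from standard input and compares zk_outstanding_requests with a limit. The function names and the default limit of 10 are illustrative assumptions, not ZooKeeper recommendations.

```shell
# Print the value of one metric from mntr output on stdin.
mntr_value() {
  awk -v key="$1" '$1 == key { print $2 }'
}

# Report whether zk_outstanding_requests is within the given limit.
# Returns 0 when OK, 1 when above the limit, 2 when the metric is missing.
check_outstanding() {
  limit=${1:-10}
  value=$(mntr_value zk_outstanding_requests)
  if [ -z "$value" ]; then
    echo "zk_outstanding_requests not found" >&2
    return 2
  fi
  if [ "$value" -gt "$limit" ]; then
    echo "WARN outstanding=$value limit=$limit"
    return 1
  fi
  echo "OK outstanding=$value limit=$limit"
}

# Example:
#   echo mntr | nc localhost 2181 | check_outstanding 10
```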

Use the admin server HTTP API (port 8080):

# Get server status
curl http://localhost:8080/commands/stat

# Get monitor data (returned as JSON)
curl http://localhost:8080/commands/mntr

# List all commands
curl http://localhost:8080/commands

Kafka Integration

ZooKeeper is used by Kafka (pre-KRaft mode) for broker coordination:

# Edit Kafka's server.properties
sudo nano /opt/kafka/config/server.properties

# Set the following properties in that file:
# ZooKeeper connection string; the trailing /kafka is a chroot path that
# keeps all of Kafka's znodes under /kafka
zookeeper.connect=192.168.1.10:2181,192.168.1.11:2181,192.168.1.12:2181/kafka

# ZooKeeper session and connection timeouts (milliseconds)
zookeeper.session.timeout.ms=18000
zookeeper.connection.timeout.ms=10000

# After restarting Kafka, verify that it stores its metadata in ZooKeeper
/opt/zookeeper/bin/zkCli.sh -server 192.168.1.10:2181

# Within zkCli (after Kafka starts):
ls /kafka
ls /kafka/brokers/ids
ls /kafka/brokers/topics
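
The connection string has two parts: a comma-separated host list and an optional chroot path that appears once, after the last host. The sketch below splits a string of that form. parse_zk_connect is a hypothetical helper written only to illustrate the layout.

```shell
# Split a ZooKeeper connection string into its chroot path and host list.
parse_zk_connect() {
  conn=$1
  case "$conn" in
    */*) hosts=${conn%%/*}; chroot=/${conn#*/} ;;  # everything after the first "/" is the chroot
    *)   hosts=$conn;       chroot=/ ;;            # no chroot given: znodes live at the root
  esac
  echo "chroot=$chroot"
  echo "$hosts" | tr ',' '\n'
}

parse_zk_connect "192.168.1.10:2181,192.168.1.11:2181,192.168.1.12:2181/kafka"
```

The call prints chroot=/kafka followed by the three host:port pairs, one per line.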

Troubleshooting

Node not joining the ensemble:

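# Check firewall
sudo ufw status                  # Ubuntu/Debian
sudo firewall-cmd --list-ports   # CentOS/Rocky

# Allow ZooKeeper ports (Ubuntu/Debian)
sudo ufw allow 2181/tcp
sudo ufw allow 2888/tcp
sudo ufw allow 3888/tcp

# Allow ZooKeeper ports (CentOS/Rocky)
sudo firewall-cmd --permanent --add-port=2181/tcp --add-port=2888/tcp --add-port=3888/tcp
sudo firewall-cmd --reload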

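# Check logs (zkServer.sh redirects server output to a .out file under
# ZOO_LOG_DIR; zookeeper.log exists only if a file appender is configured)
sudo tail -100 /var/log/zookeeper/zookeeper-*.out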
journalctl -u zookeeper -n 100 --no-pager

ZooKeeper in "LOOKING" state (no leader):

# Verify myid is set correctly and matches zoo.cfg
cat /var/lib/zookeeper/myid

# Check all servers are reachable
for ip in 192.168.1.10 192.168.1.11 192.168.1.12; do
  echo -n "${ip}: "; echo stat | nc ${ip} 2181 | grep Mode
done
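
The myid check can also be automated. The sketch below compares the ID in the myid file with the server.N entries in zoo.cfg. check_myid is a hypothetical helper, and the paths in the example comment are the ones used in this guide.

```shell
# Verify that the ID stored in the myid file has a matching server.N entry
# in zoo.cfg. Prints the matching entry, or returns 1 when there is none.
check_myid() {
  myid_file=$1
  cfg=$2
  id=$(tr -d '[:space:]' < "$myid_file")
  entry=$(grep "^server\.${id}=" "$cfg" || true)
  if [ -n "$id" ] && [ -n "$entry" ]; then
    echo "myid $id -> $entry"
  else
    echo "no server.$id entry in $cfg" >&2
    return 1
  fi
}

# Example:
#   check_myid /var/lib/zookeeper/myid /opt/zookeeper/conf/zoo.cfg
```

The address in the printed entry should be that node's own address; if it belongs to another node, the myid files have been swapped.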

High latency or outstanding requests:

echo mntr | nc localhost 2181 | grep -E "latency|outstanding"
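# If zk_outstanding_requests stays well above 0, the server cannot keep up:
# reduce client load, check disk latency on the transaction log (dataLogDir)
# device, and look for long GC pauses before increasing the ZK heap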

Data directory fills up:

# Check autopurge settings in zoo.cfg
# Manually trigger a purge; zkCleanup.sh reads dataDir and dataLogDir from
# zoo.cfg, so only the retention count is passed
/opt/zookeeper/bin/zkCleanup.sh -n 5   # keep 5 snapshots
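
To see whether purging is keeping up, it helps to count what is on disk. ZooKeeper writes snapshots as snapshot.<zxid> under the version-2 subdirectory of dataDir and transaction logs as log.<zxid> under the version-2 subdirectory of dataLogDir. The helper below is a sketch with a hypothetical name, count_zk_files.

```shell
# Count files named <prefix>.* in a directory (for example snapshot.* or log.*).
count_zk_files() {
  dir=$1
  prefix=$2
  find "$dir" -type f -name "${prefix}.*" 2>/dev/null | wc -l | tr -d ' '
}

# Example with the directories used in this guide:
#   count_zk_files /var/lib/zookeeper/version-2 snapshot
#   count_zk_files /var/log/zookeeper/version-2 log
```

A snapshot count that keeps growing well past autopurge.snapRetainCount suggests the purge task is not running.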

Conclusion

Apache ZooKeeper provides reliable distributed coordination for systems like Kafka and HBase that require consensus and leader election across nodes. A 3-node ensemble tolerates the loss of one node and a 5-node ensemble the loss of two, and proper JVM tuning is critical for low-latency operation. For new Kafka deployments, use KRaft mode (Kafka's built-in consensus), which eliminates the ZooKeeper dependency; Kafka 4.0 and later no longer support ZooKeeper at all. For existing ZooKeeper-based Kafka clusters and other ZooKeeper-dependent systems, this setup remains essential.