Apache ZooKeeper Installation and Configuration
Apache ZooKeeper is a distributed coordination service used by systems like Kafka, HBase, and Hadoop for leader election, configuration management, and distributed locking. This guide covers deploying a ZooKeeper ensemble on Linux, configuring performance parameters, setting ACLs, and integrating with Kafka.
Prerequisites
- 3 or 5 Linux servers running Ubuntu 22.04/Debian 12 or CentOS Stream/Rocky Linux 9 (an odd count, so a majority quorum survives node failures)
- Minimum 2 GB RAM per node; 4+ GB recommended for production
- Java 11 or 17 (OpenJDK)
- Ports 2888 (peer) and 3888 (leader election) open between nodes; port 2181 (client) open to clients
- Synchronized clocks (NTP)
- Dedicated disk for data directory (avoid shared storage)
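The odd-number recommendation follows from majority quorum arithmetic. The sketch below is illustrative only; the function names `quorum_size` and `tolerated_failures` are made up for this example and are not part of ZooKeeper.

```shell
# Majority quorum arithmetic for an ensemble of N voting servers.

# Smallest number of servers that forms a majority.
quorum_size() {
  echo $(( $1 / 2 + 1 ))
}

# Number of servers that can fail while a majority remains.
tolerated_failures() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 3 4 5; do
  echo "nodes=$n quorum=$(quorum_size "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

A 4-node ensemble needs a quorum of 3 and therefore tolerates only one failure, the same as a 3-node ensemble, which is why even sizes add cost without adding fault tolerance.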
Install Java and ZooKeeper
Install Java and ZooKeeper on all nodes:
# Install Java 17
# Ubuntu/Debian
sudo apt update && sudo apt install -y openjdk-17-jdk-headless
# CentOS/Rocky
sudo dnf install -y java-17-openjdk-headless
# Verify Java
java -version
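# Download ZooKeeper (-f makes curl fail on HTTP errors instead of saving an error page)
# Releases that have left downloads.apache.org are kept at https://archive.apache.org/dist/zookeeper/
ZK_VERSION=3.9.2
curl -fL https://downloads.apache.org/zookeeper/zookeeper-${ZK_VERSION}/apache-zookeeper-${ZK_VERSION}-bin.tar.gz \
  -o /tmp/zookeeper.tar.gz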
sudo tar xvf /tmp/zookeeper.tar.gz -C /opt/
sudo ln -s /opt/apache-zookeeper-${ZK_VERSION}-bin /opt/zookeeper
# Create system user
sudo useradd -r -s /sbin/nologin -d /opt/zookeeper zookeeper
sudo chown -R zookeeper:zookeeper /opt/zookeeper
sudo chown -R zookeeper:zookeeper /opt/apache-zookeeper-${ZK_VERSION}-bin
# Create data and log directories
sudo mkdir -p /var/lib/zookeeper /var/log/zookeeper
sudo chown zookeeper:zookeeper /var/lib/zookeeper /var/log/zookeeper
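The download step above does not check the tarball's integrity. A minimal sketch of a checksum comparison follows; `verify_sha512` is a hypothetical helper name, and the usage comment assumes the release's published `.sha512` file was saved next to the tarball with the hex digest as its first field.

```shell
# Compare a file's SHA-512 digest with an expected hex string.
# Returns 0 on match, non-zero on mismatch.
verify_sha512() {
  file=$1
  expected=$2
  actual=$(sha512sum "$file" | awk '{print $1}')
  [ "$actual" = "$expected" ]
}

# Example usage (assumed file location and layout, adjust as needed):
# expected=$(awk '{print $1}' /tmp/zookeeper.tar.gz.sha512)
# verify_sha512 /tmp/zookeeper.tar.gz "$expected" && echo "checksum OK"
```

Running the check before extracting the archive stops a truncated or tampered download from reaching /opt.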
Configure the ZooKeeper Ensemble
Each node needs a unique server ID (myid) and a shared zoo.cfg:
# Set the myid file - DIFFERENT on each node (1, 2, or 3)
# On node 1:
echo "1" | sudo tee /var/lib/zookeeper/myid
# On node 2:
echo "2" | sudo tee /var/lib/zookeeper/myid
# On node 3:
echo "3" | sudo tee /var/lib/zookeeper/myid
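Typing the ID by hand on each node is easy to get wrong. As a sketch, the ID can instead be looked up from the `server.N` entries of a zoo.cfg-style file; `myid_for_host` is a hypothetical helper, and it assumes each entry has the form `server.N=host:peer-port:election-port`.

```shell
# Print the server ID whose server.N entry names the given host.
# Prints nothing and returns 1 if no entry matches.
myid_for_host() {
  cfg=$1
  host=$2
  awk -v host="$host" '
    /^server\.[0-9]+=/ {
      split($0, kv, "=")        # kv[1] = "server.N", kv[2] = "host:2888:3888"
      split(kv[2], addr, ":")   # addr[1] = host
      if (addr[1] == host) {
        sub(/^server\./, "", kv[1])
        print kv[1]
        found = 1
        exit
      }
    }
    END { exit (found ? 0 : 1) }
  ' "$cfg"
}

# Example usage once zoo.cfg exists on the node:
# myid_for_host /opt/zookeeper/conf/zoo.cfg 192.168.1.11 | sudo tee /var/lib/zookeeper/myid
```

Deriving the ID from the shared configuration keeps myid and zoo.cfg consistent, which matters because a mismatch prevents the node from joining the ensemble.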
Create the main configuration on all nodes:
sudo tee /opt/zookeeper/conf/zoo.cfg << 'EOF'
# Length of one tick in milliseconds; the limits below are multiples of it
tickTime=2000
# Ticks a follower may take to connect to and sync with the leader at startup
initLimit=10
# Ticks a follower may fall behind the leader during normal operation before it is dropped
syncLimit=5
# Snapshot directory (also holds the myid file)
dataDir=/var/lib/zookeeper
# Transaction log directory (a dedicated disk here reduces write latency)
dataLogDir=/var/log/zookeeper
# Client port
clientPort=2181
# Maximum client connections per server
maxClientCnxns=200
# Snapshot retention (minimum 3 snapshots kept)
autopurge.snapRetainCount=3
# Purge interval in hours (properties files do not support trailing comments on a value line)
autopurge.purgeInterval=24
# Enable admin server on port 8080
admin.enableServer=true
admin.serverPort=8080
# Increase JVM heap for production
# (set via JVMFLAGS in environment)
# Ensemble members (server.ID=host:peer-port:leader-election-port)
server.1=192.168.1.10:2888:3888
server.2=192.168.1.11:2888:3888
server.3=192.168.1.12:2888:3888
# 4-letter words command whitelist
4lw.commands.whitelist=mntr,conf,ruok,stat,srvr,cons,dump,envi,dirs,crst,wchs
# Client session timeout bounds in milliseconds (these equal the defaults of 2x and 20x tickTime)
minSessionTimeout=4000
maxSessionTimeout=40000
EOF
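Because initLimit and syncLimit are counted in ticks, the real time windows depend on tickTime. The sketch below multiplies them out for a zoo.cfg-style file; `cfg_value` and `effective_timeouts` are hypothetical helper names, and the session bounds shown are the documented defaults of 2 and 20 ticks.

```shell
# Read a key from a zoo.cfg-style file (the last assignment wins).
cfg_value() {
  awk -F= -v key="$2" '$1 == key { v = $2 } END { print v }' "$1"
}

# Print the effective limits in milliseconds.
effective_timeouts() {
  cfg=$1
  tick=$(cfg_value "$cfg" tickTime)
  init_ticks=$(cfg_value "$cfg" initLimit)
  sync_ticks=$(cfg_value "$cfg" syncLimit)
  echo "initial sync window: $(( tick * init_ticks )) ms"
  echo "follower sync window: $(( tick * sync_ticks )) ms"
  echo "default min session timeout: $(( tick * 2 )) ms"
  echo "default max session timeout: $(( tick * 20 )) ms"
}

# Example usage:
# effective_timeouts /opt/zookeeper/conf/zoo.cfg
```

With the values in this guide (tickTime=2000, initLimit=10, syncLimit=5) the windows are 20000 ms and 10000 ms, and the default session bounds are 4000 ms and 40000 ms.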
Set JVM memory options:
sudo tee /opt/zookeeper/conf/java.env << 'EOF'
# Keep the heap below physical RAM so the JVM never swaps (use a smaller -Xmx on 2 GB nodes)
export JVMFLAGS="-Xmx2g -Xms2g -XX:+UseG1GC -XX:MaxGCPauseMillis=200"
export ZOO_LOG_DIR=/var/log/zookeeper
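# ZooKeeper 3.8 and later log through Logback (conf/logback.xml);
# the log4j setting below only applies to older releases
export ZOO_LOG4J_PROP="INFO,ROLLINGFILE"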
EOF
Start ZooKeeper and Verify the Cluster
Create a systemd service:
sudo tee /etc/systemd/system/zookeeper.service << 'EOF'
[Unit]
Description=Apache ZooKeeper
Documentation=https://zookeeper.apache.org
After=network.target
[Service]
User=zookeeper
Group=zookeeper
Type=forking
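# JAVA_HOME below is the Ubuntu/Debian amd64 path; on CentOS/Rocky, find it with: readlink -f "$(which java)"
Environment="JAVA_HOME=/usr/lib/jvm/java-17-openjdk-amd64"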
ExecStart=/opt/zookeeper/bin/zkServer.sh start
ExecStop=/opt/zookeeper/bin/zkServer.sh stop
ExecReload=/opt/zookeeper/bin/zkServer.sh restart
PIDFile=/var/lib/zookeeper/zookeeper_server.pid
Restart=on-failure
RestartSec=10
[Install]
WantedBy=multi-user.target
EOF
sudo systemctl daemon-reload
sudo systemctl enable --now zookeeper
# Check status on all nodes
/opt/zookeeper/bin/zkServer.sh status
Verify the ensemble formed correctly:
# Use 4-letter commands (requires whitelist above)
echo ruok | nc localhost 2181 # Should return "imok"
echo stat | nc localhost 2181 # Shows mode (leader/follower) and stats
echo mntr | nc localhost 2181 # Detailed monitoring data
echo srvr | nc localhost 2181 # Server summary
# Connect with CLI
/opt/zookeeper/bin/zkCli.sh -server 192.168.1.10:2181
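A healthy ensemble has exactly one leader, with every other node a follower. The sketch below automates that check by parsing the `Mode:` line of `srvr` output; `parse_mode` and `one_leader` are hypothetical helper names, and the check assumes an ensemble without observers.

```shell
# Extract the value of the "Mode:" line from srvr/stat output on stdin.
parse_mode() {
  awk -F': ' '/^Mode:/ { print $2 }'
}

# Read one mode per line on stdin. Succeed only if the expected number of
# servers answered, exactly one is the leader, and the rest are followers.
one_leader() {
  awk -v expected="$1" '
    $1 == "leader"   { leaders++ }
    $1 == "follower" { followers++ }
    END { exit ((NR == expected && leaders == 1 && leaders + followers == NR) ? 0 : 1) }
  '
}

# Example usage against the three nodes of this guide:
# for ip in 192.168.1.10 192.168.1.11 192.168.1.12; do
#   echo srvr | nc "$ip" 2181 | parse_mode
# done | one_leader 3 && echo "ensemble OK"
```

An unreachable node produces no `Mode:` line, so the server count falls short and the check fails, which is the intended behavior.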
ZooKeeper CLI and Data Operations
ZooKeeper stores data as a hierarchical tree of znodes:
# Connect to the ensemble
/opt/zookeeper/bin/zkCli.sh -server 192.168.1.10:2181,192.168.1.11:2181,192.168.1.12:2181
# Within the CLI:
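# Create a persistent znode
create /myapp "initial-data"
# Create a persistent child znode with data
create /myapp/config "host=db.example.com port=5432"
# Create the parent znodes first (create fails if the parent does not exist)
create /myapp/lock
create /myapp/queue
# Create ephemeral znode (deleted when session ends)
create -e /myapp/lock/node-01 "locked"
# Create sequential znode (a zero-padded 10-digit counter is appended to the name)
create -s /myapp/queue/item- "task-data"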
# Read data
get /myapp/config
# Update data
set /myapp/config "host=db2.example.com port=5432"
# List children
ls /myapp
# Get stats
stat /myapp/config
# Watch for changes (one-time trigger: prints an event on the next data change)
get -w /myapp/config
# Delete a znode
delete /myapp/config
# Recursively delete a znode and children
deleteall /myapp
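Sequential znodes are the building block of queue and lock recipes, where the client that owns the lowest sequence number goes first. As a sketch of that selection step, the hypothetical helper `lowest_sequence` below picks the lowest-numbered child from a list of names, assuming every name ends in the 10-digit zero-padded counter that ZooKeeper appends.

```shell
# Print the name with the lowest sequence suffix from a list on stdin.
# Sorting on the last 10 characters ignores differing name prefixes.
lowest_sequence() {
  awk '{ print substr($0, length($0) - 9), $0 }' | sort | head -n 1 | cut -d' ' -f2-
}

# Example usage with names as they would appear under /myapp/queue:
# printf 'item-0000000012\nitem-0000000003\n' | lowest_sequence
```

Zero padding makes a plain lexical sort agree with numeric order, so no numeric conversion is needed.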
Access Control Lists (ACLs)
ZooKeeper supports ACLs to restrict access to znodes:
# Within zkCli.sh:
# Create a znode that anyone can read but nobody can modify
create /public "data"
setAcl /public world:anyone:r
# Create a password-protected znode (digest auth)
# First, in a regular shell (not zkCli), generate the digest: base64(sha1(user:password))
echo -n "admin:secretpass" | openssl dgst -sha1 -binary | openssl enc -base64
# Back in zkCli, authenticate the session and create the znode
addauth digest admin:secretpass
create /secure "sensitive-data"
# Set the digest ACL (replace HASH_FROM_ABOVE with the base64 hash)
setAcl /secure digest:admin:HASH_FROM_ABOVE:cdrwa
# ACL permissions: c=create, d=delete, r=read, w=write, a=admin
# Check current ACL
getAcl /secure
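The digest and the ACL string can be assembled in one step. This is a minimal sketch using the same openssl pipeline as above; `zk_digest_acl` is a hypothetical helper name, and passing the password as an argument exposes it to shell history and the process list, so treat it as illustrative.

```shell
# Build a digest ACL entry: digest:<user>:<base64(sha1(user:password))>:<perms>
zk_digest_acl() {
  user=$1
  password=$2
  perms=$3
  hash=$(printf '%s' "$user:$password" | openssl dgst -sha1 -binary | openssl enc -base64)
  printf 'digest:%s:%s:%s\n' "$user" "$hash" "$perms"
}

# Example usage: paste the printed string after "setAcl /secure" in zkCli.
# zk_digest_acl admin secretpass cdrwa
```

The username appears in clear text in the ACL and only the hash of user:password is stored, which is why the session must still call addauth with the plain password.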
For IP-based access:
# Grant full access to one IP and read-only access to another (the znode must already exist)
setAcl /app/secrets ip:192.168.1.10:cdrwa,ip:192.168.1.11:r
Monitoring and JMX
Enable JMX for monitoring tools like Prometheus with jmx_exporter:
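# Add to java.env
# JMXAUTH=false and JMXSSL=false leave JMX unauthenticated and unencrypted,
# so restrict port 9999 to trusted monitoring hosts
export JMXPORT=9999
export JMXAUTH=false
export JMXSSL=false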
# Use 4-letter commands for quick monitoring
watch -n 5 'echo mntr | nc localhost 2181 | grep -E "outstanding|latency|znode|connections"'
# Key metrics to watch:
echo mntr | nc localhost 2181 | grep -E \
"zk_avg_latency|zk_outstanding_requests|zk_znode_count|zk_watch_count|zk_num_alive_connections"
Use the admin server HTTP API (port 8080):
# Get server status
curl http://localhost:8080/commands/stat
# Get monitor data (returned as JSON)
curl http://localhost:8080/commands/mntr
# List all commands
curl http://localhost:8080/commands
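The four-letter `mntr` command used in this section prints tab-separated key/value lines, which makes simple threshold checks easy to script. The sketch below uses two hypothetical helpers, `mntr_value` and `metric_within`; the threshold in the usage comment is an arbitrary example, not a recommended limit.

```shell
# Print the value of one metric from mntr output on stdin.
mntr_value() {
  awk -F'\t' -v key="$1" '$1 == key { print $2 }'
}

# Succeed if the metric is present and at or below the limit.
# The comparison is numeric, so fractional values also work.
metric_within() {
  awk -F'\t' -v key="$1" -v limit="$2" '
    $1 == key { found = 1; ok = ($2 + 0 <= limit + 0) }
    END { exit ((found && ok) ? 0 : 1) }
  '
}

# Example usage (threshold of 10 is an assumption for illustration):
# echo mntr | nc localhost 2181 | metric_within zk_outstanding_requests 10 \
#   || echo "WARNING: request backlog on $(hostname)"
```

A missing metric counts as a failure, so a server that is down or not serving requests also triggers the warning.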
Kafka Integration
Kafka releases before 4.0 can use ZooKeeper for broker coordination (Kafka 4.0 and later run only in KRaft mode):
# In Kafka's server.properties, configure ZooKeeper connection
sudo nano /opt/kafka/config/server.properties
# ZooKeeper connection string (the trailing /kafka is a chroot path that keeps Kafka's znodes under /kafka)
zookeeper.connect=192.168.1.10:2181,192.168.1.11:2181,192.168.1.12:2181/kafka
# ZooKeeper session timeout
zookeeper.session.timeout.ms=18000
zookeeper.connection.timeout.ms=10000
# Verify Kafka stores its data in ZooKeeper
/opt/zookeeper/bin/zkCli.sh -server 192.168.1.10:2181
# Within zkCli (after Kafka starts):
ls /kafka
ls /kafka/brokers/ids
ls /kafka/brokers/topics
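The connection string lists every ensemble member as host:port and appends the chroot path once, at the very end. A small sketch that builds it from a host list follows; `zk_connect_string` is a hypothetical helper name.

```shell
# Join hosts into a ZooKeeper connection string: host1:port,host2:port/chroot
# Arguments: chroot path, client port, then one or more hosts.
zk_connect_string() {
  chroot_path=$1
  port=$2
  shift 2
  out=""
  for host in "$@"; do
    out="${out:+$out,}$host:$port"
  done
  printf '%s%s\n' "$out" "$chroot_path"
}

# Example usage:
# zk_connect_string /kafka 2181 192.168.1.10 192.168.1.11 192.168.1.12
```

Generating the string from the same host list used for zoo.cfg avoids the common mistake of repeating the chroot after each host.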
Troubleshooting
Node not joining the ensemble:
# Check firewall
sudo ufw status # Ubuntu
sudo firewall-cmd --list-ports # CentOS
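# Allow ZooKeeper ports (Ubuntu)
sudo ufw allow 2181/tcp
sudo ufw allow 2888/tcp
sudo ufw allow 3888/tcp
# Allow ZooKeeper ports (CentOS/Rocky)
sudo firewall-cmd --permanent --add-port=2181/tcp --add-port=2888/tcp --add-port=3888/tcp
sudo firewall-cmd --reload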
# Check logs
sudo tail -100 /var/log/zookeeper/zookeeper.log
journalctl -u zookeeper -n 100 --no-pager
ZooKeeper in "LOOKING" state (no leader):
# Verify myid is set correctly and matches zoo.cfg
cat /var/lib/zookeeper/myid
# Check all servers are reachable
for ip in 192.168.1.10 192.168.1.11 192.168.1.12; do
  echo -n "${ip}: "; echo stat | nc ${ip} 2181 | grep Mode
done
High latency or outstanding requests:
echo mntr | nc localhost 2181 | grep -E "latency|outstanding"
# If zk_outstanding_requests stays above 0, reduce client load, move the transaction log to a faster dedicated disk, or increase the ZK heap
Data directory fills up:
# Check autopurge settings in zoo.cfg
# Manually trigger purge (zkCleanup.sh reads dataDir and dataLogDir from zoo.cfg, so only the count is passed)
sudo -u zookeeper /opt/zookeeper/bin/zkCleanup.sh -n 5 # keep 5 snapshots
Conclusion
Apache ZooKeeper provides reliable distributed coordination for systems like Kafka and HBase that require consensus and leader election across nodes. A 3-node ensemble tolerates one node failure and a 5-node ensemble tolerates two. Keeping the JVM heap within physical memory and the transaction log on a dedicated disk helps keep latency low. New Kafka deployments should use KRaft mode (Kafka's built-in consensus), which eliminates the ZooKeeper dependency, but for existing ZooKeeper-based Kafka clusters and other ZooKeeper-dependent systems this setup remains essential.


