Elasticsearch Cluster Configuration
Elasticsearch is a distributed, scalable search and analytics engine built on top of Apache Lucene. It provides real-time indexing and searching of structured and unstructured data at scale, with built-in clustering, replication, and automatic failover capabilities. This comprehensive guide covers multi-node cluster setup, discovery mechanisms, shard allocation strategies, index lifecycle management, security, and monitoring for production Elasticsearch deployments.
Table of Contents
- Architecture and Concepts
- Installation
- Cluster Configuration
- Node Discovery
- Shard Allocation
- Index Lifecycle Management
- Security and Authentication
- Monitoring and Metrics
- Backup and Snapshots
- Performance Tuning
- Conclusion
Architecture and Concepts
Elasticsearch uses a distributed architecture where data is split into shards and distributed across nodes. Each shard is a complete Lucene index containing a subset of the documents. Replicas provide redundancy and additional search capacity. Nodes coordinate through a quorum-based consensus algorithm that elects a master node to manage cluster state and shard allocation.
Indices contain documents (JSON objects). The mapping types of earlier releases were logical partitions within an index, but they were removed in Elasticsearch 8.x, so each index now holds a single document structure. Each document is identified by a unique ID and contains typed fields. Inverted indexes enable fast full-text search by mapping each term to the documents that contain it.
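The inverted index described above can be sketched in a few lines of Python. This is a toy illustration of the term-to-documents mapping, not Lucene's actual on-disk structure (which adds positions, frequencies, skip lists, and compression):

```python
from collections import defaultdict

# Toy inverted index: term -> set of document IDs.
docs = {
    1: "elasticsearch is a search engine",
    2: "lucene is a search library",
}

index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():
        index[term].add(doc_id)

# A term query is a dictionary lookup; an AND query intersects posting sets.
print(sorted(index["search"]))                   # [1, 2]
print(sorted(index["search"] & index["lucene"])) # [2]
```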
Installation
Install Elasticsearch on Linux systems. Use the official Elasticsearch repository:
# Add Elasticsearch repository (Ubuntu/Debian); apt-key is deprecated, so store the key in a keyring
curl -fsSL https://artifacts.elastic.co/GPG-KEY-elasticsearch | \
  sudo gpg --dearmor -o /usr/share/keyrings/elasticsearch-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/elasticsearch-keyring.gpg] https://artifacts.elastic.co/packages/8.x/apt stable main" | \
  sudo tee /etc/apt/sources.list.d/elastic-8.x.list
# Update and install
sudo apt-get update
sudo apt-get install -y elasticsearch
# Or install specific version
sudo apt-get install -y elasticsearch=8.10.0
# Verify installation (the binary is not on the default PATH)
/usr/share/elasticsearch/bin/elasticsearch --version
On CentOS/RHEL:
# Add repository
sudo tee /etc/yum.repos.d/elasticsearch.repo <<'EOF'
[elasticsearch]
name=Elasticsearch repository for 8.x packages
baseurl=https://artifacts.elastic.co/packages/8.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
EOF
# Import GPG key
sudo rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch
# Install
sudo dnf install -y elasticsearch
# Verify (the binary is not on the default PATH)
/usr/share/elasticsearch/bin/elasticsearch --version
Create system user and directories:
# The package normally creates the 'elasticsearch' user; create it manually only if missing
sudo useradd -r -s /bin/false elasticsearch 2>/dev/null || true
# Create data and log directories
sudo mkdir -p /var/lib/elasticsearch /var/log/elasticsearch
sudo chown -R elasticsearch:elasticsearch /var/lib/elasticsearch /var/log/elasticsearch
sudo chmod 755 /var/lib/elasticsearch /var/log/elasticsearch
Enable and start the service:
# Enable automatic startup
sudo systemctl daemon-reload
sudo systemctl enable elasticsearch
# Start service
sudo systemctl start elasticsearch
# Monitor startup
sudo journalctl -u elasticsearch -f
# Check status
sudo systemctl status elasticsearch
# Verify cluster (on a fresh 8.x install with security auto-configured, use https:// and the generated elastic password)
curl -u elastic:password http://localhost:9200/_cluster/health
Cluster Configuration
Configure Elasticsearch for multi-node clustering. Edit the configuration file:
sudo nano /etc/elasticsearch/elasticsearch.yml
For node1 (192.168.1.10):
# Cluster name (must match across all nodes)
cluster.name: my-elasticsearch-cluster
# Node name (unique per node)
node.name: node-1
# Data directories
path.data: /var/lib/elasticsearch
# Log directory
path.logs: /var/log/elasticsearch
# Network configuration
network.host: 192.168.1.10
http.port: 9200
transport.port: 9300
# Advertise addresses for cluster communication
discovery.seed_hosts: ["192.168.1.10", "192.168.1.11", "192.168.1.12"]
# Only used on first bootstrap; remove this setting after the cluster has formed
cluster.initial_master_nodes: ["node-1", "node-2", "node-3"]
# Node roles (data, master, ingest)
node.roles: [master, data, ingest]
# Memory settings
bootstrap.memory_lock: true
# Heap size (typically 50% of available RAM, capped at 31GB)
# Set it in /etc/elasticsearch/jvm.options.d/heap.options (shown below), not in this file
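The heap-sizing guidance above can be expressed as a quick calculation. This is a sketch: the 31 GB cap is a conservative round number below the JVM's compressed-oops threshold, not an exact constant:

```python
def recommended_heap_gb(total_ram_gb: float) -> int:
    """Half of RAM, capped at 31 GB so the JVM keeps compressed object pointers."""
    return int(min(total_ram_gb / 2, 31))

for ram in (8, 16, 64, 128):
    heap = recommended_heap_gb(ram)
    print(f"{ram} GB RAM -> -Xms{heap}g -Xmx{heap}g")
```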
For node2 (192.168.1.11):
cluster.name: my-elasticsearch-cluster
node.name: node-2
network.host: 192.168.1.11
discovery.seed_hosts: ["192.168.1.10", "192.168.1.11", "192.168.1.12"]
cluster.initial_master_nodes: ["node-1", "node-2", "node-3"]
node.roles: [master, data, ingest]
bootstrap.memory_lock: true
And node3 (192.168.1.12):
cluster.name: my-elasticsearch-cluster
node.name: node-3
network.host: 192.168.1.12
discovery.seed_hosts: ["192.168.1.10", "192.168.1.11", "192.168.1.12"]
cluster.initial_master_nodes: ["node-1", "node-2", "node-3"]
node.roles: [master, data, ingest]
bootstrap.memory_lock: true
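Since the three node files differ only in node.name and network.host, the per-node configuration can be generated rather than hand-edited. A minimal Python sketch, using the example names and IPs above:

```python
# Render one elasticsearch.yml body per node from the shared settings,
# varying only node.name and network.host (names/IPs from the examples above).
nodes = {"node-1": "192.168.1.10", "node-2": "192.168.1.11", "node-3": "192.168.1.12"}
seed_hosts = ", ".join(f'"{ip}"' for ip in nodes.values())
masters = ", ".join(f'"{name}"' for name in nodes)

configs = {}
for name, ip in nodes.items():
    configs[name] = (
        "cluster.name: my-elasticsearch-cluster\n"
        f"node.name: {name}\n"
        f"network.host: {ip}\n"
        f"discovery.seed_hosts: [{seed_hosts}]\n"
        f"cluster.initial_master_nodes: [{masters}]\n"
        "node.roles: [master, data, ingest]\n"
        "bootstrap.memory_lock: true\n"
    )

print(configs["node-2"])
```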
Configure memory locking to prevent swapping. Note that limits.conf applies to login sessions only; the systemd service also needs the limit raised in a unit override:
# Edit limits configuration
sudo nano /etc/security/limits.conf
Add:
elasticsearch soft memlock unlimited
elasticsearch hard memlock unlimited
elasticsearch soft nofile 65536
elasticsearch hard nofile 65536
elasticsearch soft nproc 4096
elasticsearch hard nproc 4096
Then create the systemd override:
sudo systemctl edit elasticsearch
Add:
[Service]
LimitMEMLOCK=infinity
Configure JVM settings:
sudo nano /etc/elasticsearch/jvm.options.d/heap.options
Add:
-Xms4g
-Xmx4g
Start all nodes and verify cluster formation:
# Start all nodes
sudo systemctl start elasticsearch
# Monitor cluster status
curl -u elastic:password http://localhost:9200/_cluster/health?pretty
# View cluster nodes
curl -u elastic:password http://localhost:9200/_nodes/stats?pretty | head -50
# Check cluster state
curl -u elastic:password http://localhost:9200/_cluster/state?pretty
Node Discovery
Configure node discovery mechanisms for cluster formation:
# Check which node was elected master
curl -u elastic:password http://localhost:9200/_cat/master?v
# Check seed hosts
curl -u elastic:password http://localhost:9200/_nodes/settings?pretty | grep seed_hosts
Add nodes to existing cluster:
# On new node, configure with same cluster name and seed hosts
# Then start elasticsearch - it will automatically discover and join
# Verify new node joined
curl -u elastic:password http://localhost:9200/_nodes?pretty
Handle node removal:
# Gracefully shut down a node
sudo systemctl stop elasticsearch
# Node will be removed from cluster, data is redistributed
# Wait for cluster to be green before removing from infrastructure
curl -u elastic:password http://localhost:9200/_cluster/health?wait_for_status=green
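The health check above returns JSON; a small helper can decide when it is safe to proceed with decommissioning. This sketch parses a sample response offline; in practice you would fetch it from /_cluster/health:

```python
import json

def is_safe_to_remove_node(health_json: str) -> bool:
    """True only when the cluster is green with no relocating or unassigned shards."""
    health = json.loads(health_json)
    return (health["status"] == "green"
            and health["relocating_shards"] == 0
            and health["unassigned_shards"] == 0)

sample = '{"status": "green", "relocating_shards": 0, "unassigned_shards": 0}'
print(is_safe_to_remove_node(sample))  # True
```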
Shard Allocation
Configure shard allocation policies:
# View current allocation settings
curl -u elastic:password http://localhost:9200/_cluster/settings?pretty
# Create index with specific shard configuration
curl -u elastic:password -X PUT http://localhost:9200/my-index-000001 -H "Content-Type: application/json" -d '{
"settings": {
"number_of_shards": 3,
"number_of_replicas": 1,
"index.routing.allocation.include._name": "node-1,node-2,node-3"
}
}'
# Allocate specific index to specific nodes
curl -u elastic:password -X PUT http://localhost:9200/my-index-000001/_settings -H "Content-Type: application/json" -d '{
"index.routing.allocation.include._name": "node-1,node-2"
}'
# Enable rack awareness (each node must also set node.attr.rack_id in elasticsearch.yml)
curl -u elastic:password -X PUT http://localhost:9200/_cluster/settings -H "Content-Type: application/json" -d '{
  "persistent": {
    "cluster.routing.allocation.awareness.attributes": "rack_id",
    "cluster.routing.allocation.awareness.force.rack_id.values": "rack1,rack2,rack3"
  }
}'
# Monitor shard allocation
curl -u elastic:password http://localhost:9200/_cat/shards?v
curl -u elastic:password http://localhost:9200/_cat/allocation?v
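The _cat/shards output above is plain text with the node name in the last column, so checking shard balance is a one-line aggregation. A sketch over sample output (column layout assumed to be index, shard, prirep, state, docs, store, ip, node):

```python
from collections import Counter

# Sample lines in the default format printed by GET _cat/shards.
cat_shards = """\
my-index-000001 0 p STARTED 1000 10mb 192.168.1.10 node-1
my-index-000001 0 r STARTED 1000 10mb 192.168.1.11 node-2
my-index-000001 1 p STARTED 1200 12mb 192.168.1.12 node-3
my-index-000001 1 r STARTED 1200 12mb 192.168.1.10 node-1
"""

# Count shards per node (last whitespace-separated field of each line).
per_node = Counter(line.split()[-1] for line in cat_shards.splitlines())
print(dict(per_node))  # {'node-1': 2, 'node-2': 1, 'node-3': 1}
```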
Index Lifecycle Management
Configure automatic index management:
# Create ILM policy (note: the phases belong inside a "policy" object;
# the searchable_snapshot action requires an Enterprise license)
curl -u elastic:password -X PUT http://localhost:9200/_ilm/policy/my-policy -H "Content-Type: application/json" -d '{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "rollover": {
            "max_primary_shard_size": "50GB",
            "max_age": "30d"
          }
        }
      },
      "warm": {
        "min_age": "7d",
        "actions": {
          "set_priority": {
            "priority": 50
          },
          "shrink": {
            "number_of_shards": 1
          }
        }
      },
      "cold": {
        "min_age": "30d",
        "actions": {
          "set_priority": {
            "priority": 0
          },
          "searchable_snapshot": {
            "snapshot_repository": "my-repository"
          }
        }
      },
      "delete": {
        "min_age": "90d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}'
# Apply ILM policy to index template
curl -u elastic:password -X PUT http://localhost:9200/_index_template/my-template -H "Content-Type: application/json" -d '{
"index_patterns": ["my-logs-*"],
"settings": {
"number_of_shards": 3,
"number_of_replicas": 1,
"index.lifecycle.name": "my-policy",
"index.lifecycle.rollover_alias": "my-logs"
},
"mappings": {
"properties": {
"timestamp": {"type": "date"},
"message": {"type": "text"}
}
}
}'
# View ILM policy
curl -u elastic:password http://localhost:9200/_ilm/policy/my-policy?pretty
# Bootstrap the first index behind the rollover alias
curl -u elastic:password -X PUT http://localhost:9200/my-logs-000001 -H "Content-Type: application/json" -d '{
  "aliases": {"my-logs": {"is_write_index": true}}
}'
# Check index lifecycle status
curl -u elastic:password http://localhost:9200/my-logs-000001/_ilm/explain?pretty
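The phase thresholds in the example policy can be visualized with a toy helper. Note this is a simplification: real ILM measures min_age from rollover (not index creation) and also waits for the rollover conditions themselves:

```python
def ilm_phase(age_days: float) -> str:
    """Map an age to the example policy's phases
    (warm at 7d, cold at 30d, delete at 90d)."""
    if age_days < 7:
        return "hot"
    if age_days < 30:
        return "warm"
    if age_days < 90:
        return "cold"
    return "delete"

for age in (1, 10, 45, 120):
    print(f"{age}d old -> {ilm_phase(age)}")
```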
Security and Authentication
Enable and configure Elasticsearch security:
# Create a user (any roles referenced here must already exist)
curl -u elastic:password -X POST http://localhost:9200/_security/user/myuser -H "Content-Type: application/json" -d '{
"password": "secure_password",
"roles": ["data_scientist"]
}'
# Create custom role
curl -u elastic:password -X POST http://localhost:9200/_security/role/my_role -H "Content-Type: application/json" -d '{
"cluster": ["monitor"],
"indices": [
{
"names": ["my-index-*"],
"privileges": ["read", "write"]
}
]
}'
# Assign the role by re-submitting the user with the full roles list
# (there is no separate role-assignment endpoint; omitting the password leaves it unchanged)
curl -u elastic:password -X PUT http://localhost:9200/_security/user/myuser -H "Content-Type: application/json" -d '{
  "roles": ["data_scientist", "my_role"]
}'
# List users
curl -u elastic:password http://localhost:9200/_security/user?pretty
# Enable HTTPS and transport TLS in /etc/elasticsearch/elasticsearch.yml
# (certificate paths are examples; generate certificates with elasticsearch-certutil)
xpack.security.enabled: true
xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: certificate
xpack.security.transport.ssl.key: /etc/elasticsearch/certs/node-1-key.pem
xpack.security.transport.ssl.certificate: /etc/elasticsearch/certs/node-1.pem
xpack.security.http.ssl.enabled: true
xpack.security.http.ssl.key: /etc/elasticsearch/certs/node-1-key.pem
xpack.security.http.ssl.certificate: /etc/elasticsearch/certs/node-1.pem
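The role created above can be modeled as data to see how pattern-scoped privileges resolve. A toy sketch using shell-style glob matching, not the real Elasticsearch authorization engine:

```python
import fnmatch

# Toy model of the my_role definition above: cluster privileges plus index
# privileges scoped to name patterns (illustrative only).
my_role = {
    "cluster": ["monitor"],
    "indices": [{"names": ["my-index-*"], "privileges": ["read", "write"]}],
}

def role_allows(role: dict, index: str, privilege: str) -> bool:
    """True when some index-privilege entry grants the privilege on a matching pattern."""
    return any(
        privilege in entry["privileges"]
        and any(fnmatch.fnmatch(index, pattern) for pattern in entry["names"])
        for entry in role["indices"]
    )

print(role_allows(my_role, "my-index-000001", "write"))  # True
print(role_allows(my_role, "other-index", "read"))       # False
```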
Monitoring and Metrics
Monitor cluster health and performance:
# Cluster health overview
curl -u elastic:password http://localhost:9200/_cluster/health?pretty
# Detailed cluster status
curl -u elastic:password http://localhost:9200/_cluster/state?pretty
# Node statistics
curl -u elastic:password http://localhost:9200/_nodes/stats?pretty
# Index statistics
curl -u elastic:password http://localhost:9200/_cat/indices?v
# Shard distribution
curl -u elastic:password http://localhost:9200/_cat/shards?v
# Task management
curl -u elastic:password http://localhost:9200/_tasks?pretty
# Monitor hot threads (performance bottlenecks)
curl -u elastic:password http://localhost:9200/_nodes/hot_threads?pretty
# Thread pool statistics
curl -u elastic:password http://localhost:9200/_nodes/stats/thread_pool?pretty
# Memory usage per node
curl -u elastic:password http://localhost:9200/_nodes/stats/jvm?pretty
Set up monitoring with X-Pack:
# Check license
curl -u elastic:password http://localhost:9200/_license?pretty
# Enable legacy monitoring collection (Metricbeat or Elastic Agent is recommended for production)
curl -u elastic:password -X PUT http://localhost:9200/_cluster/settings -H "Content-Type: application/json" -d '{
"persistent": {
"xpack.monitoring.collection.enabled": true
}
}'
# View monitoring data
curl -u elastic:password http://localhost:9200/.monitoring-es-*/_search?pretty
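The JVM stats returned above include heap usage per node; a small parser turns them into an at-a-glance percentage. The sample below mirrors the relevant fields of _nodes/stats/jvm, trimmed to what the calculation needs:

```python
import json

# Trimmed shape of GET _nodes/stats/jvm ("abc" is a placeholder node ID).
sample = json.loads("""
{"nodes": {"abc": {"name": "node-1",
  "jvm": {"mem": {"heap_used_in_bytes": 2147483648, "heap_max_in_bytes": 4294967296}}}}}
""")

for node in sample["nodes"].values():
    mem = node["jvm"]["mem"]
    pct = 100 * mem["heap_used_in_bytes"] / mem["heap_max_in_bytes"]
    print(f'{node["name"]}: heap {pct:.0f}% used')  # node-1: heap 50% used
```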
Backup and Snapshots
Create and manage snapshots:
# Register snapshot repository (the location must be listed under path.repo in elasticsearch.yml on every node)
curl -u elastic:password -X PUT http://localhost:9200/_snapshot/my-backup -H "Content-Type: application/json" -d '{
"type": "fs",
"settings": {
"location": "/var/elasticsearch/backups/my-backup"
}
}'
# Or use S3 repository
curl -u elastic:password -X PUT http://localhost:9200/_snapshot/s3-backup -H "Content-Type: application/json" -d '{
"type": "s3",
"settings": {
"bucket": "my-backup-bucket",
"region": "us-east-1",
"base_path": "elasticsearch"
}
}'
# Create snapshot
curl -u elastic:password -X PUT http://localhost:9200/_snapshot/my-backup/snapshot-001?wait_for_completion=false -H "Content-Type: application/json" -d '{
"indices": "my-index-*",
"ignore_unavailable": true,
"include_global_state": true
}'
# View snapshot status
curl -u elastic:password http://localhost:9200/_snapshot/my-backup/snapshot-001?pretty
# List snapshots
curl -u elastic:password http://localhost:9200/_snapshot/my-backup/_all?pretty
# Restore from snapshot
curl -u elastic:password -X POST http://localhost:9200/_snapshot/my-backup/snapshot-001/_restore -H "Content-Type: application/json" -d '{
"indices": "my-index-*",
"include_global_state": false,
"rename_pattern": "(.+)",
"rename_replacement": "$1-restored"
}'
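The rename_pattern/rename_replacement pair is applied as a regex substitution over each restored index name. Python's re.sub behaves the same way for this pattern (Elasticsearch uses Java regex syntax; note the API body writes the backreference as $1):

```python
import re

rename_pattern = r"(.+)"
rename_replacement = r"\1-restored"  # written as "$1-restored" in the API body

for name in ("my-index-000001", "my-index-000002"):
    print(re.sub(rename_pattern, rename_replacement, name))
    # my-index-000001-restored, my-index-000002-restored
```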
Performance Tuning
Optimize Elasticsearch for your workload:
# Adjust refresh interval for better indexing performance
curl -u elastic:password -X PUT http://localhost:9200/my-index/_settings -H "Content-Type: application/json" -d '{
"index": {
"refresh_interval": "30s"
}
}'
# Raise the result window (note: index.queries.cache.enabled is a static
# setting and can only be set at index creation, not updated live)
curl -u elastic:password -X PUT http://localhost:9200/my-index/_settings -H "Content-Type: application/json" -d '{
  "index": {
    "max_result_window": 100000
  }
}'
# Drop replicas during bulk loads, then restore them afterwards
# (_source.enabled is a mapping option fixed at index creation and cannot be toggled here)
curl -u elastic:password -X PUT http://localhost:9200/my-index/_settings -H "Content-Type: application/json" -d '{
  "index": {
    "number_of_replicas": 0
  }
}'
# Configure merge settings
curl -u elastic:password -X PUT http://localhost:9200/my-index/_settings -H "Content-Type: application/json" -d '{
"index.merge.scheduler.max_thread_count": 2,
"index.merge.scheduler.auto_throttle": true
}'
# Monitor performance
curl -u elastic:password http://localhost:9200/_nodes/stats/indices?pretty
curl -u elastic:password http://localhost:9200/_nodes/stats/fs?pretty
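A common rule of thumb behind these settings is to keep each primary shard near a target size. A sketch using the same 50 GB figure as the rollover example earlier (a heuristic, not an official formula):

```python
import math

def primary_shard_count(total_index_gb: float, target_shard_gb: float = 50) -> int:
    """Primary shard count that keeps each shard near the target size."""
    return max(1, math.ceil(total_index_gb / target_shard_gb))

for size in (10, 120, 500):
    print(f"{size} GB index -> {primary_shard_count(size)} primary shards")
```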
Conclusion
Elasticsearch provides a powerful, scalable search and analytics platform suitable for applications requiring real-time indexing and complex queries across massive datasets. Its built-in clustering, replication, and automatic failover ensure high availability while index lifecycle management enables cost-effective data retention at scale. By properly configuring node discovery, shard allocation, security, and monitoring, you can operate production Elasticsearch clusters that deliver consistent performance and reliability for search-driven applications.