Elasticsearch Cluster Configuration

Elasticsearch is a distributed, scalable search and analytics engine built on top of Apache Lucene. It provides real-time indexing and searching of structured and unstructured data at scale, with built-in clustering, replication, and automatic failover. This guide covers multi-node cluster setup, discovery mechanisms, shard allocation strategies, index lifecycle management, security, and monitoring for production Elasticsearch deployments.

Architecture and Concepts

Elasticsearch uses a distributed architecture in which data is split into shards and distributed across nodes. Each shard is a complete Lucene index containing a subset of the documents. Replicas provide redundancy and additional search capacity. The cluster monitors itself through a consensus algorithm, electing a master node to manage cluster state and shard allocation.
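Document-to-shard routing can be sketched in a few lines. Elasticsearch computes the target shard as a hash of the routing value (the document `_id` by default) modulo the number of primary shards; it uses Murmur3 internally, so the stdlib hash below is only a stand-in to illustrate the mechanism, and the shard numbers will not match a real cluster.

```python
import hashlib

def route_to_shard(routing_value: str, number_of_primary_shards: int) -> int:
    """Pick the primary shard for a routing value (the _id by default).

    Elasticsearch hashes with Murmur3; MD5 stands in here for illustration.
    """
    digest = hashlib.md5(routing_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % number_of_primary_shards

# The same routing value always lands on the same shard, which is why
# number_of_shards cannot be changed after index creation without reindexing.
placements = [route_to_shard(f"doc-{i}", 3) for i in range(6)]
print(placements)
```

Because the modulus is fixed at index creation, growing an index's primary shard count requires reindexing or the split API rather than a settings change.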

Indices contain documents (JSON objects); mapping types, which once partitioned an index logically, were removed in recent Elasticsearch versions. Each document is identified by a unique ID and contains fields with various data types. Inverted indexes enable fast full-text search by mapping terms to the documents that contain them.
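The inverted-index idea can be shown in miniature. This is a simplified sketch (no analyzers, stemming, or scoring, all of which Lucene adds): each term maps to the set of document IDs containing it, so a term lookup is a dictionary access rather than a scan over documents.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

idx = build_inverted_index({
    "1": "the quick brown fox",
    "2": "the lazy brown dog",
})
print(sorted(idx["brown"]))  # ['1', '2'] -- both documents contain "brown"
print(sorted(idx["fox"]))    # ['1']
```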

Installation

Install Elasticsearch on Linux systems using the official Elasticsearch repository:

# Add the Elasticsearch repository (Ubuntu/Debian)
curl -fsSL https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
echo "deb https://artifacts.elastic.co/packages/8.x/apt stable main" | \
  sudo tee /etc/apt/sources.list.d/elastic-8.x.list

# Update package lists and install
sudo apt-get update
sudo apt-get install -y elasticsearch

# Or install specific version
sudo apt-get install -y elasticsearch=8.10.0

# Verify the installation (the binary is not on the default PATH)
/usr/share/elasticsearch/bin/elasticsearch --version

On CentOS/RHEL:

# Add the repository
sudo dnf config-manager --add-repo https://artifacts.elastic.co/packages/8.x/yum

# Import GPG key
sudo rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch

# Install
sudo dnf install -y elasticsearch

# Verify
/usr/share/elasticsearch/bin/elasticsearch --version

Create the system user and directories:

# Elasticsearch creates 'elasticsearch' user during installation
sudo useradd -r -s /bin/false elasticsearch 2>/dev/null || true

# Create data and log directories
sudo mkdir -p /var/lib/elasticsearch /var/log/elasticsearch
sudo chown -R elasticsearch:elasticsearch /var/lib/elasticsearch /var/log/elasticsearch
sudo chmod 755 /var/lib/elasticsearch /var/log/elasticsearch

Enable and start the service:

# Enable automatic startup
sudo systemctl daemon-reload
sudo systemctl enable elasticsearch

# Start the service
sudo systemctl start elasticsearch

# Monitor startup
sudo journalctl -u elasticsearch -f

# Check status
sudo systemctl status elasticsearch

# Verify the cluster
curl -u elastic:password http://localhost:9200/_cluster/health

Cluster Configuration

Configure Elasticsearch for multi-node clustering. Edit the configuration file:

sudo nano /etc/elasticsearch/elasticsearch.yml

For node1 (192.168.1.10):

# Cluster name (must match across all nodes)
cluster.name: my-elasticsearch-cluster

# Node name (unique per node)
node.name: node-1

# Data directories
path.data: /var/lib/elasticsearch

# Log directory
path.logs: /var/log/elasticsearch

# Network configuration
network.host: 192.168.1.10
http.port: 9200
transport.port: 9300

# Advertise addresses for cluster communication
discovery.seed_hosts: ["192.168.1.10", "192.168.1.11", "192.168.1.12"]
cluster.initial_master_nodes: ["node-1", "node-2", "node-3"]

# Node roles (master, data, ingest)
node.roles: [master, data, ingest]

# Memory settings
bootstrap.memory_lock: true

# Heap size (typically 50% of available RAM, max ~31GB) is not set in this
# file; configure -Xms/-Xmx via /etc/elasticsearch/jvm.options.d/heap.options

For node2 (192.168.1.11):

cluster.name: my-elasticsearch-cluster
node.name: node-2
network.host: 192.168.1.11
discovery.seed_hosts: ["192.168.1.10", "192.168.1.11", "192.168.1.12"]
cluster.initial_master_nodes: ["node-1", "node-2", "node-3"]
node.roles: [master, data, ingest]
bootstrap.memory_lock: true

And node3 (192.168.1.12):

cluster.name: my-elasticsearch-cluster
node.name: node-3
network.host: 192.168.1.12
discovery.seed_hosts: ["192.168.1.10", "192.168.1.11", "192.168.1.12"]
cluster.initial_master_nodes: ["node-1", "node-2", "node-3"]
node.roles: [master, data, ingest]
bootstrap.memory_lock: true

Configure memory locking to prevent swapping:

# Edit limits configuration
sudo nano /etc/security/limits.conf

Add:

elasticsearch soft memlock unlimited
elasticsearch hard memlock unlimited
elasticsearch soft nofile 65536
elasticsearch hard nofile 65536
elasticsearch soft nproc 4096
elasticsearch hard nproc 4096

Configure JVM heap settings:

sudo nano /etc/elasticsearch/jvm.options.d/heap.options

Add:

-Xms4g
-Xmx4g
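The heap sizing rule of thumb (half of system RAM, capped below ~32 GB so the JVM keeps using compressed object pointers) can be sketched as a small helper. This is an illustration of the guideline, not an official formula; `recommended_heap_gb` is a name invented here.

```python
def recommended_heap_gb(system_ram_gb: int) -> int:
    """Half of RAM, capped at 31 GB to stay under the compressed-oops limit."""
    COMPRESSED_OOPS_CAP_GB = 31
    return min(system_ram_gb // 2, COMPRESSED_OOPS_CAP_GB)

for ram in (8, 16, 64, 128):
    heap = recommended_heap_gb(ram)
    print(f"{ram:>3} GB RAM -> -Xms{heap}g -Xmx{heap}g")
```

The remaining RAM is not wasted: the operating system uses it for the filesystem cache, which Lucene relies on heavily for search performance.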

Start all nodes and verify cluster formation:

# Start each node
sudo systemctl start elasticsearch

# Monitor cluster status
curl -u elastic:password http://localhost:9200/_cluster/health?pretty

# View cluster nodes
curl -u elastic:password http://localhost:9200/_nodes/stats?pretty | head -50

# Check cluster state
curl -u elastic:password http://localhost:9200/_cluster/state?pretty
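The `status` field in the `_cluster/health` response is the first thing to read. A small helper can translate it; the sample payload below is illustrative, not captured from a real cluster.

```python
def summarize_health(health: dict) -> str:
    """Translate a _cluster/health response into a one-line summary.

    green: every primary and replica shard is allocated.
    yellow: all primaries allocated, but some replicas are not.
    red: at least one primary shard is unassigned (data unavailable).
    """
    status = health["status"]
    if status == "red":
        return "red: a primary shard is unassigned -- some data is unavailable"
    if status == "yellow":
        return f"yellow: {health.get('unassigned_shards', 0)} replica shards unassigned"
    return f"green: all shards allocated across {health['number_of_nodes']} nodes"

# Illustrative payload only.
sample = {"status": "yellow", "number_of_nodes": 3, "unassigned_shards": 2}
print(summarize_health(sample))  # yellow: 2 replica shards unassigned
```

A single-node cluster hosting an index with replicas will sit at yellow forever, since a replica is never allocated on the same node as its primary.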

Node Discovery

Configure node discovery mechanisms for cluster formation:

# Inspect the nodes known to the cluster state
curl -u elastic:password http://localhost:9200/_cluster/state/nodes?pretty

# Check seed hosts
curl -u elastic:password http://localhost:9200/_nodes/settings?pretty | grep seed_hosts

Add nodes to an existing cluster:

# On the new node, configure the same cluster name and seed hosts,
# then start elasticsearch - it will discover the cluster and join automatically

# Verify the new node joined
curl -u elastic:password http://localhost:9200/_nodes?pretty

Handle node removal:

# Gracefully shut down a node
sudo systemctl stop elasticsearch

# The node is removed from the cluster and its shards are redistributed.
# Wait for the cluster to go green before removing the host from the infrastructure.

curl -u elastic:password http://localhost:9200/_cluster/health?wait_for_status=green

Shard Allocation

Configure shard allocation policies:

# View current allocation settings
curl -u elastic:password http://localhost:9200/_cluster/settings?pretty

# Create an index with a specific shard configuration
curl -u elastic:password -X PUT http://localhost:9200/my-index-000001 -H "Content-Type: application/json" -d '{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1,
    "index.routing.allocation.include._name": "nodo-1,nodo-2,nodo-3"
  }
}'

# Restrict an existing index to specific nodes
curl -u elastic:password -X PUT http://localhost:9200/my-index-000001/_settings -H "Content-Type: application/json" -d '{
  "index.routing.allocation.include._name": "node-1,node-2"
}'

# Enable rack awareness (nodes must set node.attr.rack_id in elasticsearch.yml)
curl -u elastic:password -X PUT http://localhost:9200/_cluster/settings -H "Content-Type: application/json" -d '{
  "persistent": {
    "cluster.routing.allocation.awareness.attributes": "rack_id",
    "cluster.routing.allocation.awareness.force.rack_id.values": "rack1,rack2,rack3"
  }
}'

# Monitor shard allocation
curl -u elastic:password http://localhost:9200/_cat/shards?v
curl -u elastic:password http://localhost:9200/_cat/allocation?v
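When planning capacity, remember that replicas multiply the shard count. A trivial sketch of the arithmetic (the function name is invented for illustration):

```python
def total_shards(number_of_shards: int, number_of_replicas: int) -> int:
    """Each primary shard is copied number_of_replicas times."""
    return number_of_shards * (1 + number_of_replicas)

# The index created above: 3 primaries with 1 replica each.
print(total_shards(3, 1))  # 6 shards spread across the cluster

# With 3 nodes, surviving a single node loss requires every primary
# to have at least one replica allocated on a different node.
```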

Index Lifecycle Management

Configure automatic index management:

# Create an ILM policy (the policy name goes in the URL, not the body)
curl -u elastic:password -X PUT http://localhost:9200/_ilm/policy/my-policy -H "Content-Type: application/json" -d '{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "rollover": {
            "max_primary_shard_size": "50gb",
            "max_age": "30d"
          }
        }
      },
      "warm": {
        "min_age": "7d",
        "actions": {
          "set_priority": {
            "priority": 50
          },
          "shrink": {
            "number_of_shards": 1
          }
        }
      },
      "cold": {
        "min_age": "30d",
        "actions": {
          "set_priority": {
            "priority": 0
          },
          "searchable_snapshot": {
            "snapshot_repository": "my-repository"
          }
        }
      },
      "delete": {
        "min_age": "90d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}'

# Apply ILM policy to index template
curl -u elastic:password -X PUT http://localhost:9200/_index_template/my-template -H "Content-Type: application/json" -d '{
  "index_patterns": ["my-logs-*"],
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1,
    "index.lifecycle.name": "my-policy",
    "index.lifecycle.rollover_alias": "my-logs"
  },
  "mappings": {
    "properties": {
      "timestamp": {"type": "date"},
      "message": {"type": "text"}
    }
  }
}'

# View ILM policy
curl -u elastic:password http://localhost:9200/_ilm/policy/my-policy?pretty

# Check index lifecycle status
curl -u elastic:password http://localhost:9200/my-logs-000001/_ilm/explain?pretty
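The phase transitions in the policy above are driven by index age. A sketch of the timeline (simplified: in a real cluster `min_age` is measured from rollover, and transitions also wait for the phase's actions to complete):

```python
def ilm_phase(age_days: float) -> str:
    """Phase an index occupies under the hot/warm/cold/delete policy above."""
    if age_days >= 90:
        return "delete"
    if age_days >= 30:
        return "cold"
    if age_days >= 7:
        return "warm"
    return "hot"

print([ilm_phase(d) for d in (1, 10, 45, 120)])  # ['hot', 'warm', 'cold', 'delete']
```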

Security and Authentication

Enable and configure Elasticsearch security:

# Create a user
curl -u elastic:password -X POST http://localhost:9200/_security/user/myuser -H "Content-Type: application/json" -d '{
  "password": "secure_password",
  "roles": ["data_scientist"]
}'

# Create a custom role
curl -u elastic:password -X POST http://localhost:9200/_security/role/my_role -H "Content-Type: application/json" -d '{
  "cluster": ["monitor"],
  "indices": [
    {
      "names": ["my-index-*"],
      "privileges": ["read", "write"]
    }
  ]
}'

# Assign the role by resubmitting the user definition (there is no
# separate roles endpoint; roles are a field of the user document)
curl -u elastic:password -X PUT http://localhost:9200/_security/user/myuser -H "Content-Type: application/json" -d '{
  "password": "secure_password",
  "roles": ["my_role"]
}'

# List users
curl -u elastic:password http://localhost:9200/_security/user?pretty

# Enable security and TLS in elasticsearch.yml
xpack.security.enabled: true
xpack.security.transport.ssl.enabled: true
xpack.security.http.ssl.enabled: true
xpack.security.transport.ssl.key: /etc/elasticsearch/certs/node-1-key.pem
xpack.security.transport.ssl.certificate: /etc/elasticsearch/certs/node-1.pem
xpack.security.http.ssl.key: /etc/elasticsearch/certs/node-1-key.pem
xpack.security.http.ssl.certificate: /etc/elasticsearch/certs/node-1.pem

Monitoring and Metrics

Monitor cluster health and performance:

# Cluster health overview
curl -u elastic:password http://localhost:9200/_cluster/health?pretty

# Detailed cluster status
curl -u elastic:password http://localhost:9200/_cluster/state?pretty

# Node statistics
curl -u elastic:password http://localhost:9200/_nodes/stats?pretty

# Index statistics
curl -u elastic:password http://localhost:9200/_cat/indices?v

# Shard distribution
curl -u elastic:password http://localhost:9200/_cat/shards?v

# Task management
curl -u elastic:password http://localhost:9200/_tasks?pretty

# Monitor hot threads (performance bottlenecks)
curl -u elastic:password http://localhost:9200/_nodes/hot_threads?pretty

# Thread pool statistics
curl -u elastic:password http://localhost:9200/_nodes/stats/thread_pool?pretty

# Memory usage per node
curl -u elastic:password http://localhost:9200/_nodes/stats/jvm?pretty

Set up monitoring with X-Pack:

# Check license
curl -u elastic:password http://localhost:9200/_license?pretty

# Enable monitoring collection
curl -u elastic:password -X PUT http://localhost:9200/_cluster/settings -H "Content-Type: application/json" -d '{
  "persistent": {
    "xpack.monitoring.collection.enabled": true
  }
}'

# View monitoring data
curl -u elastic:password "http://localhost:9200/.monitoring-es-*/_search?pretty"

Backups and Snapshots

Create and manage snapshots:

# Register a snapshot repository
curl -u elastic:password -X PUT http://localhost:9200/_snapshot/my-backup -H "Content-Type: application/json" -d '{
  "type": "fs",
  "settings": {
    "location": "/var/elasticsearch/backups/my-backup"
  }
}'

# Or use an S3 repository
curl -u elastic:password -X PUT http://localhost:9200/_snapshot/s3-backup -H "Content-Type: application/json" -d '{
  "type": "s3",
  "settings": {
    "bucket": "my-backup-bucket",
    "region": "us-east-1",
    "base_path": "elasticsearch"
  }
}'

# Create a snapshot
curl -u elastic:password -X PUT http://localhost:9200/_snapshot/my-backup/snapshot-001?wait_for_completion=false -H "Content-Type: application/json" -d '{
  "indices": "my-index-*",
  "ignore_unavailable": true,
  "include_global_state": true
}'

# View snapshot status
curl -u elastic:password http://localhost:9200/_snapshot/my-backup/snapshot-001?pretty

# List snapshots
curl -u elastic:password http://localhost:9200/_snapshot/my-backup/_all?pretty

# Restore from a snapshot
curl -u elastic:password -X POST http://localhost:9200/_snapshot/my-backup/snapshot-001/_restore -H "Content-Type: application/json" -d '{
  "indices": "my-index-*",
  "include_global_state": false,
  "rename_pattern": "(.+)",
  "rename_replacement": "$1-restored"
}'
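A common companion to scheduled snapshots is a retention rule that keeps only the newest N. A hedged sketch (the function name and naming scheme are invented here; in practice Elasticsearch's snapshot lifecycle management can handle retention for you), relying on zero-padded `YYYY.MM.DD` stamps so lexicographic order matches chronological order:

```python
def snapshots_to_delete(names, keep=7):
    """Return the snapshots beyond the newest `keep`, oldest first."""
    ordered = sorted(names)  # lexicographic == chronological for YYYY.MM.DD
    return ordered[:-keep] if len(ordered) > keep else []

names = [f"snapshot-2024.01.{d:02d}" for d in range(1, 11)]  # 10 daily snapshots
print(snapshots_to_delete(names, keep=7))
# ['snapshot-2024.01.01', 'snapshot-2024.01.02', 'snapshot-2024.01.03']
```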

Performance Tuning

Optimize Elasticsearch for your workload:

# Adjust refresh interval for better indexing performance
curl -u elastic:password -X PUT http://localhost:9200/my-index/_settings -H "Content-Type: application/json" -d '{
  "index": {
    "refresh_interval": "30s"
  }
}'

# Raise the result window (a dynamic setting)
curl -u elastic:password -X PUT http://localhost:9200/my-index/_settings -H "Content-Type: application/json" -d '{
  "index": {
    "max_result_window": 100000
  }
}'

# index.queries.cache.enabled is a static setting: set it at index creation
curl -u elastic:password -X PUT http://localhost:9200/my-cached-index -H "Content-Type: application/json" -d '{
  "settings": {
    "index.queries.cache.enabled": true
  }
}'

# Disabling _source is a mapping option, available only at index creation
curl -u elastic:password -X PUT http://localhost:9200/my-index-no-source -H "Content-Type: application/json" -d '{
  "mappings": {
    "_source": {"enabled": false}
  }
}'

# Configura merge settings
curl -u elastic:password -X PUT http://localhost:9200/my-index/_settings -H "Content-Type: application/json" -d '{
  "index.merge.scheduler.max_thread_count": 2,
  "index.merge.scheduler.auto_throttle": true
}'

# Monitor performance
curl -u elastic:password http://localhost:9200/_nodes/stats/indices?pretty
curl -u elastic:password http://localhost:9200/_nodes/stats/fs?pretty

Conclusion

Elasticsearch provides a powerful, scalable search and analytics platform suitable for applications requiring real-time indexing and complex queries across massive datasets. Its built-in clustering, replication, and automatic failover ensure high availability, while index lifecycle management enables cost-effective data retention at scale. By properly configuring node discovery, shard allocation, security, and monitoring, you can operate production Elasticsearch clusters that deliver consistent performance and reliability for search-driven applications.