Elasticsearch and Logstash Installation

Introduction

The ELK Stack (Elasticsearch, Logstash, Kibana) has become the industry standard for centralized logging, log analysis, and data visualization. Elasticsearch provides powerful full-text search and analytics capabilities, while Logstash acts as a data processing pipeline that ingests, transforms, and forwards logs to Elasticsearch for storage and analysis.

Centralized logging is crucial for modern infrastructure management, enabling you to aggregate logs from multiple servers, applications, and services into a single searchable repository. This centralization simplifies troubleshooting, security analysis, compliance reporting, and operational insights across distributed systems.

This comprehensive guide walks you through installing and configuring both Elasticsearch and Logstash on Linux servers. You'll learn how to set up a production-ready logging infrastructure, configure data ingestion pipelines, optimize performance, implement security, and integrate with various log sources. Whether you're building a new logging system or migrating from legacy solutions, this guide provides the foundation for effective log management.

Prerequisites

Before installing Elasticsearch and Logstash, ensure you have:

  • A Linux server (Ubuntu 20.04/22.04, Debian 10/11, CentOS 7/8, Rocky Linux 8/9)
  • Root or sudo access for installation and configuration
  • Java 11 or Java 17 (OpenJDK or Oracle JDK; recent Elastic packages also ship a bundled JDK)
  • Minimum 4 GB RAM (8 GB recommended for production)
  • Minimum 10 GB disk space (scales with log volume)
  • Basic understanding of JSON and log formats

Recommended System Requirements:

  • Development: 4 GB RAM, 2 CPU cores, 20 GB disk
  • Production: 16 GB RAM, 4-8 CPU cores, 100+ GB SSD storage
  • High Volume: 32+ GB RAM, 8+ CPU cores, 500+ GB SSD storage

Installing Java

Both Elasticsearch and Logstash run on the JVM. Recent releases bundle their own JDK, so a separate Java install is optional for the package-based installations below, but having OpenJDK 11 or 17 on the system is still useful for supporting tools and for older releases.

On Ubuntu/Debian

# Update package repository
sudo apt update

# Install OpenJDK 11
sudo apt install openjdk-11-jdk -y

# Or install OpenJDK 17
sudo apt install openjdk-17-jdk -y

# Verify installation
java -version

# Set JAVA_HOME
echo 'export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64' >> ~/.bashrc
source ~/.bashrc

On CentOS/Rocky Linux

# Install OpenJDK 11
sudo yum install java-11-openjdk java-11-openjdk-devel -y

# Or install OpenJDK 17
sudo dnf install java-17-openjdk java-17-openjdk-devel -y

# Verify installation
java -version

# Set JAVA_HOME
echo 'export JAVA_HOME=/usr/lib/jvm/java-11-openjdk' >> ~/.bashrc
source ~/.bashrc

Installing Elasticsearch

Method 1: Official Repository (Recommended)

On Ubuntu/Debian:

# Import Elasticsearch GPG key
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo gpg --dearmor -o /usr/share/keyrings/elasticsearch-keyring.gpg

# Add Elasticsearch repository
echo "deb [signed-by=/usr/share/keyrings/elasticsearch-keyring.gpg] https://artifacts.elastic.co/packages/8.x/apt stable main" | sudo tee /etc/apt/sources.list.d/elastic-8.x.list

# Update package index
sudo apt update

# Install Elasticsearch
sudo apt install elasticsearch -y

On CentOS/Rocky Linux:

# Import Elasticsearch GPG key
sudo rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch

# Create repository file
sudo tee /etc/yum.repos.d/elasticsearch.repo <<EOF
[elasticsearch]
name=Elasticsearch repository for 8.x packages
baseurl=https://artifacts.elastic.co/packages/8.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md
EOF

# Install Elasticsearch
sudo yum install elasticsearch -y

Method 2: Direct Download

# Download Elasticsearch (check for latest version)
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-8.11.0-linux-x86_64.tar.gz

# Extract archive
tar -xzf elasticsearch-8.11.0-linux-x86_64.tar.gz

# Move to /opt
sudo mv elasticsearch-8.11.0 /opt/elasticsearch

# Create elasticsearch user
sudo useradd -r -s /bin/false elasticsearch

# Set ownership
sudo chown -R elasticsearch:elasticsearch /opt/elasticsearch
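
The tarball does not register a systemd service, and its configuration lives under /opt/elasticsearch/config rather than /etc/elasticsearch. A minimal unit sketch, assuming the /opt/elasticsearch path used above (adjust paths and limits for your environment):

# Create a service unit for the tarball install
sudo tee /etc/systemd/system/elasticsearch.service <<EOF
[Unit]
Description=Elasticsearch (tarball install)
After=network.target

[Service]
User=elasticsearch
Group=elasticsearch
ExecStart=/opt/elasticsearch/bin/elasticsearch
LimitNOFILE=65535
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable --now elasticsearch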

Initial Elasticsearch Configuration

Configure JVM heap size:

# Edit JVM options
sudo nano /etc/elasticsearch/jvm.options

# Set heap size (typically 50% of available RAM, max 32GB)
-Xms4g
-Xmx4g
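
On package installs, Elastic recommends leaving the bundled jvm.options file untouched and placing overrides in a drop-in file instead; a sketch (the file name is arbitrary as long as it ends in .options):

sudo tee /etc/elasticsearch/jvm.options.d/heap.options <<EOF
-Xms4g
-Xmx4g
EOF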

Basic Elasticsearch configuration:

# Edit main configuration
sudo nano /etc/elasticsearch/elasticsearch.yml
# Cluster name
cluster.name: my-logging-cluster

# Node name
node.name: node-1

# Network settings
network.host: 0.0.0.0
http.port: 9200

# Discovery settings (single node for development)
discovery.type: single-node

# Data and logs paths
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch

# Disable security for initial setup (enable in production)
xpack.security.enabled: false
xpack.security.enrollment.enabled: false

Start Elasticsearch:

# Enable and start Elasticsearch
sudo systemctl daemon-reload
sudo systemctl enable elasticsearch
sudo systemctl start elasticsearch

# Check status
sudo systemctl status elasticsearch

# View logs
sudo journalctl -u elasticsearch -f

Verify Elasticsearch is running:

# Test connection (wait 30-60 seconds for startup)
curl -X GET "localhost:9200/"

# Expected output:
# {
#   "name" : "node-1",
#   "cluster_name" : "my-logging-cluster",
#   "version" : { ... },
#   ...
# }

# Check cluster health
curl -X GET "localhost:9200/_cluster/health?pretty"

Installing Logstash

Method 1: Official Repository (Recommended)

The Elastic package repository added during the Elasticsearch installation also provides Logstash, so no additional repository setup is needed.

On Ubuntu/Debian:

# Repository already added in Elasticsearch installation
sudo apt update
sudo apt install logstash -y

On CentOS/Rocky Linux:

# Repository already added in Elasticsearch installation
sudo yum install logstash -y

Method 2: Direct Download

# Download Logstash
wget https://artifacts.elastic.co/downloads/logstash/logstash-8.11.0-linux-x86_64.tar.gz

# Extract archive
tar -xzf logstash-8.11.0-linux-x86_64.tar.gz

# Move to /opt
sudo mv logstash-8.11.0 /opt/logstash

# Create logstash user
sudo useradd -r -s /bin/false logstash

# Set ownership
sudo chown -R logstash:logstash /opt/logstash
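
Before wiring up pipelines, you can run a quick foreground smoke test from the tarball layout; the -e flag passes an inline configuration:

# Reads lines from stdin and prints structured events to stdout; Ctrl+C to stop
sudo -u logstash /opt/logstash/bin/logstash -e 'input { stdin {} } output { stdout { codec => rubydebug } }'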

Logstash Configuration

Configure JVM heap:

# Edit JVM options
sudo nano /etc/logstash/jvm.options

# Set heap size (typically 25% of available RAM)
-Xms2g
-Xmx2g

Main Logstash configuration:

# Edit logstash.yml
sudo nano /etc/logstash/logstash.yml
# Node name
node.name: logstash-1

# Data path
path.data: /var/lib/logstash

# Pipeline configuration path
path.config: /etc/logstash/conf.d

# Log path
path.logs: /var/log/logstash

# Pipeline settings
pipeline.workers: 2
pipeline.batch.size: 125
pipeline.batch.delay: 50

# Legacy internal monitoring collection (leave disabled unless needed)
xpack.monitoring.enabled: false

Creating Logstash Pipeline

Create pipeline configuration directory:

sudo mkdir -p /etc/logstash/conf.d

Basic pipeline configuration (syslog example):

sudo nano /etc/logstash/conf.d/syslog-pipeline.conf
# Input section
input {
  # Syslog input
  tcp {
    port => 5000
    type => "syslog"
  }

  udp {
    port => 5000
    type => "syslog"
  }

  # File input
  file {
    path => "/var/log/syslog"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    type => "syslog-file"
  }
}

# Filter section
filter {
  if [type] == "syslog" {
    grok {
      match => { "message" => "%{SYSLOGLINE}" }
    }

    date {
      match => [ "timestamp", "MMM  d HH:mm:ss", "MMM dd HH:mm:ss" ]
    }
  }
}

# Output section
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "syslog-%{+YYYY.MM.dd}"
  }

  # Also output to stdout for debugging
  stdout {
    codec => rubydebug
  }
}

Start Logstash:

# Enable and start Logstash
sudo systemctl enable logstash
sudo systemctl start logstash

# Check status
sudo systemctl status logstash

# View logs
sudo journalctl -u logstash -f

Test Logstash pipeline:

# Send a test log line via TCP (the pipeline above does not strip a syslog PRI header, so send a plain line)
echo "Jan 11 10:00:00 testhost test: This is a test message" | nc localhost 5000

# Check if data reached Elasticsearch
curl -X GET "localhost:9200/syslog-*/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "match_all": {}
  }
}
'

Advanced Logstash Pipelines

Apache/Nginx Access Log Pipeline

sudo nano /etc/logstash/conf.d/apache-access.conf
input {
  file {
    path => "/var/log/apache2/access.log"
    start_position => "beginning"
    sincedb_path => "/var/lib/logstash/sincedb/apache-access"
    type => "apache-access"
  }

  file {
    path => "/var/log/nginx/access.log"
    start_position => "beginning"
    sincedb_path => "/var/lib/logstash/sincedb/nginx-access"
    type => "nginx-access"
  }
}

filter {
  if [type] == "apache-access" {
    grok {
      match => { "message" => '%{IPORHOST:clientip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] "(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})" %{NUMBER:response} (?:%{NUMBER:bytes}|-)' }
    }

    date {
      match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
    }

    geoip {
      source => "clientip"
    }

    useragent {
      source => "agent"
      target => "user_agent"
    }
  }

  if [type] == "nginx-access" {
    grok {
      match => { "message" => '%{IPORHOST:clientip} - %{USER:ident} \[%{HTTPDATE:timestamp}\] "(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})" %{NUMBER:response} (?:%{NUMBER:bytes}|-) "(?:%{DATA:referrer}|-)" "(?:%{DATA:agent}|-)"' }
    }

    date {
      match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
    }

    geoip {
      source => "clientip"
    }
  }
}

output {
  if [type] == "apache-access" {
    elasticsearch {
      hosts => ["localhost:9200"]
      index => "apache-access-%{+YYYY.MM.dd}"
    }
  }

  if [type] == "nginx-access" {
    elasticsearch {
      hosts => ["localhost:9200"]
      index => "nginx-access-%{+YYYY.MM.dd}"
    }
  }
}
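
The sincedb_path values above point at files under /var/lib/logstash/sincedb, which does not exist by default, and the logstash user also needs read access to the web server logs (on Debian/Ubuntu these are typically readable by the adm group):

# Create the sincedb directory used by the file inputs
sudo mkdir -p /var/lib/logstash/sincedb
sudo chown -R logstash:logstash /var/lib/logstash/sincedb

# Allow the logstash user to read Apache/Nginx logs (Debian/Ubuntu)
sudo usermod -aG adm logstash
sudo systemctl restart logstash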

Application Log Pipeline (JSON)

sudo nano /etc/logstash/conf.d/app-json.conf
input {
  file {
    path => "/var/log/myapp/app.json"
    codec => "json"
    type => "app-json"
  }
}

filter {
  # Parse JSON fields
  json {
    source => "message"
  }

  # Add custom fields
  mutate {
    add_field => {
      "application" => "myapp"
      "environment" => "production"
    }
  }

  # Convert log level to lowercase
  mutate {
    lowercase => ["level"]
  }

  # Parse timestamp if present
  date {
    match => ["timestamp", "ISO8601"]
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "app-%{application}-%{+YYYY.MM.dd}"
  }
}
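
To test the pipeline, append a sample line in the shape this configuration expects to the monitored file (path and field names assumed from the input and filters above):

echo '{"timestamp":"2024-01-11T10:00:00Z","level":"INFO","message":"user login succeeded"}' | sudo tee -a /var/log/myapp/app.json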

Multi-Pipeline Configuration

sudo nano /etc/logstash/pipelines.yml
- pipeline.id: syslog-pipeline
  path.config: "/etc/logstash/conf.d/syslog-pipeline.conf"
  pipeline.workers: 2

- pipeline.id: apache-pipeline
  path.config: "/etc/logstash/conf.d/apache-access.conf"
  pipeline.workers: 1

- pipeline.id: app-pipeline
  path.config: "/etc/logstash/conf.d/app-json.conf"
  pipeline.workers: 1

# Restart Logstash to apply
sudo systemctl restart logstash

Performance Tuning

Elasticsearch Performance

Optimize JVM settings:

sudo nano /etc/elasticsearch/jvm.options
# Heap size (50% of RAM, max 32GB)
-Xms16g
-Xmx16g

# GC settings
-XX:+UseG1GC
-XX:G1ReservePercent=25
-XX:InitiatingHeapOccupancyPercent=30

Optimize Elasticsearch configuration:

sudo nano /etc/elasticsearch/elasticsearch.yml
# Thread pools
thread_pool.write.queue_size: 1000
thread_pool.search.queue_size: 1000

# Indexing settings
indices.memory.index_buffer_size: 30%

# Query cache
indices.queries.cache.size: 15%

# Fielddata cache
indices.fielddata.cache.size: 40%

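The refresh interval is a per-index setting rather than an elasticsearch.yml option. To reduce refresh overhead during heavy indexing, raise index.refresh_interval per index or in an index template; for example (index name illustrative):

curl -X PUT "localhost:9200/syslog-2024.01.11/_settings" -H 'Content-Type: application/json' -d'
{
  "index": { "refresh_interval": "30s" }
}'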

Logstash Performance

Optimize pipeline workers:

sudo nano /etc/logstash/logstash.yml
# Increase workers (one per CPU core)
pipeline.workers: 4

# Increase batch size
pipeline.batch.size: 250

# Adjust batch delay
pipeline.batch.delay: 50

# (outputs share the pipeline workers configured above; a separate output
# worker setting is not supported in current Logstash releases)

Pipeline-specific tuning:

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "logs-%{+YYYY.MM.dd}"
    # Compress bulk requests to cut network overhead
    http_compression => true
    # Bulk sizing follows pipeline.batch.size; the old flush_size and
    # idle_flush_time options were removed from the elasticsearch output
  }
}

Security Configuration

Enable Elasticsearch Security

# Edit Elasticsearch configuration
sudo nano /etc/elasticsearch/elasticsearch.yml
# Enable X-Pack security
xpack.security.enabled: true
xpack.security.enrollment.enabled: true

# TLS for HTTP
xpack.security.http.ssl:
  enabled: true
  keystore.path: certs/http.p12

# TLS for transport
xpack.security.transport.ssl:
  enabled: true
  verification_mode: certificate
  keystore.path: certs/transport.p12
  truststore.path: certs/transport.p12
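
On Elasticsearch 8.x package installs, the certificates referenced above (certs/http.p12, certs/transport.p12 and the http_ca.crt used later) are normally generated automatically under /etc/elasticsearch/certs during installation. If they are missing (for example on a tarball install), the bundled certutil tool can create them; a sketch:

# Interactive helper that builds an http.p12 keystore (and CA) for the HTTP layer
sudo /usr/share/elasticsearch/bin/elasticsearch-certutil http

# Generate a CA and a certificate for the transport layer
sudo /usr/share/elasticsearch/bin/elasticsearch-certutil ca
sudo /usr/share/elasticsearch/bin/elasticsearch-certutil cert --ca elastic-stack-ca.p12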

Set up passwords:

# Elasticsearch 8.x: elasticsearch-setup-passwords is deprecated; reset the
# built-in users with elasticsearch-reset-password instead
sudo /usr/share/elasticsearch/bin/elasticsearch-reset-password -u elastic
sudo /usr/share/elasticsearch/bin/elasticsearch-reset-password -u kibana_system

# Elasticsearch 7.x: auto-generate passwords for all built-in users
sudo /usr/share/elasticsearch/bin/elasticsearch-setup-passwords auto

# Or set them interactively (7.x)
sudo /usr/share/elasticsearch/bin/elasticsearch-setup-passwords interactive

Save the generated passwords:

  • elastic (superuser)
  • kibana_system
  • logstash_system
  • beats_system

Configure Logstash with Security

sudo nano /etc/logstash/conf.d/secure-output.conf
output {
  elasticsearch {
    hosts => ["https://localhost:9200"]
    user => "logstash_system"
    password => "YOUR_PASSWORD_HERE"
    cacert => "/etc/elasticsearch/certs/http_ca.crt"
    index => "logs-%{+YYYY.MM.dd}"
  }
}
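
The logstash_writer user referenced above does not exist by default. A sketch of creating a matching role and user with the security API (index pattern, privileges, and password are placeholders to adapt):

# Role allowed to create and write logs-* indices
curl --cacert /etc/elasticsearch/certs/http_ca.crt -u elastic \
  -X POST "https://localhost:9200/_security/role/logstash_writer" \
  -H 'Content-Type: application/json' -d'
{
  "cluster": ["monitor", "manage_index_templates"],
  "indices": [
    { "names": ["logs-*"], "privileges": ["create_index", "create", "write"] }
  ]
}'

# User that Logstash authenticates as
curl --cacert /etc/elasticsearch/certs/http_ca.crt -u elastic \
  -X POST "https://localhost:9200/_security/user/logstash_writer" \
  -H 'Content-Type: application/json' -d'
{
  "password": "YOUR_PASSWORD_HERE",
  "roles": ["logstash_writer"]
}'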

Firewall Configuration

# UFW (Ubuntu/Debian)
sudo ufw allow 9200/tcp  # Elasticsearch HTTP
sudo ufw allow 5000/tcp  # Logstash input
sudo ufw allow 5000/udp  # Logstash syslog UDP

# firewalld (CentOS/Rocky)
sudo firewall-cmd --permanent --add-port=9200/tcp
sudo firewall-cmd --permanent --add-port=5000/tcp
sudo firewall-cmd --permanent --add-port=5000/udp
sudo firewall-cmd --reload

Monitoring and Maintenance

Index Management

View indices:

# List all indices
curl -X GET "localhost:9200/_cat/indices?v"

# Check index size
curl -X GET "localhost:9200/_cat/indices?v&s=store.size:desc"

# Get specific index info
curl -X GET "localhost:9200/syslog-2024.01.11/_stats?pretty"

Delete old indices:

# Delete indices by wildcard (example: all syslog indices from 2023)
curl -X DELETE "localhost:9200/syslog-2023.*"

# Or use curator (recommended for automation)
sudo pip3 install elasticsearch-curator

Create curator config:

sudo nano /etc/curator/curator.yml
client:
  hosts:
    - localhost
  port: 9200
  timeout: 30

logging:
  loglevel: INFO
  logfile: /var/log/curator.log

Curator action file:

sudo nano /etc/curator/delete-old-logs.yml
actions:
  1:
    action: delete_indices
    description: Delete indices older than 30 days
    options:
      ignore_empty_list: True
    filters:
    - filtertype: pattern
      kind: prefix
      value: syslog-
    - filtertype: age
      source: name
      direction: older
      timestring: '%Y.%m.%d'
      unit: days
      unit_count: 30

Run curator:

curator --config /etc/curator/curator.yml /etc/curator/delete-old-logs.yml
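
Curator still works well, but recent Elasticsearch releases include index lifecycle management (ILM) built in, which can handle retention without an external tool. A minimal delete-after-30-days policy sketch (policy name is illustrative; attach it to new indices via an index template that sets index.lifecycle.name):

curl -X PUT "localhost:9200/_ilm/policy/syslog-retention" -H 'Content-Type: application/json' -d'
{
  "policy": {
    "phases": {
      "delete": {
        "min_age": "30d",
        "actions": { "delete": {} }
      }
    }
  }
}'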

Monitor Logstash

Check Logstash statistics:

# Node stats
curl -X GET "localhost:9600/_node/stats?pretty"

# Pipeline stats
curl -X GET "localhost:9600/_node/stats/pipelines?pretty"

# Hot threads
curl -X GET "localhost:9600/_node/hot_threads?pretty"

Troubleshooting

Common Elasticsearch Issues

Check Elasticsearch logs:

sudo journalctl -u elasticsearch -f

# Or view log files
sudo tail -f /var/log/elasticsearch/my-logging-cluster.log

Check cluster health:

curl -X GET "localhost:9200/_cluster/health?pretty"
curl -X GET "localhost:9200/_cluster/stats?pretty"

Clear cache if needed:

curl -X POST "localhost:9200/_cache/clear?pretty"

Common Logstash Issues

Test pipeline configuration:

sudo /usr/share/logstash/bin/logstash -t -f /etc/logstash/conf.d/

Run Logstash in debug mode:

sudo /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/ --log.level=debug

Check for parsing errors:

sudo journalctl -u logstash | grep -i "error\|exception"

Conclusion

Elasticsearch and Logstash form a powerful combination for centralized logging, providing scalable search, analytics, and data processing capabilities. By following this guide, you've installed and configured a production-ready logging infrastructure capable of handling logs from multiple sources.

Key takeaways:

  1. Scalable architecture - Elasticsearch provides distributed storage and search
  2. Flexible data processing - Logstash transforms and enriches log data
  3. Multiple input sources - Ingest logs from files, syslog, applications, and more
  4. Performance tuning - Optimize JVM, pipeline workers, and batch sizes
  5. Security - Enable authentication, encryption, and access controls

Best practices for production:

  • Allocate appropriate resources based on log volume
  • Implement index lifecycle management
  • Enable security features and authentication
  • Monitor cluster health and performance
  • Regular backups of Elasticsearch data
  • Use dedicated nodes for different roles in larger deployments
  • Implement proper log retention policies
  • Document your pipeline configurations

The ELK stack continues to evolve with new features and improvements. Consider integrating Kibana for visualization and Beats for lightweight log shipping to complete your logging infrastructure and maximize operational insights.