Elasticsearch and Logstash Installation
Introduction
The ELK Stack (Elasticsearch, Logstash, Kibana) has become the industry standard for centralized logging, log analysis, and data visualization. Elasticsearch provides powerful full-text search and analytics capabilities, while Logstash acts as a data processing pipeline that ingests, transforms, and forwards logs to Elasticsearch for storage and analysis.
Centralized logging is crucial for modern infrastructure management, enabling you to aggregate logs from multiple servers, applications, and services into a single searchable repository. This centralization simplifies troubleshooting, security analysis, compliance reporting, and operational insights across distributed systems.
This comprehensive guide walks you through installing and configuring both Elasticsearch and Logstash on Linux servers. You'll learn how to set up a production-ready logging infrastructure, configure data ingestion pipelines, optimize performance, implement security, and integrate with various log sources. Whether you're building a new logging system or migrating from legacy solutions, this guide provides the foundation for effective log management.
Prerequisites
Before installing Elasticsearch and Logstash, ensure you have:
- A Linux server (Ubuntu 20.04/22.04, Debian 10/11, CentOS 7/8, Rocky Linux 8/9)
- Root or sudo access for installation and configuration
- Java 11 or Java 17 (optional; Elasticsearch and Logstash 8.x ship with a bundled JDK)
- Minimum 4 GB RAM (8 GB recommended for production)
- Minimum 10 GB disk space (scales with log volume)
- Basic understanding of JSON and log formats
Recommended System Requirements:
- Development: 4 GB RAM, 2 CPU cores, 20 GB disk
- Production: 16 GB RAM, 4-8 CPU cores, 100+ GB SSD storage
- High Volume: 32+ GB RAM, 8+ CPU cores, 500+ GB SSD storage
Installing Java
Elasticsearch and Logstash 8.x ship with a bundled JDK and use it by default, so installing Java separately is optional. If you prefer to run them on your own JVM (selected via the ES_JAVA_HOME and LS_JAVA_HOME environment variables), install OpenJDK 11 or 17 as shown below.
On Ubuntu/Debian
# Update package repository
sudo apt update
# Install OpenJDK 11
sudo apt install openjdk-11-jdk -y
# Or install OpenJDK 17
sudo apt install openjdk-17-jdk -y
# Verify installation
java -version
# Set JAVA_HOME (adjust the path if you installed OpenJDK 17)
echo 'export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64' >> ~/.bashrc
source ~/.bashrc
On CentOS/Rocky Linux
# Install OpenJDK 11
sudo yum install java-11-openjdk java-11-openjdk-devel -y
# Or install OpenJDK 17
sudo dnf install java-17-openjdk java-17-openjdk-devel -y
# Verify installation
java -version
# Set JAVA_HOME (adjust the path if you installed OpenJDK 17)
echo 'export JAVA_HOME=/usr/lib/jvm/java-11-openjdk' >> ~/.bashrc
source ~/.bashrc
Installing Elasticsearch
Method 1: Official Repository (Recommended)
On Ubuntu/Debian:
# Import Elasticsearch GPG key
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo gpg --dearmor -o /usr/share/keyrings/elasticsearch-keyring.gpg
# Add Elasticsearch repository
echo "deb [signed-by=/usr/share/keyrings/elasticsearch-keyring.gpg] https://artifacts.elastic.co/packages/8.x/apt stable main" | sudo tee /etc/apt/sources.list.d/elastic-8.x.list
# Update package index
sudo apt update
# Install Elasticsearch
sudo apt install elasticsearch -y
On CentOS/Rocky Linux:
# Import Elasticsearch GPG key
sudo rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch
# Create repository file
sudo tee /etc/yum.repos.d/elasticsearch.repo <<EOF
[elasticsearch]
name=Elasticsearch repository for 8.x packages
baseurl=https://artifacts.elastic.co/packages/8.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md
EOF
# Install Elasticsearch
sudo yum install elasticsearch -y
Method 2: Direct Download
# Download Elasticsearch (check for latest version)
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-8.11.0-linux-x86_64.tar.gz
# Extract archive
tar -xzf elasticsearch-8.11.0-linux-x86_64.tar.gz
# Move to /opt
sudo mv elasticsearch-8.11.0 /opt/elasticsearch
# Create elasticsearch user
sudo useradd -r -s /bin/false elasticsearch
# Set ownership
sudo chown -R elasticsearch:elasticsearch /opt/elasticsearch
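The tarball does not install a systemd unit, and Elasticsearch refuses to run as root, so start it as the elasticsearch user. Note that for a tarball install the configuration lives under /opt/elasticsearch/config/ rather than the /etc/elasticsearch paths used in the rest of this guide. A minimal sketch:
# Start Elasticsearch as a daemon, writing a PID file
sudo -u elasticsearch /opt/elasticsearch/bin/elasticsearch -d -p /opt/elasticsearch/elasticsearch.pid
# Stop it later using the PID file
sudo pkill -F /opt/elasticsearch/elasticsearch.pid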
Initial Elasticsearch Configuration
Configure JVM heap size:
# Edit JVM options
sudo nano /etc/elasticsearch/jvm.options
# Set heap size (typically 50% of available RAM, max 32GB)
-Xms4g
-Xmx4g
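On package installs, Elastic recommends leaving jvm.options itself untouched and placing overrides in /etc/elasticsearch/jvm.options.d/ instead; a minimal example (the file name heap.options is arbitrary):
# Override heap size via a drop-in file instead of editing jvm.options
sudo tee /etc/elasticsearch/jvm.options.d/heap.options <<EOF
-Xms4g
-Xmx4g
EOF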
Basic Elasticsearch configuration:
# Edit main configuration
sudo nano /etc/elasticsearch/elasticsearch.yml
# Cluster name
cluster.name: my-logging-cluster
# Node name
node.name: node-1
# Network settings (0.0.0.0 listens on all interfaces; restrict this in production)
network.host: 0.0.0.0
http.port: 9200
# Discovery settings (single node for development)
discovery.type: single-node
# Data and logs paths
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
# Disable security for initial setup (enable in production)
xpack.security.enabled: false
xpack.security.enrollment.enabled: false
Start Elasticsearch:
# Enable and start Elasticsearch
sudo systemctl daemon-reload
sudo systemctl enable elasticsearch
sudo systemctl start elasticsearch
# Check status
sudo systemctl status elasticsearch
# View logs
sudo journalctl -u elasticsearch -f
Verify Elasticsearch is running:
# Test connection (wait 30-60 seconds for startup)
curl -X GET "localhost:9200/"
# Expected output:
# {
# "name" : "node-1",
# "cluster_name" : "my-logging-cluster",
# "version" : { ... },
# ...
# }
# Check cluster health
curl -X GET "localhost:9200/_cluster/health?pretty"
Installing Logstash
Method 1: Official Repository (Recommended)
The Elastic repository added during the Elasticsearch installation also provides Logstash packages.
On Ubuntu/Debian:
# Repository already added in Elasticsearch installation
sudo apt update
sudo apt install logstash -y
On CentOS/Rocky Linux:
# Repository already added in Elasticsearch installation
sudo yum install logstash -y
Method 2: Direct Download
# Download Logstash
wget https://artifacts.elastic.co/downloads/logstash/logstash-8.11.0-linux-x86_64.tar.gz
# Extract archive
tar -xzf logstash-8.11.0-linux-x86_64.tar.gz
# Move to /opt
sudo mv logstash-8.11.0 /opt/logstash
# Create logstash user
sudo useradd -r -s /bin/false logstash
# Set ownership
sudo chown -R logstash:logstash /opt/logstash
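To verify a tarball install of Logstash, you can run a one-off pipeline that echoes stdin back to stdout (press Ctrl+C to exit); this is a quick sanity check, not a production setup:
sudo -u logstash /opt/logstash/bin/logstash -e 'input { stdin {} } output { stdout { codec => rubydebug } }'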
Logstash Configuration
Configure JVM heap:
# Edit JVM options
sudo nano /etc/logstash/jvm.options
# Set heap size (typically 25% of available RAM)
-Xms2g
-Xmx2g
Main Logstash configuration:
# Edit logstash.yml
sudo nano /etc/logstash/logstash.yml
# Node name
node.name: logstash-1
# Data path
path.data: /var/lib/logstash
# Pipeline configuration path
path.config: /etc/logstash/conf.d
# Log path
path.logs: /var/log/logstash
# Pipeline settings
pipeline.workers: 2
pipeline.batch.size: 125
pipeline.batch.delay: 50
# Monitoring
monitoring.enabled: false
Creating Logstash Pipeline
Create pipeline configuration directory:
sudo mkdir -p /etc/logstash/conf.d
Basic pipeline configuration (syslog example):
sudo nano /etc/logstash/conf.d/syslog-pipeline.conf
# Input section
input {
# Syslog input
tcp {
port => 5000
type => "syslog"
}
udp {
port => 5000
type => "syslog"
}
# File input
file {
path => "/var/log/syslog"
start_position => "beginning"
sincedb_path => "/dev/null"
type => "syslog-file"
}
}
# Filter section
filter {
if [type] == "syslog" {
grok {
match => { "message" => "%{SYSLOGLINE}" }
}
date {
match => [ "timestamp", "MMM d HH:mm:ss", "MMM dd HH:mm:ss" ]
}
}
}
# Output section
output {
elasticsearch {
hosts => ["localhost:9200"]
index => "syslog-%{+YYYY.MM.dd}"
}
# Also output to stdout for debugging
stdout {
codec => rubydebug
}
}
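Before starting the service, you can validate the pipeline syntax (--config.test_and_exit parses the configuration and exits):
sudo /usr/share/logstash/bin/logstash --path.settings /etc/logstash -f /etc/logstash/conf.d/syslog-pipeline.conf --config.test_and_exit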
Start Logstash:
# Enable and start Logstash
sudo systemctl enable logstash
sudo systemctl start logstash
# Check status
sudo systemctl status logstash
# View logs
sudo journalctl -u logstash -f
Test Logstash pipeline:
# Send test log via TCP
echo "<14>Jan 11 10:00:00 testhost test: This is a test message" | nc localhost 5000
# Check if data reached Elasticsearch
curl -X GET "localhost:9200/syslog-*/_search?pretty" -H 'Content-Type: application/json' -d'
{
"query": {
"match_all": {}
}
}
'
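You can also confirm that the daily index was created and count the events it holds:
# List syslog indices
curl -X GET "localhost:9200/_cat/indices/syslog-*?v"
# Count indexed events
curl -X GET "localhost:9200/syslog-*/_count?pretty"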
Advanced Logstash Pipelines
Apache/Nginx Access Log Pipeline
sudo nano /etc/logstash/conf.d/apache-access.conf
input {
file {
path => "/var/log/apache2/access.log"
start_position => "beginning"
sincedb_path => "/var/lib/logstash/sincedb/apache-access"
type => "apache-access"
}
file {
path => "/var/log/nginx/access.log"
start_position => "beginning"
sincedb_path => "/var/lib/logstash/sincedb/nginx-access"
type => "nginx-access"
}
}
filter {
if [type] == "apache-access" {
grok {
match => { "message" => '%{IPORHOST:clientip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] "(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})" %{NUMBER:response} (?:%{NUMBER:bytes}|-)' }
}
date {
match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
}
geoip {
source => "clientip"
}
useragent {
source => "agent"
target => "user_agent"
}
}
if [type] == "nginx-access" {
grok {
match => { "message" => '%{IPORHOST:clientip} - %{USER:ident} \[%{HTTPDATE:timestamp}\] "(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})" %{NUMBER:response} (?:%{NUMBER:bytes}|-) "(?:%{DATA:referrer}|-)" "(?:%{DATA:agent}|-)"' }
}
date {
match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
}
geoip {
source => "clientip"
}
}
}
output {
if [type] == "apache-access" {
elasticsearch {
hosts => ["localhost:9200"]
index => "apache-access-%{+YYYY.MM.dd}"
}
}
if [type] == "nginx-access" {
elasticsearch {
hosts => ["localhost:9200"]
index => "nginx-access-%{+YYYY.MM.dd}"
}
}
}
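The sincedb paths above must point to a directory the logstash user can write, and Logstash needs read access to the web server logs. On Debian/Ubuntu the adm group can usually read files under /var/log; group names differ on other distributions, so treat the commands below as a sketch:
# Create the sincedb directory used by the file inputs
sudo mkdir -p /var/lib/logstash/sincedb
sudo chown logstash:logstash /var/lib/logstash/sincedb
# Debian/Ubuntu: allow the logstash user to read logs owned by group adm
sudo usermod -aG adm logstash
sudo systemctl restart logstash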
Application Log Pipeline (JSON)
sudo nano /etc/logstash/conf.d/app-json.conf
input {
file {
path => "/var/log/myapp/app.json"
# Leave the codec at its default; the json filter below parses each line
# from the message field (setting codec => "json" here as well would leave
# the filter nothing to parse)
start_position => "beginning"
sincedb_path => "/var/lib/logstash/sincedb/app-json"
type => "app-json"
}
}
filter {
# Parse JSON fields
json {
source => "message"
}
# Add custom fields
mutate {
add_field => {
"application" => "myapp"
"environment" => "production"
}
}
# Convert log level to lowercase
mutate {
lowercase => ["level"]
}
# Parse timestamp if present
date {
match => ["timestamp", "ISO8601"]
}
}
output {
elasticsearch {
hosts => ["localhost:9200"]
index => "app-%{application}-%{+YYYY.MM.dd}"
}
}
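To exercise this pipeline, append a JSON line to the (example) application log file and check that it shows up in Elasticsearch:
sudo mkdir -p /var/log/myapp
echo '{"timestamp":"2024-01-11T10:00:00Z","level":"INFO","message":"user login"}' | sudo tee -a /var/log/myapp/app.json
curl -X GET "localhost:9200/app-myapp-*/_search?pretty"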
Multi-Pipeline Configuration
sudo nano /etc/logstash/pipelines.yml
- pipeline.id: syslog-pipeline
  path.config: "/etc/logstash/conf.d/syslog-pipeline.conf"
  pipeline.workers: 2
- pipeline.id: apache-pipeline
  path.config: "/etc/logstash/conf.d/apache-access.conf"
  pipeline.workers: 1
- pipeline.id: app-pipeline
  path.config: "/etc/logstash/conf.d/app-json.conf"
  pipeline.workers: 1
# Restart Logstash to apply
sudo systemctl restart logstash
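After the restart, confirm that all three pipelines loaded by querying the Logstash monitoring API on port 9600:
curl -X GET "localhost:9600/_node/pipelines?pretty"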
Performance Tuning
Elasticsearch Performance
Optimize JVM settings:
sudo nano /etc/elasticsearch/jvm.options
# Heap size (50% of RAM, max 32GB)
-Xms16g
-Xmx16g
# GC settings
-XX:+UseG1GC
-XX:G1ReservePercent=25
-XX:InitiatingHeapOccupancyPercent=30
Optimize Elasticsearch configuration:
sudo nano /etc/elasticsearch/elasticsearch.yml
# Thread pools
thread_pool.write.queue_size: 1000
thread_pool.search.queue_size: 1000
# Indexing settings
indices.memory.index_buffer_size: 30%
# Query cache
indices.queries.cache.size: 15%
# Fielddata cache
indices.fielddata.cache.size: 40%
# Refresh interval is an index-level setting; set it per index or via an index
# template (see the example below) rather than in elasticsearch.yml
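Since refresh_interval is an index setting, one way to apply a longer interval to new log indices is an index template (the template name and patterns below are examples):
curl -X PUT "localhost:9200/_index_template/logs-refresh" -H 'Content-Type: application/json' -d'
{
  "index_patterns": ["syslog-*", "apache-access-*", "nginx-access-*"],
  "template": {
    "settings": { "index": { "refresh_interval": "30s" } }
  }
}'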
Logstash Performance
Optimize pipeline workers:
sudo nano /etc/logstash/logstash.yml
# Increase workers (one per CPU core)
pipeline.workers: 4
# Increase batch size
pipeline.batch.size: 250
# Adjust batch delay
pipeline.batch.delay: 50
# Output parallelism follows pipeline.workers; the old pipeline.output.workers
# setting has been removed in current Logstash versions
Pipeline-specific tuning: the size of bulk requests sent to Elasticsearch is governed by pipeline.batch.size and pipeline.workers (the old flush_size and idle_flush_time output options no longer exist). Enabling HTTP compression can reduce network overhead:
output {
elasticsearch {
hosts => ["localhost:9200"]
index => "logs-%{+YYYY.MM.dd}"
# Compress bulk requests to trade CPU for bandwidth
http_compression => true
}
}
Security Configuration
Enable Elasticsearch Security
# Edit Elasticsearch configuration
sudo nano /etc/elasticsearch/elasticsearch.yml
# Enable X-Pack security
xpack.security.enabled: true
xpack.security.enrollment.enabled: true
# TLS for HTTP
xpack.security.http.ssl:
  enabled: true
  keystore.path: certs/http.p12
# TLS for transport
xpack.security.transport.ssl:
  enabled: true
  verification_mode: certificate
  keystore.path: certs/transport.p12
  truststore.path: certs/transport.p12
Set up passwords:
# In 8.x a password for the elastic superuser is generated when the package is
# installed; reset it (or any built-in user) with elasticsearch-reset-password
sudo /usr/share/elasticsearch/bin/elasticsearch-reset-password -u elastic
# The older elasticsearch-setup-passwords tool still works but is deprecated
sudo /usr/share/elasticsearch/bin/elasticsearch-setup-passwords interactive
Save the generated passwords:
- elastic (superuser)
- kibana_system
- logstash_system
- beats_system
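The built-in logstash_system user only ships Logstash monitoring data, so create a dedicated role and user for writing events; the logstash_writer name and index patterns below are examples, not built-ins:
# Create a role allowed to write to the log indices used in this guide
curl -u elastic --cacert /etc/elasticsearch/certs/http_ca.crt -X POST "https://localhost:9200/_security/role/logstash_writer" -H 'Content-Type: application/json' -d'
{
  "cluster": ["monitor"],
  "indices": [
    { "names": ["logs-*", "syslog-*", "apache-access-*", "nginx-access-*", "app-*"], "privileges": ["create_index", "create", "write", "view_index_metadata"] }
  ]
}'
# Create a user with that role
curl -u elastic --cacert /etc/elasticsearch/certs/http_ca.crt -X POST "https://localhost:9200/_security/user/logstash_writer" -H 'Content-Type: application/json' -d'
{
  "password": "YOUR_PASSWORD_HERE",
  "roles": ["logstash_writer"]
}'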
Configure Logstash with Security
Use the dedicated logstash_writer user created above for event output; the built-in logstash_system user is intended only for shipping Logstash monitoring data.
sudo nano /etc/logstash/conf.d/secure-output.conf
output {
elasticsearch {
hosts => ["https://localhost:9200"]
user => "logstash_writer"
password => "YOUR_PASSWORD_HERE"
# The logstash user must be able to read this CA certificate
cacert => "/etc/elasticsearch/certs/http_ca.crt"
index => "logs-%{+YYYY.MM.dd}"
}
}
Firewall Configuration
Open only the ports you actually use, and restrict Elasticsearch's port 9200 to trusted hosts in production.
# UFW (Ubuntu/Debian)
sudo ufw allow 9200/tcp # Elasticsearch HTTP
sudo ufw allow 5000/tcp # Logstash input
sudo ufw allow 5000/udp # Logstash syslog UDP
# firewalld (CentOS/Rocky)
sudo firewall-cmd --permanent --add-port=9200/tcp
sudo firewall-cmd --permanent --add-port=5000/tcp
sudo firewall-cmd --permanent --add-port=5000/udp
sudo firewall-cmd --reload
Monitoring and Maintenance
Index Management
View indices:
# List all indices
curl -X GET "localhost:9200/_cat/indices?v"
# Check index size
curl -X GET "localhost:9200/_cat/indices?v&s=store.size:desc"
# Get specific index info
curl -X GET "localhost:9200/syslog-2024.01.11/_stats?pretty"
Delete old indices:
# Delete old indices by wildcard (Elasticsearch 8.x rejects wildcard deletes unless
# action.destructive_requires_name is set to false; otherwise name indices explicitly)
curl -X DELETE "localhost:9200/syslog-2023.*"
# Or use curator (recommended for automation)
sudo pip3 install elasticsearch-curator
Create curator config:
sudo nano /etc/curator/curator.yml
client:
  hosts:
    - localhost
  port: 9200
  timeout: 30
logging:
  loglevel: INFO
  logfile: /var/log/curator.log
Curator action file:
sudo nano /etc/curator/delete-old-logs.yml
actions:
  1:
    action: delete_indices
    description: Delete indices older than 30 days
    options:
      ignore_empty_list: True
    filters:
      - filtertype: pattern
        kind: prefix
        value: syslog-
      - filtertype: age
        source: name
        direction: older
        timestring: '%Y.%m.%d'
        unit: days
        unit_count: 30
Run curator:
curator --config /etc/curator/curator.yml /etc/curator/delete-old-logs.yml
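On Elasticsearch 8.x, the built-in index lifecycle management (ILM) feature can handle retention without an external tool. A minimal sketch of a 30-day delete policy (the policy name is an example), which you would then reference from an index template via index.lifecycle.name:
curl -X PUT "localhost:9200/_ilm/policy/logs-30d-retention" -H 'Content-Type: application/json' -d'
{
  "policy": {
    "phases": {
      "delete": { "min_age": "30d", "actions": { "delete": {} } }
    }
  }
}'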
Monitor Logstash
Check Logstash statistics:
# Node stats
curl -X GET "localhost:9600/_node/stats?pretty"
# Pipeline stats
curl -X GET "localhost:9600/_node/stats/pipelines?pretty"
# Hot threads
curl -X GET "localhost:9600/_node/hot_threads?pretty"
Troubleshooting
Common Elasticsearch Issues
Check Elasticsearch logs:
sudo journalctl -u elasticsearch -f
# Or view log files
sudo tail -f /var/log/elasticsearch/my-logging-cluster.log
Check cluster health:
curl -X GET "localhost:9200/_cluster/health?pretty"
curl -X GET "localhost:9200/_cluster/stats?pretty"
Clear cache if needed:
curl -X POST "localhost:9200/_cache/clear?pretty"
Common Logstash Issues
Test pipeline configuration:
sudo /usr/share/logstash/bin/logstash --path.settings /etc/logstash -t -f /etc/logstash/conf.d/
Run Logstash in debug mode:
sudo /usr/share/logstash/bin/logstash --path.settings /etc/logstash -f /etc/logstash/conf.d/ --log.level=debug
Check for parsing errors:
sudo journalctl -u logstash | grep -i "error\|exception"
Conclusion
Elasticsearch and Logstash form a powerful combination for centralized logging, providing scalable search, analytics, and data processing capabilities. By following this guide, you've installed and configured a production-ready logging infrastructure capable of handling logs from multiple sources.
Key takeaways:
- Scalable architecture - Elasticsearch provides distributed storage and search
- Flexible data processing - Logstash transforms and enriches log data
- Multiple input sources - Ingest logs from files, syslog, applications, and more
- Performance tuning - Optimize JVM, pipeline workers, and batch sizes
- Security - Enable authentication, encryption, and access controls
Best practices for production:
- Allocate appropriate resources based on log volume
- Implement index lifecycle management
- Enable security features and authentication
- Monitor cluster health and performance
- Regular backups of Elasticsearch data
- Use dedicated nodes for different roles in larger deployments
- Implement proper log retention policies
- Document your pipeline configurations
The ELK stack continues to evolve with new features and improvements. Consider integrating Kibana for visualization and Beats for lightweight log shipping to complete your logging infrastructure and maximize operational insights.


