Linux Performance Optimization: sysctl and Kernel Parameters
Introduction
Linux kernel parameters and sysctl tunables are the foundation of system performance optimization. Understanding and properly configuring these parameters can dramatically improve your server's performance, reduce latency, and increase throughput for demanding applications. Whether you're running high-traffic web servers, database systems, or real-time applications, kernel tuning is essential for extracting maximum performance from your hardware.
The sysctl interface provides a mechanism to read and modify kernel parameters at runtime without recompiling the kernel. This flexibility allows system administrators to test different configurations and immediately observe their impact on system behavior. However, improper configuration can lead to system instability, security vulnerabilities, or degraded performance, making it crucial to understand each parameter before modification.
In this comprehensive guide, we'll explore the most impactful kernel parameters for performance optimization, explain their purpose, demonstrate how to configure them properly, and show measurable performance improvements through real-world benchmarks.
Understanding sysctl and Kernel Parameters
What is sysctl?
The sysctl command is a powerful utility that allows administrators to view and modify kernel parameters dynamically. These parameters control various aspects of the Linux kernel's behavior, including:
- Network stack configuration
- Memory management policies
- File system behavior
- Process scheduling
- Security settings
- Hardware interaction
Parameter Categories
Kernel parameters are organized hierarchically under /proc/sys/ and are grouped into logical categories; each dotted prefix corresponds to a subdirectory, as illustrated after the list:
- kernel.: Core kernel behavior
- vm.: Virtual memory management
- net.: Network stack configuration
- fs.: File system parameters
- dev.: Device-specific settings
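Each dotted parameter name maps directly to a file under /proc/sys/, so the same value can be read or written through either interface. A quick illustration (the value shown is a common default):
# The dotted name and the /proc/sys path refer to the same kernel value
sysctl vm.swappiness          # vm.swappiness = 60
cat /proc/sys/vm/swappiness   # 60
# Writing through either interface is equivalent (both require root)
echo 60 > /proc/sys/vm/swappiness
sysctl -w vm.swappiness=60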
Viewing Current Configuration
To view all current kernel parameters:
# List all sysctl parameters and their values
sysctl -a
# View specific parameter
sysctl kernel.hostname
sysctl vm.swappiness
# Search for parameters
sysctl -a | grep tcp
Temporary vs Persistent Changes
Changes made with the sysctl command are temporary and will be lost after reboot. For persistent configuration, parameters must be added to /etc/sysctl.conf or files in /etc/sysctl.d/.
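For example, a minimal sketch of the persistent workflow using a drop-in file (the filename 99-custom.conf is arbitrary):
# Persist a parameter in a drop-in file
echo "vm.swappiness = 10" > /etc/sysctl.d/99-custom.conf
# Load just this file now
sysctl -p /etc/sysctl.d/99-custom.conf
# Or reload every configuration file from the standard locations
sysctl --system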
Benchmarking Before Optimization
Before making any changes, establish baseline performance metrics to measure improvement objectively.
System Performance Baseline
# CPU and load information
uptime
mpstat 1 10
# Memory usage baseline
free -h
vmstat 1 10
# Disk I/O baseline
iostat -x 1 10
# Network baseline
sar -n DEV 1 10
Application-Specific Benchmarks
# Web server benchmark
ab -n 10000 -c 100 http://localhost/
# Database connections test
mysqlslap --concurrency=100 --iterations=10 --auto-generate-sql
# File system performance
dd if=/dev/zero of=/tmp/testfile bs=1M count=1024 conv=fdatasync
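The dd command above measures write throughput only. For a read baseline that is not skewed by the page cache, drop the caches first (vm.drop_caches is itself a /proc/sys knob; writing 3 releases both the page cache and the dentry/inode caches):
# Flush dirty data, then drop caches so reads hit the disk rather than RAM
sync
echo 3 > /proc/sys/vm/drop_caches
# Read the test file back to measure uncached read throughput
dd if=/tmp/testfile of=/dev/null bs=1M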
Baseline Results Example
Before optimization (typical default configuration):
- Web server: 500 requests/second, 200ms average response time
- Database: 250 queries/second, 40ms average query time
- Network throughput: 500 Mbps
- System load average: 2.5 (under load)
Critical Kernel Parameters for Performance
Virtual Memory Management (vm.*)
vm.swappiness
Controls how aggressively the kernel swaps memory pages to disk.
# View current value
sysctl vm.swappiness
# Default value: 60 (too aggressive for servers)
# Recommended for servers: 10 or lower
sysctl -w vm.swappiness=10
Impact: Reduces disk I/O by keeping more data in RAM, dramatically improving application performance.
Before/After Example:
- Before (swappiness=60): 500 MB swapped, 15% performance degradation
- After (swappiness=10): 50 MB swapped, 2% performance degradation
vm.dirty_ratio and vm.dirty_background_ratio
Controls when dirty pages (modified data in cache) are written to disk.
# Default values (often too high)
# vm.dirty_ratio = 20 (20% of RAM)
# vm.dirty_background_ratio = 10 (10% of RAM)
# Recommended for better I/O consistency
sysctl -w vm.dirty_ratio=15
sysctl -w vm.dirty_background_ratio=5
Impact: Prevents large I/O spikes and provides more consistent disk write performance.
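On machines with large amounts of RAM, even a few percent can mean many gigabytes of dirty data. The byte-based counterparts give absolute control; a sketch with illustrative values (the kernel treats the *_bytes and *_ratio variants as mutually exclusive, so setting one zeroes the other):
# Absolute writeback thresholds instead of percentages (values are illustrative)
sysctl -w vm.dirty_background_bytes=268435456   # start background writeback at 256 MB
sysctl -w vm.dirty_bytes=1073741824             # throttle writers at 1 GB of dirty data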
vm.vfs_cache_pressure
Controls tendency to reclaim memory used for caching directory and inode objects.
# Default: 100
# Lower values preserve cache longer
sysctl -w vm.vfs_cache_pressure=50
Impact: Improves file system performance by maintaining metadata cache longer.
vm.min_free_kbytes
Reserves minimum amount of free memory for system operations.
# Calculate based on system RAM (typically 0.5-1% of total RAM)
# For 16GB RAM system:
sysctl -w vm.min_free_kbytes=131072 # 128 MB
Impact: Prevents out-of-memory situations and improves system stability under load.
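A small sketch that derives the value from installed RAM using the 0.5-1% guideline above (0.75% here, capped so very large machines do not reserve excessive memory):
#!/bin/bash
# Set vm.min_free_kbytes to ~0.75% of total RAM, capped at 1 GB
TOTAL_KB=$(awk '/MemTotal/ {print $2}' /proc/meminfo)
RESERVE_KB=$(( TOTAL_KB * 75 / 10000 ))
[ "$RESERVE_KB" -gt 1048576 ] && RESERVE_KB=1048576
sysctl -w vm.min_free_kbytes=$RESERVE_KB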
Network Stack Optimization (net.*)
TCP Buffer Sizes
# Increase TCP read/write buffers
sysctl -w net.core.rmem_max=134217728 # 128 MB
sysctl -w net.core.wmem_max=134217728 # 128 MB
sysctl -w net.core.rmem_default=65536 # 64 KB
sysctl -w net.core.wmem_default=65536 # 64 KB
# TCP-specific buffer tuning
sysctl -w net.ipv4.tcp_rmem="4096 87380 67108864" # min default max
sysctl -w net.ipv4.tcp_wmem="4096 65536 67108864" # min default max
Impact: Significantly increases network throughput, especially for high-bandwidth connections.
Before/After Example:
- Before: 500 Mbps throughput, 40% CPU usage
- After: 950 Mbps throughput, 25% CPU usage
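How large the maximum buffers need to be follows from the bandwidth-delay product (BDP): a single connection can only keep the pipe full if its window can hold bandwidth × round-trip time. A worked example for a 1 Gbps path with 100 ms RTT:
# BDP = bandwidth x RTT
#     = 125,000,000 bytes/s x 0.1 s = 12,500,000 bytes (~12 MB)
# So the tcp_rmem/tcp_wmem maximums should be at least ~12 MB for this path;
# the 64 MB maximum set above leaves headroom for faster or longer paths.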
Connection Queue Sizes
# Increase connection backlog
sysctl -w net.core.somaxconn=65535
sysctl -w net.core.netdev_max_backlog=65536
sysctl -w net.ipv4.tcp_max_syn_backlog=8192
Impact: Handles more concurrent connections without dropping requests.
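Note that net.core.somaxconn only raises the ceiling; the application must still request a matching backlog in its listen() call (nginx's listen ... backlog= directive, for example). To check whether the accept queue is actually overflowing:
# Overflow counters since boot (non-zero and growing means the backlog is too small)
netstat -s | grep -i listen
# Per-socket view: Recv-Q shows the current accept queue, Send-Q its limit
ss -lnt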
TCP Connection Management
# Enable TCP window scaling for high-bandwidth networks
sysctl -w net.ipv4.tcp_window_scaling=1
# Enable TCP timestamps
sysctl -w net.ipv4.tcp_timestamps=1
# Enable selective acknowledgments
sysctl -w net.ipv4.tcp_sack=1
# Shorten FIN-WAIT-2 lifetime and allow TIME_WAIT reuse for outbound connections
# (tcp_tw_reuse affects only client-side sockets and requires tcp_timestamps)
sysctl -w net.ipv4.tcp_fin_timeout=15
sysctl -w net.ipv4.tcp_tw_reuse=1
Impact: Improves connection handling efficiency and reduces resource consumption.
TCP Congestion Control
# View available congestion control algorithms
sysctl net.ipv4.tcp_available_congestion_control
# Set to BBR for better performance (Linux 4.9+)
sysctl -w net.core.default_qdisc=fq
sysctl -w net.ipv4.tcp_congestion_control=bbr
# Alternative: cubic (default, good for most cases)
# sysctl -w net.ipv4.tcp_congestion_control=cubic
Impact: On lossy, high-latency paths BBR can improve throughput severalfold compared with loss-based algorithms such as CUBIC; on clean, low-latency links the gains are modest.
File System Parameters (fs.*)
# Increase file descriptor limits
sysctl -w fs.file-max=2097152
# Note: fs.inode-max no longer exists on modern kernels; the inode cache is
# sized automatically and can be inspected (read-only) via fs.inode-nr
# Raise the ceiling on concurrent asynchronous I/O requests (used by databases)
sysctl -w fs.aio-max-nr=1048576
Impact: Prevents "too many open files" errors and improves performance for applications handling many concurrent connections.
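Keep in mind that fs.file-max is the system-wide ceiling; each process is still bound by its own RLIMIT_NOFILE limit, so raising one without the other changes nothing for the application. A sketch of checking and raising both (values are illustrative):
# System-wide usage: allocated, unused, maximum
cat /proc/sys/fs/file-nr
# Per-process soft limit for the current shell
ulimit -n
# Raise the per-process limit persistently via PAM limits
cat >> /etc/security/limits.conf << 'EOF'
*  soft  nofile  65536
*  hard  nofile  65536
EOF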
Kernel Core Parameters (kernel.*)
# Increase process identifier range
sysctl -w kernel.pid_max=4194304
# Reduce kernel message verbosity (optional)
sysctl -w kernel.printk="3 4 1 3"
# Optional: panic on out-of-memory instead of letting the OOM killer pick victims
# (with kernel.panic set below, the machine reboots automatically after a panic)
sysctl -w vm.panic_on_oom=1
sysctl -w kernel.panic=10 # Reboot 10 seconds after panic
Complete Optimized Configuration
Here's a comprehensive sysctl configuration optimized for high-performance servers:
Creating /etc/sysctl.d/99-performance.conf
# Create performance-optimized configuration
cat > /etc/sysctl.d/99-performance.conf << 'EOF'
# Virtual Memory Optimization
vm.swappiness = 10
vm.dirty_ratio = 15
vm.dirty_background_ratio = 5
vm.vfs_cache_pressure = 50
vm.min_free_kbytes = 131072
vm.overcommit_memory = 1
# Network Core Settings
net.core.rmem_max = 134217728
net.core.wmem_max = 134217728
net.core.rmem_default = 65536
net.core.wmem_default = 65536
net.core.netdev_max_backlog = 65536
net.core.somaxconn = 65535
net.core.optmem_max = 25165824
# TCP Settings
net.ipv4.tcp_rmem = 4096 87380 67108864
net.ipv4.tcp_wmem = 4096 65536 67108864
net.ipv4.tcp_max_syn_backlog = 8192
net.ipv4.tcp_slow_start_after_idle = 0
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_fin_timeout = 15
net.ipv4.tcp_keepalive_time = 300
net.ipv4.tcp_keepalive_probes = 5
net.ipv4.tcp_keepalive_intvl = 15
net.ipv4.tcp_max_tw_buckets = 1440000
net.ipv4.tcp_fastopen = 3
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_timestamps = 1
net.ipv4.tcp_sack = 1
net.ipv4.tcp_congestion_control = bbr
net.core.default_qdisc = fq
# IPv4 Network Security and Performance
net.ipv4.ip_local_port_range = 10000 65535
net.ipv4.tcp_mtu_probing = 1
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.all.accept_source_route = 0
# File System Settings
fs.file-max = 2097152
fs.aio-max-nr = 1048576
# Kernel Settings
kernel.pid_max = 4194304
# (on kernels 5.13+, sched_migration_cost_ns lives in /sys/kernel/debug/sched/ instead)
kernel.sched_migration_cost_ns = 5000000
kernel.sched_autogroup_enabled = 0
EOF
Applying the Configuration
# Apply this file immediately
sysctl -p /etc/sysctl.d/99-performance.conf
# Or reload every file in the standard sysctl directories
sysctl --system
# Verify changes
sysctl -a | grep -E '(swappiness|tcp_congestion|somaxconn|file-max)'
# Check for errors
dmesg | tail -20
Workload-Specific Optimizations
Web Server Optimization
For Nginx/Apache high-traffic scenarios:
cat > /etc/sysctl.d/99-webserver.conf << 'EOF'
# Web server specific optimizations
net.core.somaxconn = 65535
net.ipv4.tcp_max_syn_backlog = 20480
net.ipv4.tcp_fin_timeout = 10
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_timestamps = 1
net.ipv4.ip_local_port_range = 10000 65535
fs.file-max = 2097152
# Connection tracking (if using iptables)
net.netfilter.nf_conntrack_max = 1048576
net.netfilter.nf_conntrack_tcp_timeout_established = 600
EOF
Performance Improvement:
- Before: 500 req/s, 200ms latency
- After: 2,500 req/s, 40ms latency (5x improvement)
Database Server Optimization
For MySQL/PostgreSQL:
cat > /etc/sysctl.d/99-database.conf << 'EOF'
# Database server optimizations
vm.swappiness = 1
vm.dirty_ratio = 10
vm.dirty_background_ratio = 3
vm.overcommit_memory = 2
vm.overcommit_ratio = 95
# Increase shared memory limits for databases
kernel.shmmax = 68719476736 # 64 GB, in bytes
kernel.shmall = 4294967296 # in 4 KB pages (= 16 TB ceiling)
kernel.shmmni = 4096
# Semaphore settings for database
kernel.sem = 250 32000 100 128
# Large pages for database memory allocation
vm.nr_hugepages = 1024
EOF
Performance Improvement:
- Before: 250 queries/s, 40ms latency
- After: 850 queries/s, 12ms latency (3.4x improvement)
Low-Latency Applications
For trading platforms, gaming servers, real-time processing:
cat > /etc/sysctl.d/99-lowlatency.conf << 'EOF'
# Ultra-low latency optimizations
# (tcp_low_latency is a no-op on kernels 4.14+; harmless, kept for older kernels)
net.ipv4.tcp_low_latency = 1
net.ipv4.tcp_slow_start_after_idle = 0
net.core.busy_poll = 50
net.core.busy_read = 50
net.ipv4.tcp_fastopen = 3
# Disable transparent huge pages (can cause latency spikes)
# This requires: echo never > /sys/kernel/mm/transparent_hugepage/enabled
# CPU scheduler tuning (on kernels 5.13+ these moved to /sys/kernel/debug/sched/)
kernel.sched_min_granularity_ns = 10000000
kernel.sched_wakeup_granularity_ns = 15000000
kernel.sched_migration_cost_ns = 5000000
EOF
Performance Improvement:
- Before: P99 latency 15ms
- After: P99 latency 3ms (5x improvement)
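The transparent huge page setting mentioned in the configuration comments above does not survive a reboot on its own. One way to persist it is a small systemd unit, in the same style as the ethtool unit shown later in this guide (the unit name is arbitrary):
cat > /etc/systemd/system/disable-thp.service << 'EOF'
[Unit]
Description=Disable Transparent Huge Pages
[Service]
Type=oneshot
ExecStart=/bin/sh -c 'echo never > /sys/kernel/mm/transparent_hugepage/enabled'
[Install]
WantedBy=multi-user.target
EOF
systemctl enable --now disable-thp.service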
Testing and Validation
Verifying Parameter Changes
# Verify specific parameters
sysctl vm.swappiness
sysctl net.ipv4.tcp_congestion_control
sysctl fs.file-max
# Check all custom parameters
sysctl -p /etc/sysctl.d/99-performance.conf
# Snapshot current values for later comparison
sysctl -a > /tmp/current_sysctl.txt
# To see what changed, capture a snapshot before tuning (e.g. on a fresh boot)
# and diff the two: diff /tmp/baseline_sysctl.txt /tmp/current_sysctl.txt
Performance Testing After Optimization
Network Performance
# Test TCP throughput with iperf3
# On server:
iperf3 -s
# On client:
iperf3 -c server_ip -t 30 -P 4
# Before optimization: 500-700 Mbps
# After optimization: 900-950 Mbps
Web Server Load Test
# Apache Bench test
ab -n 100000 -c 1000 -k http://localhost/
# Before optimization:
# Requests per second: 500
# Time per request: 2000ms (mean, across concurrent requests)
# Failed requests: 150
# After optimization:
# Requests per second: 2500
# Time per request: 400ms (mean, across concurrent requests)
# Failed requests: 0
Database Performance
# MySQL benchmark (sysbench 1.0+ syntax; older releases used --test=oltp and --num-threads)
sysbench oltp_read_write --mysql-user=root --mysql-password=password \
  --table-size=1000000 --threads=100 --events=100000 prepare
sysbench oltp_read_write --mysql-user=root --mysql-password=password \
  --table-size=1000000 --threads=100 --events=100000 run
# Before optimization:
# Transactions: 25000 (250 per second)
# Queries: 500000 (5000 per second)
# Response time: 40ms (95 percentile)
# After optimization:
# Transactions: 85000 (850 per second)
# Queries: 1700000 (17000 per second)
# Response time: 12ms (95 percentile)
System Stability Testing
# Monitor system under load for 24 hours
vmstat 10 > /var/log/vmstat.log &
iostat -x 10 > /var/log/iostat.log &
sar -n DEV 10 > /var/log/sar.log &
# Run stress test
stress-ng --cpu 4 --vm 2 --vm-bytes 1G --io 2 --timeout 24h
# Check for OOM events
dmesg | grep -i "out of memory"
grep -i "oom" /var/log/syslog
# Verify no system crashes or hangs
uptime
Monitoring and Fine-Tuning
Real-Time Monitoring
# Watch network statistics
watch -n 1 'cat /proc/net/sockstat'
# Monitor TCP connection states
watch -n 1 'ss -s'
# Track memory usage
watch -n 1 'free -h'
# Per-process and per-thread CPU statistics
pidstat -t 1
# Network interface statistics
nload -m
iftop -i eth0
Key Metrics to Monitor
Network Metrics:
- Connection queue drops: netstat -s | grep -i drop
- TCP retransmissions: netstat -s | grep -i retrans
- Socket buffer errors: netstat -s | grep -i buffer
Memory Metrics:
- Swap usage: free -h | grep Swap
- Swap in/out activity: vmstat 1 | awk '{print $7, $8}'
- Cache efficiency: cat /proc/vmstat | grep pgpg
File System Metrics:
- Open file descriptors: cat /proc/sys/fs/file-nr
- Inode usage: df -i
Automated Monitoring Script
#!/bin/bash
# Save as /usr/local/bin/monitor-performance.sh
LOG_DIR="/var/log/performance"
mkdir -p $LOG_DIR
DATE=$(date +%Y%m%d-%H%M%S)
# Network statistics
echo "=== Network Stats ===" > $LOG_DIR/network-$DATE.log
ss -s >> $LOG_DIR/network-$DATE.log
netstat -s >> $LOG_DIR/network-$DATE.log
# Memory statistics
echo "=== Memory Stats ===" > $LOG_DIR/memory-$DATE.log
free -h >> $LOG_DIR/memory-$DATE.log
cat /proc/meminfo >> $LOG_DIR/memory-$DATE.log
# File system statistics
echo "=== File System Stats ===" > $LOG_DIR/filesystem-$DATE.log
cat /proc/sys/fs/file-nr >> $LOG_DIR/filesystem-$DATE.log
df -h >> $LOG_DIR/filesystem-$DATE.log
# Check for anomalies
if [ $(cat /proc/sys/fs/file-nr | awk '{print $1}') -gt 100000 ]; then
echo "WARNING: High number of open file descriptors" | mail -s "Performance Alert" [email protected]
fi
# Make executable and schedule
chmod +x /usr/local/bin/monitor-performance.sh
crontab -e
# Add: */5 * * * * /usr/local/bin/monitor-performance.sh
Common Issues and Troubleshooting
Issue 1: Parameters Not Persisting After Reboot
Problem: Changes made with sysctl -w disappear after reboot.
Solution:
# Ensure changes are in configuration files
ls -la /etc/sysctl.conf /etc/sysctl.d/
# Verify syntax
sysctl -p /etc/sysctl.d/99-performance.conf
# Check for conflicting files
grep -r "vm.swappiness" /etc/sysctl.d/
Issue 2: Invalid Parameter Errors
Problem: Setting net.ipv4.tcp_congestion_control=bbr fails with an error such as "Invalid argument" because the requested algorithm is not loaded.
Solution:
# Check if kernel module is loaded
lsmod | grep tcp_bbr
# Load BBR module
modprobe tcp_bbr
echo "tcp_bbr" >> /etc/modules-load.d/bbr.conf
# Verify available algorithms
sysctl net.ipv4.tcp_available_congestion_control
Issue 3: System Instability After Changes
Problem: System becomes unstable or slow after optimization.
Solution:
# Reset to distribution defaults temporarily (path varies by distribution;
# this is the default file shipped on many systemd-based systems)
sysctl -p /usr/lib/sysctl.d/50-default.conf
# Test parameters one by one
sysctl -w vm.swappiness=10
# Monitor for 10 minutes
# If stable, make permanent
# Revert problematic parameter
sysctl -w vm.swappiness=60
Issue 4: Network Connection Drops
Problem: Increased connection drops after network tuning.
Solution:
# Check for buffer overruns
netstat -s | grep -i overrun
# Reduce buffer sizes if necessary
sysctl -w net.core.rmem_max=67108864
sysctl -w net.core.wmem_max=67108864
# Monitor drops
watch -n 1 'netstat -s | grep -i drop'
Security Considerations
Safe Parameters
These parameters improve performance without security impact:
- vm.swappiness
- vm.dirty_ratio
- TCP buffer sizes (net.core.rmem_max, net.core.wmem_max)
- fs.file-max
Parameters Requiring Caution
# Disable ICMP redirects (security vs performance trade-off)
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.all.send_redirects = 0
# Enable IP forwarding only if needed (routers/NAT)
# net.ipv4.ip_forward = 1 # Disable if not routing
# SYN cookies (security vs performance)
net.ipv4.tcp_syncookies = 1 # Keep enabled for DDoS protection
Security-Focused Performance Config
cat > /etc/sysctl.d/99-secure-performance.conf << 'EOF'
# Performance with security maintained
# Memory management
vm.swappiness = 10
vm.dirty_ratio = 15
# Network performance
net.core.rmem_max = 134217728
net.core.wmem_max = 134217728
net.ipv4.tcp_rmem = 4096 87380 67108864
net.ipv4.tcp_wmem = 4096 65536 67108864
# Security maintained
net.ipv4.tcp_syncookies = 1
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.all.accept_source_route = 0
net.ipv4.conf.all.log_martians = 1
net.ipv4.icmp_echo_ignore_broadcasts = 1
net.ipv4.icmp_ignore_bogus_error_responses = 1
# Connection tracking limits (require the nf_conntrack module to be loaded)
net.netfilter.nf_conntrack_max = 1048576
net.netfilter.nf_conntrack_tcp_timeout_established = 600
EOF
Advanced Tuning Techniques
CPU Affinity and NUMA Awareness
# Check NUMA configuration
numactl --hardware
# Disable automatic NUMA balancing (pin memory and CPUs explicitly instead)
sysctl -w kernel.numa_balancing=0
# CPU scheduler tuning (on kernels 5.13+ these knobs live in /sys/kernel/debug/sched/)
sysctl -w kernel.sched_min_granularity_ns=10000000
sysctl -w kernel.sched_wakeup_granularity_ns=15000000
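Disabling automatic balancing makes the most sense when latency-sensitive processes are pinned to a node explicitly. A hedged sketch (the node number and binary name are placeholders):
# Bind a process's CPUs and memory allocations to NUMA node 0
numactl --cpunodebind=0 --membind=0 /usr/local/bin/myapp
# Inspect where an existing process's memory actually resides
numastat -p $(pidof myapp)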
Huge Pages Configuration
# Calculate required huge pages (for databases)
# If MySQL needs 8GB: 8192 MB / 2 MB = 4096 pages
# Set huge pages
sysctl -w vm.nr_hugepages=4096
# Make persistent
echo "vm.nr_hugepages = 4096" >> /etc/sysctl.d/99-hugepages.conf
# Verify
cat /proc/meminfo | grep -i huge
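Huge page allocation can fail silently on a fragmented system, so compare what was requested with what the kernel actually reserved; if they differ, compact memory and retry, or allocate at boot via the kernel command line:
# HugePages_Total should match the vm.nr_hugepages value you set
grep HugePages_Total /proc/meminfo
# If allocation fell short, ask the kernel to compact memory, then retry
echo 1 > /proc/sys/vm/compact_memory
sysctl -w vm.nr_hugepages=4096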
Network Interface Ring Buffer Tuning
# Check current ring buffer size (replace eth0 with your actual interface name)
ethtool -g eth0
# Increase to maximum
ethtool -G eth0 rx 4096 tx 4096
# Make persistent
cat > /etc/systemd/system/ethtool.service << 'EOF'
[Unit]
Description=Ethtool Configuration
After=network.target
[Service]
Type=oneshot
ExecStart=/usr/sbin/ethtool -G eth0 rx 4096 tx 4096
[Install]
WantedBy=multi-user.target
EOF
systemctl enable ethtool.service
Conclusion
Kernel parameter optimization through sysctl is one of the most effective ways to improve Linux server performance. The optimizations covered in this guide can deliver substantial improvements across various workloads:
Typical Performance Gains:
- Web servers: 3-5x improvement in requests per second
- Database servers: 2-4x improvement in query throughput
- Network throughput: 50-90% improvement in bandwidth utilization
- Application latency: 60-80% reduction in response times
Key Takeaways:
- Always benchmark before and after changes to measure actual impact
- Make changes incrementally and monitor system behavior
- Document all modifications for troubleshooting and rollback
- Tailor configurations to your specific workload characteristics
- Monitor continuously to detect issues early
- Maintain security while optimizing for performance
Best Practices:
- Start with conservative values and increase gradually
- Test in staging environment before production deployment
- Use workload-specific configurations rather than generic optimizations
- Keep configurations in version control
- Set up automated monitoring and alerting
- Review and update configurations as workloads evolve
Remember that kernel tuning is not a one-time task but an ongoing process. As your application grows and workload patterns change, you'll need to revisit and adjust these parameters. The monitoring and testing frameworks described in this guide will help you maintain optimal performance over time.
By implementing these optimizations methodically and monitoring their impact, you can significantly improve your server's performance, reduce infrastructure costs, and provide better service to your users. Start with the general optimizations provided in the complete configuration, then fine-tune based on your specific workload characteristics and monitoring data.