High Memory Usage: Diagnostics and Analysis

Introduction

Memory management is critical for server stability and performance. When memory resources are exhausted, Linux systems resort to swapping, which dramatically degrades performance, or trigger the Out-Of-Memory (OOM) killer, which terminates processes to free resources. Understanding how to diagnose and resolve memory issues is essential for maintaining reliable server operations.

This comprehensive guide provides system administrators with practical command-line tools and methodologies for diagnosing high memory usage. You'll learn how to identify memory-hungry processes, analyze memory leaks, understand swap usage, and implement effective solutions to memory-related problems.

Memory issues often manifest subtly before causing critical failures. This guide will teach you to recognize early warning signs, perform detailed memory analysis, and implement preventive measures to maintain optimal memory utilization across your Linux infrastructure.

Understanding Linux Memory Management

Memory Types and Components

Linux memory consists of several components:

  • Physical Memory (RAM): Actual hardware memory
  • Virtual Memory: RAM + swap space
  • Swap: Disk space used as extended memory
  • Cache: File system cache for performance
  • Buffers: Temporary storage for I/O operations
  • Shared Memory: Memory shared between processes
  • Anonymous Memory: Process heap and stack

Memory Metrics Explained

  • Total: Total physical RAM installed
  • Used: Memory currently in use
  • Free: Completely unused memory
  • Available: Memory available for new applications (includes reclaimable cache)
  • Buffers: Memory used for block device I/O buffers
  • Cache: Memory used for file system cache
  • Swap: Disk-based virtual memory
  • Committed: Memory allocated to processes

OOM Killer

When the system runs out of memory, the OOM (Out-Of-Memory) killer terminates processes based on a badness score. The classic heuristics:

  • Processes using more memory score higher
  • Root processes score lower
  • Long-running processes score lower
  • Processes with lower nice values score lower

On modern kernels (2.6.36 and later), the score is driven primarily by memory footprint (RSS plus swap), adjusted by oom_score_adj (see Step 6).

Initial Memory Assessment

Quick Memory Status Check

Start with these rapid assessment commands:

# Basic memory overview
free -h

# Detailed memory information
cat /proc/meminfo

# Memory usage summary
vmstat -s

# Per-process memory usage
ps aux --sort=-%mem | head -10

# System memory state
cat /proc/sys/vm/swappiness
cat /proc/sys/vm/vfs_cache_pressure

# Check for OOM events
dmesg | grep -i "out of memory"
grep -i "killed process" /var/log/kern.log

Quick interpretation:

# If available < 10% of total
# AND swap used > 50%
# THEN investigate immediately

# If OOM killer active
# THEN critical memory shortage

# If cache > 50% and available low
# THEN cache can be reclaimed (usually OK)
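
These rules can be turned into a quick triage check. A minimal sketch using /proc/meminfo (the thresholds are illustrative; tune them for your environment):

# Quick triage sketch: flag low available memory combined with heavy swap use
read -r TOTAL AVAIL <<< "$(awk '/MemTotal/ {t=$2} /MemAvailable/ {a=$2} END {print t, a}' /proc/meminfo)"
read -r STOTAL SFREE <<< "$(awk '/SwapTotal/ {t=$2} /SwapFree/ {f=$2} END {print t, f}' /proc/meminfo)"

AVAIL_PCT=$((AVAIL * 100 / TOTAL))
SWAP_PCT=0
[ "$STOTAL" -gt 0 ] && SWAP_PCT=$(( (STOTAL - SFREE) * 100 / STOTAL ))

if [ "$AVAIL_PCT" -lt 10 ] && [ "$SWAP_PCT" -gt 50 ]; then
    echo "CRITICAL: available=${AVAIL_PCT}%, swap used=${SWAP_PCT}% - investigate immediately"
else
    echo "OK: available=${AVAIL_PCT}%, swap used=${SWAP_PCT}%"
fi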

Understanding free Command Output

free -h

              total        used        free      shared  buff/cache   available
Mem:           15Gi       8.0Gi       1.0Gi       256Mi       6.0Gi       6.5Gi
Swap:         4.0Gi       2.0Gi       2.0Gi

Interpretation:

  • Total: ~16GB physical RAM (reported as 15Gi after kernel reservations)
  • Used: 8GB actively used by processes
  • Free: 1GB completely unused
  • Buff/cache: 6GB used for caching (reclaimable)
  • Available: 6.5GB available for new applications
  • Swap used: 2GB (concerning if high)

Key insight: Linux uses "free" memory for cache to improve performance. Focus on "available" not "free".
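
To quantify this, check the available fraction directly from /proc/meminfo (a one-line sketch):

# Percentage of RAM still available to new applications
awk '/MemTotal/ {t=$2} /MemAvailable/ {a=$2} END {printf "Available: %.1f%%\n", a*100/t}' /proc/meminfo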

Step 1: Process-Level Memory Analysis

Using ps for Memory Analysis

# Top memory consumers
ps aux --sort=-%mem | head -15

# Memory usage by process name
ps aux | awk '{print $11, $6}' | sort -k2 -nr | head -15

# Total memory by process name
ps aux | awk '{arr[$11]+=$6} END {for (i in arr) print i, arr[i]}' | sort -k2 -nr

# Specific user's memory usage
ps aux | grep ^username | awk '{sum+=$6} END {print "Total:", sum/1024 "MB"}'

# VSZ vs RSS comparison
ps aux --sort=-rss | awk '{printf "%-10s %8s %8s %s\n", $2, $5, $6, $11}' | head -15

# Memory usage with command line
ps -eo pid,user,pmem,rss,vsz,cmd --sort=-pmem | head -20

# Child processes memory
ps --ppid 1234 -o pid,rss,cmd

Key metrics:

  • VSZ: Virtual memory size (total address space mapped)
  • RSS: Resident Set Size (physical RAM in use)
  • %MEM: RSS as a percentage of physical RAM
  • SHR: Shared memory (shown by top rather than ps)
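
A large gap between VSZ and RSS usually means large virtual mappings (thread stacks, mmapped files) rather than real RAM pressure. A sketch to spot such processes (the 10x ratio is illustrative):

# Processes whose virtual size exceeds 10x their resident size
ps -eo pid,vsz,rss,comm --no-headers | awk '$3 > 0 && $2 / $3 > 10 {printf "%-8s VSZ:%-10s RSS:%-10s %s\n", $1, $2, $3, $4}' | head -10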

Understanding Memory Columns

# Detailed process memory info
ps -eo pid,user,rss,vsz,pmem,comm --sort=-rss | head -15

# VSZ in MB
ps -eo pid,user,vsz,comm --sort=-vsz | awk '{$3=int($3/1024)"M"} {print}' | head -15

# RSS in MB
ps -eo pid,user,rss,comm --sort=-rss | awk '{$3=int($3/1024)"M"} {print}' | head -15

Per-Process Memory Details

# Process memory map
pmap -x 1234

# Extended memory map
pmap -XX 1234

# Memory map summary
pmap -d 1234

# Detailed memory usage
cat /proc/1234/status | grep -i mem
cat /proc/1234/status | grep -i vm

# SMAPS for detailed mapping
cat /proc/1234/smaps

# Memory statistics
cat /proc/1234/statm
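
RSS counts shared pages once per process, so summing it overstates totals when memory is shared; PSS (proportional set size) from smaps splits shared pages fairly. A sketch to total it:

# Total PSS for a process, in MB
awk '/^Pss:/ {sum += $2} END {print sum/1024 " MB"}' /proc/1234/smaps

# Single-read summary on kernels 4.14 and later
grep '^Pss:' /proc/1234/smaps_rollup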

Step 2: System-Wide Memory Analysis

Using free Command

# Human-readable output
free -h

# Output in MB
free -m

# Output in GB
free -g

# Continuous monitoring (every 2 seconds)
free -h -s 2

# Wide output
free -h -w

# Total line only
free -h -t

# Repeat 10 times at 2-second intervals, then exit
free -h -s 2 -c 10

Analyzing /proc/meminfo

Detailed system memory information:

# View all memory info
cat /proc/meminfo

# Key metrics
grep -E "MemTotal|MemFree|MemAvailable|Buffers|Cached|SwapTotal|SwapFree" /proc/meminfo

# Calculate actual free memory
awk '/MemAvailable/ {print "Available:", $2/1024/1024 "GB"}' /proc/meminfo

# Cache size
awk '/^Cached/ {print "Cache:", $2/1024/1024 "GB"}' /proc/meminfo

# Slab memory (kernel objects)
awk '/^Slab/ {print "Slab:", $2/1024/1024 "GB"}' /proc/meminfo

# Check for memory leaks in slab
cat /proc/slabinfo | head -20
slabtop -o
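
To rank slab caches by approximate size straight from /proc/slabinfo (a sketch; assumes the slabinfo 2.x column layout and root privileges):

# Top 5 slab caches by total size (num_objs * objsize)
awk 'NR > 2 {printf "%-28s %10.1f MB\n", $1, $3 * $4 / 1048576}' /proc/slabinfo | sort -k2 -nr | head -5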

Using vmstat

Virtual memory statistics:

# Basic vmstat
vmstat 1 10

# Memory-specific stats
vmstat -s

# Active/inactive memory
vmstat -a 2 10

# Disk statistics
vmstat -d 2

# Wide output
vmstat -w 1 5

# Key columns:
# swpd = swap used
# free = free memory
# buff = buffers
# cache = cache
# si = swap in (from disk)
# so = swap out (to disk)

Critical indicators:

# High swap in/out (si/so > 0 consistently)
# THEN system is thrashing

# Free memory near zero
# AND swap increasing
# THEN memory exhaustion

# bi/bo (block in/out) high with si/so
# THEN swapping actively occurring
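
These indicators can be checked automatically. A sketch that samples vmstat for about a minute and flags sustained swapping (sample counts and thresholds are illustrative):

# Count samples with nonzero si/so over ~60 seconds (12 samples, 5s apart)
vmstat 5 13 | awk 'NR > 3 && ($7 > 0 || $8 > 0) {hits++} END {if (hits >= 6) print "WARNING: sustained swapping in " hits " of 12 samples"; else print "OK: no sustained swapping"}'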

Memory Over Time with sar

# Install sysstat if needed (on Debian/Ubuntu also set ENABLED="true" in /etc/default/sysstat)
apt install sysstat
systemctl enable sysstat

# Current memory stats
sar -r

# Memory usage with swap
sar -r -S

# Historical data (today's file; the path is /var/log/sa on RHEL-based systems)
sar -r -f /var/log/sysstat/sa$(date +%d)

# Specific time range
sar -r -s 10:00:00 -e 11:00:00

# Real-time monitoring
sar -r 2

# Paging statistics
sar -B 2

Step 3: Swap Analysis

Checking Swap Usage

# Swap summary
swapon --show

# Detailed swap info
cat /proc/swaps

# Swap usage by process
for file in /proc/*/status ; do
    awk '/VmSwap|Name/{printf "%s %s ", $2, $3}END{print ""}' $file
done | sort -k 2 -n -r | head -10

# Swap usage script
cat > /tmp/swap-usage.sh << 'EOF'
#!/bin/bash
SUM=0
OVERALL=0
for DIR in $(find /proc/ -maxdepth 1 -type d -regex "^/proc/[0-9]+")
do
    PID=$(echo $DIR | cut -d / -f 3)
    PROGNAME=$(ps -p $PID -o comm --no-headers)
    for SWAP in $(grep VmSwap $DIR/status 2>/dev/null | awk '{print $2}')
    do
        ((SUM=$SUM+$SWAP))
    done
    if [ $SUM -gt 0 ]; then
        echo "$PID $PROGNAME $((SUM/1024))MB"
    fi
    ((OVERALL=$OVERALL+$SUM))
    SUM=0
done
echo "Total: $((OVERALL/1024))MB"
EOF

chmod +x /tmp/swap-usage.sh
/tmp/swap-usage.sh | sort -k3 -n -r | head -15

Swap Performance Analysis

# Swap activity (si/so columns; skip the banner line)
vmstat 1 5 | awk 'NR > 1 {print $7, $8}'

# I/O load on the device backing swap (identify it with swapon --show)
iostat -x 1 5 | grep -E "Device|sda"   # substitute your swap device

# Monitor swap in/out
sar -W 1 10

# Check swappiness setting
cat /proc/sys/vm/swappiness

# Optimal swappiness values:
# 0-10 = Minimal swapping (servers)
# 60 = Default
# 100 = Aggressive swapping

Swap Management

# Check swap partitions/files
cat /etc/fstab | grep swap

# Disable swap temporarily
swapoff -a

# Enable swap
swapon -a

# Clear swap (requires sufficient RAM)
swapoff -a && swapon -a

# Add temporary swap file
dd if=/dev/zero of=/swapfile bs=1M count=2048
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile

# Verify
swapon --show

# Remove swap file
swapoff /swapfile
rm /swapfile

Step 4: Memory Leak Detection

Identifying Memory Leaks

Memory leaks occur when processes continuously consume memory without releasing it:

# Monitor process memory over time
while true; do
    ps -p 1234 -o pid,rss,vsz,cmd
    sleep 60
done

# Automated leak detection script
cat > /tmp/memory-leak-detector.sh << 'EOF'
#!/bin/bash
PID=$1
LOG="/tmp/memleak-$PID.log"

echo "Monitoring PID $PID for memory leaks..."
echo "Time,RSS_KB,VSZ_KB" > $LOG

while kill -0 $PID 2>/dev/null; do
    TIMESTAMP=$(date +%H:%M:%S)
    RSS=$(ps -p $PID -o rss= | tr -d ' ')
    VSZ=$(ps -p $PID -o vsz= | tr -d ' ')
    echo "$TIMESTAMP,$RSS,$VSZ" >> $LOG
    sleep 300  # Check every 5 minutes
done

echo "Process ended. Check $LOG for trends"
EOF

chmod +x /tmp/memory-leak-detector.sh
/tmp/memory-leak-detector.sh 1234

Analyzing Memory Leaks

# Check memory growth trend (RSS delta between first and last samples)
awk -F',' 'NR == 2 {first=$2} NR > 1 {last=$2} END {if (last > first) print "RSS grew by", last-first, "KB"; else print "No net RSS growth"}' /tmp/memleak-1234.log

# Use valgrind for leak detection
valgrind --leak-check=full --show-leak-kinds=all command

# Valgrind with log
valgrind --leak-check=full --log-file=valgrind.log command

# Check for memory fragmentation
cat /proc/buddyinfo

# Slab allocator leaks
slabtop -o -s c
watch -n 1 "cat /proc/slabinfo | grep dentry"

Application-Specific Leak Detection

# PHP memory leak
# Add to php.ini
memory_limit = 256M
display_errors = On
log_errors = On

# Monitor PHP process
watch -n 5 'ps aux | grep php-fpm | grep -v grep'

# Java heap dump
jmap -dump:live,format=b,file=heap.bin PID
jmap -heap PID   # JDK 8; on JDK 9+ use: jhsdb jmap --heap --pid PID

# Python memory profiling (requires: pip install memory-profiler)
python -m memory_profiler script.py

# Node.js heap snapshot
node --inspect script.js
# Use Chrome DevTools for heap analysis

Step 5: Cache and Buffer Analysis

Understanding Cache

Linux aggressively caches to improve performance:

# Cache size
free -h | grep "Mem:" | awk '{print $6}'

# Clear page cache (safe)
sync
echo 1 > /proc/sys/vm/drop_caches

# Clear dentries and inodes
echo 2 > /proc/sys/vm/drop_caches

# Clear all caches
echo 3 > /proc/sys/vm/drop_caches

# Note: Cache clears automatically when memory needed
# Manual clearing rarely necessary

# Cache pressure setting
cat /proc/sys/vm/vfs_cache_pressure
# Default: 100
# Lower: Prefer to keep cache
# Higher: Reclaim cache more aggressively
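
To see how much memory a cache drop actually frees before doing it blindly (a sketch; run as root, and remember the kernel reclaims cache on demand anyway):

# Measure memory freed by dropping all caches
BEFORE=$(awk '/^MemFree/ {print $2}' /proc/meminfo)
sync && echo 3 > /proc/sys/vm/drop_caches
AFTER=$(awk '/^MemFree/ {print $2}' /proc/meminfo)
echo "Freed: $(( (AFTER - BEFORE) / 1024 )) MB"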

Buffer Analysis

# Buffer size
awk '/^Buffers:/ {print "Buffers:", $2/1024 "MB"}' /proc/meminfo

# What's using buffers via block_dump (removed in kernel 5.12; use iotop or blktrace on newer kernels)
cat /proc/sys/vm/block_dump
echo 1 > /proc/sys/vm/block_dump
dmesg | tail -100
echo 0 > /proc/sys/vm/block_dump  # Disable

# Buffer and cache columns over time
vmstat 1 5

Step 6: OOM Analysis

Detecting OOM Events

# Check for recent OOM kills
dmesg | grep -i "killed process"
dmesg | grep -i "out of memory"

# Kernel log
grep -i "out of memory" /var/log/kern.log
grep -i "oom" /var/log/messages

# Journal logs
journalctl -k | grep -i "out of memory"
journalctl --since "1 hour ago" | grep -i oom

# Get OOM scores for running processes
for proc in /proc/*/oom_score; do
    printf "%d %s\n" "$(cat $proc)" "$(cat ${proc%/*}/cmdline | tr '\0' ' ')"
done | sort -nr | head -15

OOM Score Analysis

# Check OOM score for process
cat /proc/1234/oom_score
cat /proc/1234/oom_score_adj

# Set OOM adjustment (-1000 to 1000)
echo -500 > /proc/1234/oom_score_adj

# Protect critical process from OOM killer
echo -1000 > /proc/1234/oom_score_adj

# Make process more likely to be killed
echo 1000 > /proc/1234/oom_score_adj

# Set permanently via systemd (create the drop-in directory first)
mkdir -p /etc/systemd/system/critical-service.service.d
cat > /etc/systemd/system/critical-service.service.d/oom.conf << 'EOF'
[Service]
OOMScoreAdjust=-1000
EOF

systemctl daemon-reload
systemctl restart critical-service

Preventing OOM

# Configure vm.panic_on_oom
cat /proc/sys/vm/panic_on_oom
# 0 = OOM killer kills process
# 1 = Kernel panics on OOM
# 2 = Always panic

# Kill the task that triggered the OOM instead of scanning for the worst one
cat /proc/sys/vm/oom_kill_allocating_task

# Memory overcommit settings
cat /proc/sys/vm/overcommit_memory
# 0 = Heuristic (default)
# 1 = Always overcommit
# 2 = Never overcommit beyond limit

# Set overcommit ratio
cat /proc/sys/vm/overcommit_ratio
echo 80 > /proc/sys/vm/overcommit_ratio
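
With overcommit_memory=2 the kernel enforces a hard ceiling, roughly CommitLimit = SwapTotal + MemTotal * overcommit_ratio / 100; both the limit and current commitments are exported in /proc/meminfo:

# Compare committed address space against the enforced ceiling
grep -E "^(CommitLimit|Committed_AS)" /proc/meminfo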

Solutions and Remediation

Immediate Actions

Identify and kill memory-hungry process:

# Find biggest memory consumer
ps aux --sort=-%mem | head -2 | tail -1 | awk '{print $2}'

# Kill it
kill $(ps aux --sort=-%mem | head -2 | tail -1 | awk '{print $2}')

# Force kill if needed
kill -9 $(ps aux --sort=-%mem | head -2 | tail -1 | awk '{print $2}')

Clear cache to free memory:

# Sync and clear
sync && echo 3 > /proc/sys/vm/drop_caches

# Check freed memory
free -h

Restart problematic service:

# Restart service
systemctl restart service-name

# Check memory after restart
ps aux | grep service-name

Adding More Swap

# Create 4GB swap file (fallocate is fast; fall back to dd where unsupported)
fallocate -l 4G /swapfile || dd if=/dev/zero of=/swapfile bs=1M count=4096
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile

# Make permanent
echo '/swapfile none swap sw 0 0' >> /etc/fstab

# Verify
swapon --show
free -h

Optimizing Swappiness

# Check current value
cat /proc/sys/vm/swappiness

# Set swappiness (0-100)
# Lower = less swapping
sysctl vm.swappiness=10

# Make permanent
echo "vm.swappiness=10" >> /etc/sysctl.conf

# Apply immediately
sysctl -p

# Recommended values:
# Desktop: 60 (default)
# Server: 10
# Database server: 1

Application Memory Limits

Systemd service limits:

mkdir -p /etc/systemd/system/service-name.service.d
cat > /etc/systemd/system/service-name.service.d/memory.conf << 'EOF'
[Service]
# MemoryMax= is the cgroup v2 setting; MemoryLimit= is its deprecated v1 predecessor
MemoryMax=2G
MemoryLimit=2G
EOF

systemctl daemon-reload
systemctl restart service-name
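
To confirm the limit took effect (a quick check; the cgroup path shown assumes a cgroup v2 unified hierarchy):

# Effective limit as systemd sees it
systemctl show service-name -p MemoryMax

# Corresponding cgroup v2 control file (typical path; may differ)
cat /sys/fs/cgroup/system.slice/service-name.service/memory.max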

Cgroup memory limits:

# Create cgroup
cgcreate -g memory:/myapp

# Set limit (2GB)
echo 2147483648 > /sys/fs/cgroup/memory/myapp/memory.limit_in_bytes

# Run process in cgroup
cgexec -g memory:myapp command
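
cgcreate/cgexec come from the cgroup-tools package and target cgroup v1; on a cgroup v2 (unified hierarchy) system the same limit can be applied directly, as in this sketch:

# cgroup v2 equivalent (may first need: echo +memory > /sys/fs/cgroup/cgroup.subtree_control)
mkdir -p /sys/fs/cgroup/myapp
echo 2G > /sys/fs/cgroup/myapp/memory.max

# Move a shell into the group, then launch the workload from it
bash -c 'echo $$ > /sys/fs/cgroup/myapp/cgroup.procs && exec command'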

PHP memory limit:

# Edit php.ini
memory_limit = 256M

# Per-script override
php -d memory_limit=512M script.php

MySQL memory optimization:

# Edit /etc/mysql/my.cnf
[mysqld]
innodb_buffer_pool_size = 4G
key_buffer_size = 256M
max_connections = 100
table_open_cache = 2000
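
To verify the buffer pool is sized sensibly in practice (assumes a working mysql client login):

# Buffer pool pages: 'free' near zero suggests the pool is fully used
mysql -e "SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages_%';"

# Current configured size in bytes
mysql -e "SHOW VARIABLES LIKE 'innodb_buffer_pool_size';"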

Kernel Parameter Tuning

# Edit /etc/sysctl.conf

# Swappiness
vm.swappiness = 10

# Cache pressure
vm.vfs_cache_pressure = 50

# Overcommit settings
vm.overcommit_memory = 0
vm.overcommit_ratio = 80

# Dirty ratio (% of RAM for dirty pages)
vm.dirty_ratio = 15
vm.dirty_background_ratio = 5

# Minimum free memory (KB)
vm.min_free_kbytes = 65536

# Apply changes
sysctl -p

Prevention and Monitoring

Continuous Memory Monitoring

cat > /usr/local/bin/memory-monitor.sh << 'EOF'
#!/bin/bash

THRESHOLD=90
LOG_FILE="/var/log/memory-monitor.log"
ALERT_EMAIL="[email protected]"

while true; do
    TOTAL=$(free | grep Mem | awk '{print $2}')
    USED=$(free | grep Mem | awk '{print $3}')
    PERCENT=$((USED * 100 / TOTAL))

    if [ $PERCENT -gt $THRESHOLD ]; then
        echo "$(date): High memory usage: $PERCENT%" >> "$LOG_FILE"
        echo "Top processes:" >> "$LOG_FILE"
        ps aux --sort=-%mem | head -10 >> "$LOG_FILE"

        # Send alert
        echo "High memory alert on $(hostname): $PERCENT%" | \
            mail -s "Memory Alert: $PERCENT%" "$ALERT_EMAIL"
    fi

    sleep 300  # Check every 5 minutes
done
EOF

chmod +x /usr/local/bin/memory-monitor.sh

# Run as systemd service
cat > /etc/systemd/system/memory-monitor.service << 'EOF'
[Unit]
Description=Memory Monitoring Service
After=network.target

[Service]
Type=simple
ExecStart=/usr/local/bin/memory-monitor.sh
Restart=always

[Install]
WantedBy=multi-user.target
EOF

systemctl enable memory-monitor.service
systemctl start memory-monitor.service

Memory Usage Reports

cat > /usr/local/bin/memory-report.sh << 'EOF'
#!/bin/bash

REPORT="/tmp/memory-report-$(date +%Y%m%d).txt"

echo "Memory Usage Report - $(date)" > "$REPORT"
echo "================================" >> "$REPORT"
echo "" >> "$REPORT"

echo "System Memory:" >> "$REPORT"
free -h >> "$REPORT"
echo "" >> "$REPORT"

echo "Swap Usage:" >> "$REPORT"
swapon --show >> "$REPORT"
echo "" >> "$REPORT"

echo "Top 15 Memory Consumers:" >> "$REPORT"
ps aux --sort=-%mem | head -16 >> "$REPORT"
echo "" >> "$REPORT"

echo "Memory by User:" >> "$REPORT"
ps aux | awk '{arr[$1]+=$6} END {for (i in arr) print i, int(arr[i]/1024) "MB"}' | sort -k2 -nr >> "$REPORT"
echo "" >> "$REPORT"

echo "Recent OOM Events:" >> "$REPORT"
dmesg | grep -i "killed process" | tail -5 >> "$REPORT"

mail -s "Daily Memory Report - $(hostname)" [email protected] < "$REPORT"
EOF

chmod +x /usr/local/bin/memory-report.sh

# Schedule daily at 8 AM (append; piping echo alone to crontab would replace the whole crontab)
(crontab -l 2>/dev/null; echo "0 8 * * * /usr/local/bin/memory-report.sh") | crontab -

Performance Baseline

cat > /usr/local/bin/memory-baseline.sh << 'EOF'
#!/bin/bash

BASELINE_DIR="/var/log/memory-baseline"
mkdir -p "$BASELINE_DIR"
DATE=$(date +%Y%m%d-%H%M%S)

free -h > "$BASELINE_DIR/free-$DATE.txt"
cat /proc/meminfo > "$BASELINE_DIR/meminfo-$DATE.txt"
ps aux --sort=-%mem | head -50 > "$BASELINE_DIR/processes-$DATE.txt"
vmstat -s > "$BASELINE_DIR/vmstat-$DATE.txt"

echo "Memory baseline captured: $DATE"
EOF

chmod +x /usr/local/bin/memory-baseline.sh

# Run weekly (append to the existing crontab)
(crontab -l 2>/dev/null; echo "0 2 * * 0 /usr/local/bin/memory-baseline.sh") | crontab -

Advanced Diagnostics

Using smem

Advanced memory reporting tool:

# Install smem
apt install smem

# Memory usage by process
smem -t

# Memory usage sorted
smem -s pss -r

# Memory by user
smem -u

# System-wide summary
smem -w

# Memory map
smem -m

# Percentage view
smem -p

Memory Profiling

# Use memusage for C programs
memusage command

# Generate memory graph
memusage --png=output.png command

# Massif (Valgrind tool)
valgrind --tool=massif command
ms_print massif.out.*

Conclusion

Memory management is crucial for system stability and performance. Key takeaways:

  1. Monitor available, not free: Linux caches aggressively
  2. Watch swap usage: High swap indicates memory pressure
  3. Identify memory leaks early: Monitor trends over time
  4. Configure OOM scores: Protect critical processes
  5. Tune kernel parameters: Optimize for your workload
  6. Set process limits: Prevent runaway memory consumption
  7. Monitor continuously: Automated monitoring catches issues early

Regular monitoring, proper configuration, and understanding these diagnostic tools will help you maintain optimal memory utilization and prevent memory-related outages. Keep these commands readily available for rapid troubleshooting when memory issues occur.