Linux System Log Analysis (/var/log/)

Introduction

System logs are the black boxes of Linux servers, containing invaluable information about system events, security incidents, application errors, and operational activities. Understanding how to effectively analyze logs in the /var/log/ directory is a fundamental skill for system administrators, security professionals, and DevOps engineers.

Most significant events on a Linux system are recorded in log files, from authentication attempts and kernel messages to application-specific errors and system warnings. These logs provide the forensic trail needed to troubleshoot issues, investigate security incidents, monitor system health, and meet regulatory compliance requirements.

This guide explores the /var/log/ directory structure, explains how to interpret the main log files, demonstrates log analysis with command-line tools, and works through practical examples for common troubleshooting scenarios. Whether you're diagnosing a failed service, investigating a security breach, or monitoring system health, log analysis is a core part of Linux system administration.

Prerequisites

Before diving into log analysis, ensure you have:

  • A Linux server or workstation (Ubuntu 20.04/22.04, Debian 10/11, CentOS 7/8, Rocky Linux 8/9, or similar)
  • Root or sudo access to read protected log files
  • Basic understanding of the Linux command-line interface
  • SSH access to your server (for remote log analysis)
  • Familiarity with basic text processing commands (grep, awk, sed)

Recommended Tools:

  • less or more for viewing logs
  • grep for searching log content
  • awk and sed for advanced log parsing
  • tail and head for viewing recent or first log entries
  • journalctl for systemd journal logs

Understanding /var/log/ Directory Structure

The /var/log/ directory is the standard location for system and application log files on Linux systems. Let's explore the key log files and their purposes.

Common Log Files Overview

# List all log files in /var/log/
ls -lh /var/log/

# View directory structure
tree /var/log/ -L 2

Essential System Log Files

1. /var/log/syslog (Debian/Ubuntu) or /var/log/messages (RHEL/CentOS)

General system activity log containing messages from the kernel, system daemons, and applications.

# View recent syslog entries (Ubuntu/Debian)
sudo tail -f /var/log/syslog

# View messages log (CentOS/Rocky Linux)
sudo tail -f /var/log/messages

2. /var/log/auth.log (Debian/Ubuntu) or /var/log/secure (RHEL/CentOS)

Authentication and authorization logs including SSH logins, sudo usage, and user authentication events.

# View authentication log (Ubuntu/Debian)
sudo tail -f /var/log/auth.log

# View secure log (CentOS/Rocky Linux)
sudo tail -f /var/log/secure

3. /var/log/kern.log (Debian/Ubuntu)

Kernel messages including hardware detection, driver loading, and kernel-level errors.

# View kernel log
sudo tail -f /var/log/kern.log

# Alternative: Use dmesg for the kernel ring buffer
# (needs sudo on systems where kernel.dmesg_restrict is enabled)
sudo dmesg | tail -50

4. /var/log/dmesg

Boot-time kernel messages captured during system startup.

# View boot messages
sudo cat /var/log/dmesg

# Recent kernel ring buffer
dmesg -T | less

5. /var/log/boot.log

System boot and startup messages from init system and services.

# View boot log
sudo less /var/log/boot.log
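
Because the file names above differ between the Debian and RHEL families, a script meant to run on both can resolve the right path once and reuse it. The following is a minimal sketch, not a standard tool: first_existing is a hypothetical helper name, and the candidate paths are the ones listed in this section.

```shell
#!/bin/sh
# first_existing: print the first argument that exists on disk.
# Hypothetical helper for illustration; returns non-zero if none exist.
first_existing() {
    for candidate in "$@"; do
        if [ -e "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Resolve the distribution-specific paths once, then reuse the variables
AUTH_LOG=$(first_existing /var/log/auth.log /var/log/secure) || AUTH_LOG=""
SYS_LOG=$(first_existing /var/log/syslog /var/log/messages) || SYS_LOG=""

echo "auth log:   ${AUTH_LOG:-not found}"
echo "system log: ${SYS_LOG:-not found}"
```

If neither candidate exists, the host may be logging only to the systemd journal, in which case journalctl is the place to look.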

Application-Specific Log Files

Web Server Logs:

# Apache logs
/var/log/apache2/access.log    # HTTP requests (Debian/Ubuntu)
/var/log/apache2/error.log     # Apache errors
/var/log/httpd/access_log      # HTTP requests (RHEL/CentOS)
/var/log/httpd/error_log       # Apache errors

# Nginx logs
/var/log/nginx/access.log      # HTTP requests
/var/log/nginx/error.log       # Nginx errors

Database Logs:

# MySQL/MariaDB
/var/log/mysql/error.log
/var/log/mysql/mysql.log
/var/log/mariadb/mariadb.log

# PostgreSQL
/var/log/postgresql/postgresql-*.log

Mail Server Logs:

# Postfix
/var/log/mail.log              # General mail log (Debian/Ubuntu)
/var/log/maillog               # Mail log (RHEL/CentOS)
/var/log/mail.err              # Mail errors

System Service Logs:

# Cron jobs
/var/log/cron                  # Scheduled task execution (RHEL/CentOS; Debian/Ubuntu log cron to /var/log/syslog by default)

# System daemon messages
/var/log/daemon.log

# User activity
/var/log/user.log

Log File Permissions and Security

Log files contain sensitive information and should have restricted permissions:

# Check log file permissions
ls -l /var/log/auth.log
# Typical output: -rw-r----- 1 syslog adm 45678 Jan 11 10:30 auth.log

# Verify proper ownership
sudo find /var/log -type f -exec ls -lh {} \; | head -20

# Check for world-readable sensitive logs (security issue)
sudo find /var/log -type f -perm -004
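
The world-readable check can be wrapped in a small function so it can be tried on a scratch directory before it is pointed at /var/log. This is a sketch under that assumption; world_readable is a hypothetical name, and the function only reports files, it changes nothing.

```shell
#!/bin/sh
# world_readable DIR: list regular files under DIR that "other" can read.
# Hypothetical helper; uses the same -perm -004 test as the command above.
world_readable() {
    find "$1" -type f -perm -004 -print
}

# Example (as root, so protected subdirectories can be traversed):
#   world_readable /var/log
```

Some files are world-readable by design on many distributions (wtmp is a common example), so the output is a list to review rather than a list of violations.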

Basic Log Analysis Techniques

Viewing Log Files

Using tail for recent entries:

# View last 50 lines
sudo tail -50 /var/log/syslog

# Follow log in real-time
sudo tail -f /var/log/syslog

# Follow multiple logs simultaneously
sudo tail -f /var/log/syslog /var/log/auth.log

# Show last 100 lines from multiple files
sudo tail -n 100 /var/log/syslog /var/log/auth.log

Using head for oldest entries:

# View first 50 lines
sudo head -50 /var/log/syslog

# Combine with tail to view a specific range (here lines 901-1000)
sudo head -1000 /var/log/syslog | tail -100

Using less for interactive viewing:

# Open log with less (searchable, scrollable)
sudo less /var/log/syslog

# Less shortcuts:
# / - search forward
# ? - search backward
# n - next match
# N - previous match
# G - go to end
# g - go to beginning
# F - follow mode (like tail -f)
# q - quit

Using cat for full content:

# Display entire log file
sudo cat /var/log/syslog

# Display with line numbers
sudo cat -n /var/log/syslog | less

# Display multiple files sequentially
sudo cat /var/log/syslog /var/log/auth.log

Searching Log Content

Basic grep searches:

# Search for specific term
sudo grep "error" /var/log/syslog

# Case-insensitive search
sudo grep -i "error" /var/log/syslog

# Search multiple files
sudo grep "failed" /var/log/auth.log /var/log/syslog

# Recursive search in directory
sudo grep -r "connection refused" /var/log/

# Show line numbers
sudo grep -n "error" /var/log/syslog

# Show context (3 lines before and after)
sudo grep -C 3 "error" /var/log/syslog

# Count occurrences
sudo grep -c "error" /var/log/syslog

# Invert match (show lines NOT matching)
sudo grep -v "info" /var/log/syslog

Advanced grep patterns:

# Search for failed SSH login attempts
sudo grep "Failed password" /var/log/auth.log

# Search for successful sudo commands
sudo grep "sudo.*COMMAND" /var/log/auth.log

# Search for specific IP address (-F treats the dots literally instead of as regex wildcards)
sudo grep -F "192.168.1.100" /var/log/syslog

# Search using regular expressions
sudo grep -E "error|warning|critical" /var/log/syslog

# Search for lines starting with specific pattern
sudo grep "^Jan 11" /var/log/syslog

# Search with extended regex (grep -E replaces the deprecated egrep)
sudo grep -E "fail(ed|ure)" /var/log/auth.log
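
The searches above only read the current file, while older entries usually live in rotated copies such as auth.log.1 or auth.log.2.gz. The sketch below covers both in one pass. It assumes GNU gzip, whose -cdf mode decompresses .gz input and passes plain files through unchanged. search_rotations is a hypothetical helper name.

```shell
#!/bin/sh
# search_rotations PATTERN BASE: grep a log and all of its rotated copies.
# PATTERN is an extended regular expression; BASE is the current log path.
search_rotations() {
    pattern=$1
    base=$2
    # The glob expands to e.g. auth.log, auth.log.1, auth.log.2.gz
    # (lexical order, which is not strictly chronological)
    gzip -cdf "$base"* 2>/dev/null | grep -E -- "$pattern"
}

# Example (from a root shell, since auth logs are protected):
#   search_rotations 'Failed password' /var/log/auth.log
```

Where the zgrep wrapper is installed, it is an alternative for compressed files.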

Filtering by Date and Time

Extract specific time ranges:

# Find all entries from specific date
sudo grep "Jan 11" /var/log/syslog

# Find entries from specific hour
sudo grep "Jan 11 14:" /var/log/syslog

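# Find entries within a time range (compares the time field; a range pattern
# such as /Jan 11 14:00/,/Jan 11 15:00/ fails when no line matches a boundary exactly)
sudo awk '$1 == "Jan" && $2 == 11 && $3 >= "14:00:00" && $3 < "15:00:00"' /var/log/syslog

# Entries from the previous clock hour (%e matches syslog's space-padded day of month)
sudo awk -v ts="$(date --date='1 hour ago' '+%b %e %H:')" 'index($0, ts) == 1' /var/log/syslog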

Using journalctl for systemd logs:

# Logs since specific time
sudo journalctl --since "2024-01-11 14:00:00"

# Logs until specific time
sudo journalctl --until "2024-01-11 15:00:00"

# Logs from last hour
sudo journalctl --since "1 hour ago"

# Logs from today
sudo journalctl --since today

# Logs from yesterday
sudo journalctl --since yesterday --until today

Counting and Statistics

Generate log statistics:

# Count total lines in log
sudo wc -l /var/log/syslog

# Count error occurrences
sudo grep -c "error" /var/log/syslog

# Count unique IP addresses
sudo grep -oE '\b([0-9]{1,3}\.){3}[0-9]{1,3}\b' /var/log/nginx/access.log | sort -u | wc -l

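# Top 10 most common errors (timestamp, hostname, and PID are stripped first;
# otherwise every line is unique and nothing groups)
sudo grep -i "error" /var/log/syslog | sed -E 's/^[A-Za-z]+ +[0-9]+ [0-9:]+ [^ ]+ //; s/\[[0-9]+\]:/:/' | sort | uniq -c | sort -rn | head -10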

# Count logs per hour
sudo awk '{print $3}' /var/log/syslog | cut -d: -f1 | sort | uniq -c
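
The per-hour count above merges the same hour from different days into one bucket. The variant below keeps days apart. It assumes the traditional syslog timestamp layout (month, day, time in the first three fields) and a chronologically ordered file. hourly_counts is a hypothetical helper name.

```shell
#!/bin/sh
# hourly_counts FILE: count entries per calendar hour, one row per day and hour.
# Because log lines are already in time order, "uniq -c" needs no prior sort.
hourly_counts() {
    awk '{ print $1, $2, substr($3, 1, 2) ":00" }' "$1" | uniq -c
}

# Example (from a root shell):
#   hourly_counts /var/log/syslog | tail -24
```

A sudden jump in one hour's count is often a quicker lead than reading individual messages.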

Advanced Log Analysis with AWK

AWK is a powerful text processing tool ideal for structured log analysis.

Basic AWK Log Analysis

Print specific columns:

# Print timestamp and process tag (columns 1-3 and 5; column 4 is the hostname)
sudo awk '{print $1, $2, $3, $5}' /var/log/syslog

# Print only error messages
sudo awk '/error/ {print $0}' /var/log/syslog

# Print messages from specific service
sudo awk '/nginx/ {print $0}' /var/log/syslog

Field-based filtering:

# Print logs from specific hour
sudo awk '$3 ~ /^14:/ {print $0}' /var/log/syslog

# Print logs with specific process
sudo awk '$5 == "sshd[1234]:" {print $0}' /var/log/auth.log

# Sum values in column (e.g., response sizes)
sudo awk '{sum+=$10} END {print sum}' /var/log/nginx/access.log

Advanced AWK Examples

Analyze Apache/Nginx access logs:

# Count requests by IP address
sudo awk '{print $1}' /var/log/nginx/access.log | sort | uniq -c | sort -rn | head -10

# Count requests by HTTP status code
sudo awk '{print $9}' /var/log/nginx/access.log | sort | uniq -c | sort -rn

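# Calculate average response time (valid only if your log_format ends with $request_time;
# the default "combined" format does not record it)
sudo awk '{sum+=$NF; count++} END {if (count) print sum/count}' /var/log/nginx/access.log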

# Find 404 errors with URLs
sudo awk '$9 == 404 {print $7}' /var/log/nginx/access.log | sort | uniq -c | sort -rn

# Requests per minute
sudo awk '{print $4}' /var/log/nginx/access.log | cut -d: -f2,3 | sort | uniq -c
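
A status-class summary (2xx, 3xx, 4xx, 5xx) is often easier to scan than the full per-code breakdown. The sketch below assumes the default "combined" access log format, where field 9 is the status code. status_classes is a hypothetical helper name.

```shell
#!/bin/sh
# status_classes FILE: summarize an access log by HTTP status class.
# Lines whose ninth field is not a three-digit status code are ignored.
status_classes() {
    awk '$9 ~ /^[1-5][0-9][0-9]$/ { n[substr($9, 1, 1) "xx"]++ }
         END { for (c in n) print c, n[c] }' "$1" | sort
}

# Example (from a root shell):
#   status_classes /var/log/nginx/access.log
```

A rising share of 5xx responses usually points at the application or upstream, while a burst of 4xx responses more often indicates scanning or broken links.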

Parse authentication logs:

# Count failed login attempts by user
sudo awk '/Failed password/ {print $(NF-5)}' /var/log/auth.log | sort | uniq -c | sort -rn

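# Count sudo commands by invoking user (the field right after the "sudo:" tag;
# USER= in the same line is the target user, usually root)
sudo awk '/sudo.*COMMAND=/ {for(i=1;i<NF;i++) if($i ~ /^sudo(\[[0-9]+\])?:$/) {print $(i+1); break}}' /var/log/auth.log | sort | uniq -c | sort -rn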

# Track SSH connections by IP
sudo awk '/Accepted password/ {print $(NF-3)}' /var/log/auth.log | sort | uniq -c | sort -rn
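
Counting fields from the end of the line, as $(NF-3) does, works for password logins but shifts when sshd appends extra details, for example the key fingerprint on "Accepted publickey" lines. One alternative, sketched below, is to look for the word "from" and print the field after it. ssh_source_ips is a hypothetical helper name.

```shell
#!/bin/sh
# ssh_source_ips PATTERN FILE: print the address that follows the word "from"
# on lines matching PATTERN (an awk regular expression).
ssh_source_ips() {
    awk -v pat="$1" '$0 ~ pat {
        for (i = 1; i < NF; i++)
            if ($i == "from") { print $(i + 1); break }
    }' "$2"
}

# Example (from a root shell):
#   ssh_source_ips 'Failed password' /var/log/auth.log | sort | uniq -c | sort -rn
```

The sketch takes the first "from" on the line, so an attempted login with the literal username "from" would be misparsed.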

System resource log analysis:

# Parse disk usage alerts
sudo awk '/disk.*full/ {print $0}' /var/log/syslog

# Track service restarts
sudo awk '/systemd.*Started/ {print $0}' /var/log/syslog

# Memory-related errors
sudo awk '/Out of memory|OOM/ {print $0}' /var/log/syslog

Log Analysis with SED

SED (Stream Editor) excels at log transformation and filtering.

Basic SED Operations

Filtering log entries:

# Delete blank lines
sudo sed '/^$/d' /var/log/syslog

# Print only lines containing "error"
sudo sed -n '/error/p' /var/log/syslog

# Delete lines containing "debug"
sudo sed '/debug/d' /var/log/syslog

# Print lines 100-200
sudo sed -n '100,200p' /var/log/syslog

Text transformation:

# Replace "error" with "ERROR"
sudo sed 's/error/ERROR/g' /var/log/syslog

# Remove timestamps (first 3 columns)
sudo sed 's/^[^ ]* [^ ]* [^ ]* //' /var/log/syslog

# Extract the first IP address on each line
# (a leading greedy .* would swallow all but the last digit of the first octet)
sudo sed -nE 's/^[^0-9]*(([0-9]{1,3}\.){3}[0-9]{1,3}).*/\1/p' /var/log/nginx/access.log

Advanced SED Analysis

Multi-line log processing:

# Join continuation lines (lines ending in a backslash) with the line that follows
sudo sed -e :a -e '/\\$/N; s/\\\n//; ta' /var/log/application.log

# Insert a separator line before every entry dated Jan 11
sudo sed '/^Jan 11/i\---' /var/log/syslog

Conditional processing:

# Add prefix to error lines
sudo sed '/error/s/^/[ERROR] /' /var/log/syslog

# Delete everything from the first error line to the end of the file (the error line included)
sudo sed '/error/,$d' /var/log/syslog

# Keep only lines between two patterns
sudo sed -n '/START/,/END/p' /var/log/application.log
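
One further transformation that fits the same pattern is masking addresses before a log excerpt is pasted into a ticket or chat. The sketch below is deliberately loose: it masks anything shaped like an IPv4 address (including, say, a four-part version number) and leaves IPv6 untouched. mask_ips is a hypothetical helper name.

```shell
#!/bin/sh
# mask_ips: replace anything shaped like an IPv4 address with x.x.x.x.
# Reads stdin and writes stdout; the log file itself is never modified.
mask_ips() {
    sed -E 's/([0-9]{1,3}\.){3}[0-9]{1,3}/x.x.x.x/g'
}

# Example (from a root shell):
#   grep 'Failed password' /var/log/auth.log | tail -20 | mask_ips
```

As with every sed command in this section, the output goes to the terminal or a pipe, and the original log stays intact.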

Practical Log Analysis Examples

Security Analysis

Detect brute force attacks:

#!/bin/bash
# detect-bruteforce.sh - Identify SSH brute force attempts

echo "=== Failed SSH Login Attempts by IP ==="
sudo grep "Failed password" /var/log/auth.log | \
    awk '{print $(NF-3)}' | \
    sort | uniq -c | sort -rn | \
    awk '$1 > 5 {print $1 " attempts from " $2}'

echo ""
echo "=== Failed Logins by Username ==="
sudo grep "Failed password" /var/log/auth.log | \
    awk '{print $(NF-5)}' | \
    sort | uniq -c | sort -rn | head -10

echo ""
echo "=== Recent Failed Attempts (Last 20) ==="
sudo grep "Failed password" /var/log/auth.log | tail -20

Audit sudo usage:

#!/bin/bash
# audit-sudo.sh - Track sudo command usage

echo "=== Sudo Commands by User ==="
# The invoking user is the field right after the "sudo:" tag
sudo grep "sudo.*COMMAND=" /var/log/auth.log | \
    awk '{for(i=1;i<NF;i++) if($i ~ /^sudo(\[[0-9]+\])?:$/) {print $(i+1); break}}' | \
    sort | uniq -c | sort -rn

echo ""
echo "=== Recent Sudo Commands ==="
# Everything after COMMAND= is the command line that was run
sudo grep "sudo.*COMMAND=" /var/log/auth.log | \
    tail -20 | \
    sed 's/.*COMMAND=//'

Monitor privilege escalation attempts:

#!/bin/bash
# privilege-escalation.sh

echo "=== Failed Sudo Attempts ==="
sudo grep "sudo.*incorrect password" /var/log/auth.log

echo ""
echo "=== Su Command Usage ==="
sudo grep "su\[" /var/log/auth.log | tail -20

echo ""
echo "=== Authentication Failures ==="
sudo grep "authentication failure" /var/log/auth.log | \
    awk '{print $NF}' | sort | uniq -c | sort -rn

Application Error Analysis

Web server error analysis:

#!/bin/bash
# webserver-errors.sh - Analyze web server errors

LOG="/var/log/nginx/error.log"

echo "=== Error Distribution by Type ==="
sudo grep -oE 'error|warn|crit|alert|emerg' "$LOG" | \
    sort | uniq -c | sort -rn

echo ""
echo "=== Most Common Error Messages ==="
sudo grep "error" "$LOG" | \
    awk -F'] ' '{print $2}' | \
    sort | uniq -c | sort -rn | head -10

echo ""
echo "=== Errors by Hour ==="
sudo grep "error" "$LOG" | \
    awk '{print $1, $2}' | cut -d: -f1 | \
    sort | uniq -c

echo ""
echo "=== PHP Errors ==="
sudo grep -i "php" "$LOG" | tail -10

Database error analysis:

#!/bin/bash
# database-errors.sh - Analyze MySQL/MariaDB errors

LOG="/var/log/mysql/error.log"

echo "=== Database Error Summary ==="
sudo grep -i "error" "$LOG" | tail -20

echo ""
echo "=== Connection Issues ==="
sudo grep -i "connection\|connect" "$LOG" | tail -10

echo ""
echo "=== Crash/Restart Events ==="
sudo grep -i "shutdown\|started\|crash" "$LOG" | tail -10

echo ""
echo "=== Slow Query Warnings ==="
sudo grep -i "slow" "$LOG" | tail -10

System Performance Analysis

Disk space warnings:

#!/bin/bash
# disk-warnings.sh

echo "=== Disk Space Warnings ==="
sudo grep -i "no space left\|disk full\|quota exceeded" /var/log/syslog

echo ""
echo "=== Disk I/O Errors ==="
sudo grep -i "I/O error\|disk error" /var/log/kern.log | tail -10

Memory issues:

#!/bin/bash
# memory-issues.sh

echo "=== Out of Memory Events ==="
sudo grep -i "out of memory\|OOM\|killed process" /var/log/syslog | tail -20

echo ""
echo "=== Memory Allocation Failures ==="
sudo grep -i "allocation failed\|cannot allocate memory" /var/log/syslog | tail -10

Service failures:

#!/bin/bash
# service-failures.sh

echo "=== Failed Service Starts ==="
sudo grep -i "failed\|failure" /var/log/syslog | \
    grep -i "service\|systemd" | tail -20

echo ""
echo "=== Service Restarts ==="
sudo grep "systemd.*Started\|systemd.*Stopped" /var/log/syslog | tail -20

echo ""
echo "=== Crashed Services ==="
sudo grep -i "crash\|core dump\|segfault" /var/log/syslog | tail -10

Log Analysis Scripts

Comprehensive Log Analyzer

#!/bin/bash
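# comprehensive-log-analyzer.sh - Daily log analysis report
# Run as root (for example via sudo or root's crontab): the script creates and
# writes /var/log/analysis, which an unprivileged shell cannot do.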

REPORT_DATE=$(date +%Y-%m-%d)
REPORT_FILE="/var/log/analysis/report-$REPORT_DATE.txt"

mkdir -p /var/log/analysis

{
    echo "========================================="
    echo "System Log Analysis Report"
    echo "Date: $REPORT_DATE"
    echo "Hostname: $(hostname)"
    echo "========================================="
    echo ""

    echo "--- SECURITY ANALYSIS ---"
    echo ""
    echo "Failed SSH Login Attempts by IP:"
    sudo grep "Failed password" /var/log/auth.log 2>/dev/null | \
        awk '{print $(NF-3)}' | sort | uniq -c | sort -rn | head -10
    echo ""

    echo "Sudo Command Usage:"
    sudo grep "sudo.*COMMAND" /var/log/auth.log 2>/dev/null | wc -l
    echo ""

    echo "--- ERROR ANALYSIS ---"
    echo ""
    echo "System Errors:"
    sudo grep -i "error" /var/log/syslog 2>/dev/null | wc -l
    echo ""

    echo "Critical Events:"
    sudo grep -i "critical\|crit" /var/log/syslog 2>/dev/null | tail -10
    echo ""

    echo "--- SERVICE STATUS ---"
    echo ""
    echo "Service Failures:"
    sudo grep -i "failed" /var/log/syslog 2>/dev/null | \
        grep -i "service\|systemd" | tail -10
    echo ""

    echo "Service Restarts:"
    sudo grep "systemd.*Started" /var/log/syslog 2>/dev/null | \
        wc -l
    echo ""

    echo "--- RESOURCE ISSUES ---"
    echo ""
    echo "Disk Space Warnings:"
    sudo grep -i "no space left\|disk full" /var/log/syslog 2>/dev/null | wc -l
    echo ""

    echo "Memory Issues:"
    sudo grep -i "out of memory\|OOM" /var/log/syslog 2>/dev/null | wc -l
    echo ""

    echo "--- WEB SERVER ANALYSIS ---"
    if [ -f /var/log/nginx/access.log ]; then
        echo ""
        echo "Total HTTP Requests Today:"
        sudo grep "$(date +%d/%b/%Y)" /var/log/nginx/access.log 2>/dev/null | wc -l
        echo ""

        echo "HTTP Status Code Distribution:"
        sudo grep "$(date +%d/%b/%Y)" /var/log/nginx/access.log 2>/dev/null | \
            awk '{print $9}' | sort | uniq -c | sort -rn
        echo ""

        echo "Top 10 Requesting IPs:"
        sudo grep "$(date +%d/%b/%Y)" /var/log/nginx/access.log 2>/dev/null | \
            awk '{print $1}' | sort | uniq -c | sort -rn | head -10
    fi

    echo ""
    echo "========================================="
    echo "Report Generated: $(date)"
    echo "========================================="

} > "$REPORT_FILE"

echo "Analysis report saved to: $REPORT_FILE"
cat "$REPORT_FILE"
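
The report repeats the same "grep, discard errors, count" pipeline many times. A small helper can make that intent explicit and guarantee a number even when a file is absent. This is a sketch: count_matches is a hypothetical name, and it assumes the script runs as root, because an unreadable file is reported as 0 just like a missing one.

```shell
#!/bin/sh
# count_matches PATTERN FILE: case-insensitive count of matching lines.
# PATTERN is an extended regular expression. Prints 0 instead of an error
# when FILE is missing or unreadable.
count_matches() {
    if [ -r "$2" ]; then
        grep -Eic -- "$1" "$2"
    else
        echo 0
    fi
    # grep exits non-zero when the count is 0; that is not a failure here
    return 0
}

# Example (from a root shell):
#   echo "Memory issues: $(count_matches 'out of memory|OOM' /var/log/syslog)"
```

Always printing a number keeps the report layout stable on hosts where a given log does not exist.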

Real-Time Log Monitor

#!/bin/bash
# realtime-monitor.sh - Monitor logs for critical events

# Colors for output
RED='\033[0;31m'
YELLOW='\033[1;33m'
GREEN='\033[0;32m'
NC='\033[0m' # No Color

echo "Starting real-time log monitoring..."
echo "Press Ctrl+C to stop"
echo ""

# Monitor multiple logs simultaneously
sudo tail -f /var/log/syslog /var/log/auth.log 2>/dev/null | while IFS= read -r line; do
    # Check the most specific patterns first: "failed password" also contains
    # "failed", so the generic error test would otherwise always win
    if echo "$line" | grep -qi "failed password\|authentication failure"; then
        echo -e "${RED}[SECURITY]${NC} $line"
    elif echo "$line" | grep -qi "error\|critical\|failed\|failure"; then
        echo -e "${RED}[ERROR]${NC} $line"
    elif echo "$line" | grep -qi "warning\|warn"; then
        echo -e "${YELLOW}[WARN]${NC} $line"
    elif echo "$line" | grep -qi "started\|stopped"; then
        echo -e "${GREEN}[SERVICE]${NC} $line"
    else
        echo "$line"
    fi
done
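
In a classifier like this the order of the checks matters, and keeping the rules in one function makes them easy to test without a live log stream. The sketch below expresses the same tags with a case statement, most specific first. classify_line is a hypothetical helper name, and the INFO tag for unmatched lines is an addition for illustration.

```shell
#!/bin/sh
# classify_line TEXT: print one tag for a log line, most specific first.
# SECURITY is tested before ERROR because "Failed password" also contains
# "failed" and "authentication failure" also contains "failure".
classify_line() {
    lower=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    case $lower in
        *"failed password"*|*"authentication failure"*) echo SECURITY ;;
        *error*|*critical*|*failed*|*failure*)          echo ERROR ;;
        *warn*)                                         echo WARN ;;
        *started*|*stopped*)                            echo SERVICE ;;
        *)                                              echo INFO ;;
    esac
}

# Example:
#   classify_line "sshd[42]: Failed password for root from 203.0.113.5"
```

Inside the monitoring loop, the tag returned by the function could select the colour, replacing the chain of separate grep calls per line.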

Log Retention Checker

#!/bin/bash
# log-retention-check.sh - Check log file sizes and retention

echo "=== Log File Size Analysis ==="
echo ""

# Find largest log files
echo "Top 10 Largest Log Files:"
sudo find /var/log -type f -exec du -h {} \; 2>/dev/null | \
    sort -rh | head -10
echo ""

# Find old log files
echo "Log Files Older Than 30 Days:"
sudo find /var/log -type f -mtime +30 -exec ls -lh {} \; 2>/dev/null
echo ""

# Check total log directory size
echo "Total /var/log Directory Size:"
sudo du -sh /var/log
echo ""

# Check available disk space
echo "Available Disk Space on /var:"
df -h /var | tail -1
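
To see which old files actually account for the space, the age filter and the size listing can be combined. The sketch below is read-only and takes the directory and age as parameters so it can be tried on a scratch directory first. stale_logs is a hypothetical helper name.

```shell
#!/bin/sh
# stale_logs DIR DAYS: list files not modified for more than DAYS days,
# with their size in KiB, largest first. Nothing is deleted or moved.
stale_logs() {
    find "$1" -type f -mtime +"$2" -exec du -k {} + 2>/dev/null | sort -rn
}

# Example (from a root shell):
#   stale_logs /var/log 30 | head -10
```

Large files that show up here but are not covered by any logrotate rule are good candidates for a rotation policy.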

Alerting Based on Log Analysis

Email Alerts for Critical Events

#!/bin/bash
# log-alert.sh - Send email alerts for critical log events

EMAIL="admin@example.com"  # Placeholder: replace with your alert recipient
HOSTNAME=$(hostname)
TEMP_FILE=$(mktemp /tmp/log-alert-XXXXXX)  # Unpredictable name, created safely

# Check for critical errors
CRITICAL_ERRORS=$(sudo grep -i "critical\|emerg\|alert" /var/log/syslog 2>/dev/null | tail -10)

if [ -n "$CRITICAL_ERRORS" ]; then
    {
        echo "Critical errors detected on $HOSTNAME"
        echo "Time: $(date)"
        echo ""
        echo "Recent Critical Events:"
        echo "$CRITICAL_ERRORS"
    } > "$TEMP_FILE"

    mail -s "ALERT: Critical Errors on $HOSTNAME" "$EMAIL" < "$TEMP_FILE"
    rm -f "$TEMP_FILE"
fi

# Check for failed login attempts
FAILED_LOGINS=$(sudo grep "Failed password" /var/log/auth.log 2>/dev/null | tail -10 | wc -l)

if [ "$FAILED_LOGINS" -gt 5 ]; then
    {
        echo "Multiple failed login attempts detected on $HOSTNAME"
        echo "Time: $(date)"
        echo "Count: $FAILED_LOGINS in last 10 entries"
        echo ""
        sudo grep "Failed password" /var/log/auth.log 2>/dev/null | tail -10
    } | mail -s "ALERT: Failed Login Attempts on $HOSTNAME" "$EMAIL"
fi

# Check for disk space issues
DISK_ERRORS=$(sudo grep -i "no space left\|disk full" /var/log/syslog 2>/dev/null | tail -5)

if [ -n "$DISK_ERRORS" ]; then
    {
        echo "Disk space issues detected on $HOSTNAME"
        echo "Time: $(date)"
        echo ""
        echo "Errors:"
        echo "$DISK_ERRORS"
        echo ""
        echo "Current Disk Usage:"
        df -h
    } | mail -s "ALERT: Disk Space Issue on $HOSTNAME" "$EMAIL"
fi
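
When a script like this runs from cron, it re-reads the whole log each time and can send the same alert repeatedly. One way to limit it to new entries is to remember how many lines were seen on the previous run. The sketch below does that with a small state file. new_lines is a hypothetical helper name, and both paths are chosen by the caller.

```shell
#!/bin/sh
# new_lines LOG STATE: print only the lines added to LOG since the last call,
# remembering the previous line count in the STATE file.
new_lines() (
    log=$1
    state=$2
    last=0
    if [ -f "$state" ]; then
        last=$(cat "$state")
    fi
    total=$(($(wc -l < "$log")))
    # A file shorter than last time suggests the log was rotated: start over
    if [ "$total" -lt "$last" ]; then
        last=0
    fi
    tail -n "+$((last + 1))" "$log" | head -n "$((total - last))"
    echo "$total" > "$state"
)

# Example (from a root shell; the state file path is an assumption):
#   new_lines /var/log/auth.log /var/tmp/auth.offset | grep -c 'Failed password'
```

Counting failures only in the output of new_lines gives a threshold that means "more than N failures since the last run" rather than "more than N failures somewhere in the file".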

Syslog-Based Alerting

#!/bin/bash
# syslog-monitor.sh - Monitor syslog for specific patterns

ALERT_PATTERNS=(
    "error"
    "critical"
    "failed"
    "out of memory"
    "segfault"
    "authentication failure"
)

# Monitor syslog in real time (-F keeps following the file after log rotation)
sudo tail -F /var/log/syslog | while IFS= read -r line; do
    for pattern in "${ALERT_PATTERNS[@]}"; do
        if echo "$line" | grep -qi "$pattern"; then
            # Log to separate alert file
            echo "$(date): $line" >> /var/log/critical-alerts.log

            # Send to monitoring system (example: webhook)
            # curl -X POST -d "alert=$line" https://monitoring.example.com/webhook

            break
        fi
    done
done

Troubleshooting with Logs

Common Troubleshooting Scenarios

1. Service Won't Start

# Check service-specific logs
sudo journalctl -u service-name -n 50

# Check syslog for errors
sudo grep "service-name" /var/log/syslog | tail -20

# Check for dependency issues
sudo systemctl status service-name

2. SSH Connection Issues

# Check authentication logs
sudo grep "sshd" /var/log/auth.log | tail -30

# Look for specific connection attempts
sudo grep "Connection from" /var/log/auth.log | tail -20

# Check for denied connections
sudo grep "refused\|denied" /var/log/auth.log | tail -20

3. Web Server Not Responding

# Check error log for issues
sudo tail -50 /var/log/nginx/error.log

# Look for critical errors
sudo grep -i "crit\|emerg" /var/log/nginx/error.log | tail -20

# Check recent access patterns
sudo tail -100 /var/log/nginx/access.log

4. High System Load

# Check for OOM killer events
sudo grep -i "out of memory" /var/log/syslog

# Look for CPU-intensive processes in logs
sudo grep -i "cpu\|load" /var/log/syslog | tail -20

# Check kernel messages (sudo is needed where kernel.dmesg_restrict is enabled)
sudo dmesg | grep -i "error\|fail" | tail -20

5. Disk I/O Problems

# Check for I/O errors
sudo grep -i "I/O error" /var/log/kern.log

# Look for disk-related issues
sudo grep -i "disk\|ata\|sd[a-z]" /var/log/syslog | tail -30

# Check SMART errors (if available)
sudo grep -i "smart" /var/log/syslog

Log Management Best Practices

Log Rotation

Ensure logs are rotated to prevent disk space issues:

# Check logrotate configuration
cat /etc/logrotate.conf

# Test logrotate configuration
sudo logrotate -d /etc/logrotate.conf

# Force log rotation
sudo logrotate -f /etc/logrotate.conf

Log Compression

# Compress old logs manually
sudo gzip /var/log/syslog.1

# Find and compress old logs
sudo find /var/log -type f -name "*.log.1" -exec gzip {} \;

Log Archival

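#!/bin/bash
# archive-logs.sh - Archive rotated logs older than 30 days

ARCHIVE_DIR="/var/log/archive"
ARCHIVE_DATE=$(date -d "30 days ago" +%Y%m%d)
ARCHIVE_FILE="$ARCHIVE_DIR/logs-$ARCHIVE_DATE.tar.gz"

sudo mkdir -p "$ARCHIVE_DIR"

# Pack old rotated logs straight into one tarball and remove the originals.
# The archive directory is pruned so earlier tarballs are never re-archived,
# and full paths are kept so equal file names from different directories
# cannot overwrite each other.
sudo find /var/log -path "$ARCHIVE_DIR" -prune -o \
    -type f -name "*.log.*" -mtime +30 -print0 | \
    sudo tar -czf "$ARCHIVE_FILE" --null --files-from=- --remove-files

echo "Logs archived to $ARCHIVE_FILE"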

Conclusion

Effective log analysis is a cornerstone of successful Linux system administration, security monitoring, and troubleshooting. The /var/log/ directory contains a wealth of information that, when properly analyzed, provides deep insights into system behavior, security events, and application performance.

Key takeaways from this guide:

  1. Understanding log structure - Know where different types of events are logged and what each log file contains
  2. Command-line tools - Master grep, awk, sed, tail, and journalctl for efficient log analysis
  3. Pattern recognition - Identify common log patterns for security incidents, errors, and performance issues
  4. Automation - Create scripts to automate routine log analysis and alerting
  5. Proactive monitoring - Implement real-time monitoring and alerting for critical events
  6. Log management - Maintain proper log rotation, retention, and archival strategies

Regular log analysis should be part of your daily system administration routine. By developing a systematic approach to log review, you'll catch issues early, respond quickly to security incidents, and maintain a healthy, well-performing infrastructure.

Remember that logs are only valuable if they're reviewed and acted upon. Establish a regular log review schedule, automate common analysis tasks, and integrate log monitoring into your overall observability strategy to maximize the value of your system logs.