Log Analysis with awk, grep, and sed
Introduction
Log analysis is a fundamental skill for system administrators, DevOps engineers, and security professionals. While graphical log analysis tools offer powerful features, command-line utilities like awk, grep, and sed provide unmatched speed, flexibility, and availability for real-time log analysis, troubleshooting, and pattern extraction directly on production servers.
These three utilities form the cornerstone of Unix text processing: grep excels at searching and filtering, sed specializes in stream editing and text transformation, and awk provides powerful data extraction and reporting capabilities. Together, they enable you to parse gigabytes of log data in seconds, extract actionable insights, identify patterns, and automate log processing tasks without installing additional software.
This comprehensive guide teaches you how to master log analysis using awk, grep, and sed, from basic filtering to advanced pattern matching, data extraction, statistical analysis, and automated reporting. You'll learn practical techniques for analyzing web server logs, system logs, application logs, and security logs, enabling rapid troubleshooting and deep operational insights.
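As a quick preview of how the three tools divide the labor, the pipeline below uses all of them at once. It assumes an Nginx access log at /var/log/nginx/access.log; adjust the path for your environment.
# grep filters error responses, awk extracts the client IP, sed tidies the output
grep -E " [45][0-9]{2} " /var/log/nginx/access.log | \
awk '{print $1}' | \
sort | uniq -c | sort -rn | head -5 | \
sed 's/^ *//'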
Prerequisites
Before diving into log analysis with these tools, ensure you have:
- A Linux server or workstation (any distribution)
- Access to log files (typically in /var/log/)
- Root or sudo access for protected log files
- Basic understanding of regular expressions
- Familiarity with common log formats
Required Tools: All three utilities are pre-installed on virtually every Linux distribution:
- grep (GNU grep recommended)
- sed (GNU sed)
- awk (GNU awk/gawk)
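All three ship with virtually every distribution, but stripped-down container images sometimes substitute busybox implementations. If you need the GNU versions, they install from the standard repositories (Debian/Ubuntu syntax shown as an example):
# Debian/Ubuntu; use dnf on RHEL-family systems
sudo apt-get update && sudo apt-get install -y grep sed gawk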
Verify Installation:
grep --version
sed --version
awk --version
Understanding Log Formats
Common Log Formats
Syslog Format:
Jan 11 10:30:45 server1 sshd[1234]: Accepted password for user from 192.168.1.100 port 12345 ssh2
Apache Combined Format:
192.168.1.100 - - [11/Jan/2024:10:30:45 +0000] "GET /index.html HTTP/1.1" 200 1234 "https://example.com" "Mozilla/5.0"
Nginx Access Log:
192.168.1.100 - user [11/Jan/2024:10:30:45 +0000] "GET /api/v1/users HTTP/1.1" 200 567 "-" "curl/7.68.0"
JSON Application Log:
{"timestamp":"2024-01-11T10:30:45Z","level":"ERROR","message":"Database connection failed","user_id":123}
Mastering grep for Log Analysis
Basic grep Usage
Search for specific term:
# Find all error messages
grep "error" /var/log/syslog
# Case-insensitive search
grep -i "error" /var/log/syslog
# Search multiple files
grep "failed" /var/log/*.log
# Recursive search
grep -r "connection refused" /var/log/
Display context:
# Show 3 lines before match
grep -B 3 "error" /var/log/syslog
# Show 3 lines after match
grep -A 3 "error" /var/log/syslog
# Show 3 lines before and after
grep -C 3 "error" /var/log/syslog
# Show line numbers
grep -n "error" /var/log/syslog
Count and statistics:
# Count matching lines
grep -c "error" /var/log/syslog
# Show only matching part
grep -o "ERROR.*" /var/log/app.log
# List files containing match
grep -l "error" /var/log/*.log
# List files NOT containing match
grep -L "error" /var/log/*.log
Advanced grep Patterns
Regular expression patterns:
# Match IP addresses
grep -E '\b([0-9]{1,3}\.){3}[0-9]{1,3}\b' /var/log/syslog
# Match email addresses
grep -E '\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b' /var/log/mail.log
# Match dates (YYYY-MM-DD format)
grep -E '[0-9]{4}-[0-9]{2}-[0-9]{2}' /var/log/app.log
# Match times (HH:MM:SS)
grep -E '[0-9]{2}:[0-9]{2}:[0-9]{2}' /var/log/syslog
Multiple patterns:
# Match either pattern (OR)
grep -E "error|warning|critical" /var/log/syslog
# Match multiple patterns (AND)
grep "error" /var/log/syslog | grep "database"
# Exclude pattern
grep "error" /var/log/syslog | grep -v "debug"
# Match word boundaries
grep -w "error" /var/log/syslog # Won't match "errors"
Practical grep Examples
Find failed SSH login attempts:
grep "Failed password" /var/log/auth.log
# With IP addresses
grep "Failed password" /var/log/auth.log | grep -oE '\b([0-9]{1,3}\.){3}[0-9]{1,3}\b' | sort | uniq -c | sort -rn
# Count by user (field positions shift on "invalid user" lines)
grep "Failed password" /var/log/auth.log | awk '{print $(NF-5)}' | sort | uniq -c | sort -rn
Analyze HTTP error codes:
# Find 404 errors
grep " 404 " /var/log/nginx/access.log
# Find 5xx errors
grep -E " 5[0-9]{2} " /var/log/nginx/access.log
# Count errors by type
grep -oE " [4-5][0-9]{2} " /var/log/nginx/access.log | sort | uniq -c | sort -rn
Search application errors with context:
# Find errors with stack traces
grep -A 20 "Exception" /var/log/app/error.log
# Find database errors
grep -B 5 -A 10 "SQLException" /var/log/app/app.log
Time-based filtering:
# Find logs from specific hour
grep "Jan 11 14:" /var/log/syslog
# Find logs from specific date
grep "2024-01-11" /var/log/app.log
# Find logs from today (%e space-pads single-digit days to match syslog)
grep "$(date '+%b %e')" /var/log/syslog
Mastering sed for Log Processing
Basic sed Usage
Print specific lines:
# Print line 10
sed -n '10p' /var/log/syslog
# Print lines 10-20
sed -n '10,20p' /var/log/syslog
# Print every 10th line (GNU sed extension)
sed -n '0~10p' /var/log/syslog
# Print last line
sed -n '$p' /var/log/syslog
Delete lines:
# Delete blank lines
sed '/^$/d' /var/log/syslog
# Delete lines containing pattern
sed '/debug/d' /var/log/syslog
# Delete lines 1-10
sed '1,10d' /var/log/syslog
# Delete last line
sed '$d' /var/log/syslog
Substitute text:
# Replace first occurrence
sed 's/error/ERROR/' /var/log/app.log
# Replace all occurrences (global)
sed 's/error/ERROR/g' /var/log/app.log
# Case-insensitive replacement
sed 's/error/ERROR/gi' /var/log/app.log
# Replace only on lines matching pattern
sed '/WARNING/s/error/ERROR/g' /var/log/app.log
Advanced sed Patterns
Extract specific fields:
# Extract the leading IP address (anchored, so the greedy .* can't eat the first octets)
sed -n 's/^\([0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\).*/\1/p' /var/log/nginx/access.log
# Extract timestamps
sed -n 's/.*\[\([^]]*\)\].*/\1/p' /var/log/nginx/access.log
# Remove timestamps (first 3 fields)
sed 's/^[^ ]* [^ ]* [^ ]* //' /var/log/syslog
Multi-line operations:
# Join lines ending with backslash
sed -e :a -e '/\\$/N; s/\\\n//; ta' /var/log/app.log
# Add line after pattern
sed '/ERROR/a\--- Error detected ---' /var/log/app.log
# Insert line before pattern
sed '/ERROR/i\--- Warning: Error follows ---' /var/log/app.log
Conditional processing:
# Process only lines between patterns
sed -n '/START/,/END/p' /var/log/app.log
# Delete everything after first ERROR
sed '/ERROR/,$d' /var/log/app.log
# Replace only in specific line range
sed '10,20s/old/new/g' /var/log/app.log
Practical sed Examples
Clean and format logs:
# Remove ANSI color codes
sed 's/\x1b\[[0-9;]*m//g' /var/log/app.log
# Remove carriage returns
sed 's/\r$//' /var/log/app.log
# Normalize whitespace
sed 's/[[:space:]]\+/ /g' /var/log/app.log
# Add prefix to each line
sed 's/^/[APP] /' /var/log/app.log
Extract and transform data:
# Convert Apache log to CSV
sed 's/\([^ ]*\) - - \[\([^]]*\)\] "\([^"]*\)" \([0-9]*\) \([0-9]*\).*/\1,\2,\3,\4,\5/' /var/log/apache2/access.log
# Extract just URLs from access log
sed -n 's/.*"\w* \([^ ]*\) HTTP.*/\1/p' /var/log/nginx/access.log
# Extract error messages only
sed -n 's/.*ERROR - \(.*\)$/\1/p' /var/log/app.log
Filter by time range:
# Extract logs from 10:00 to 11:00
sed -n '/Jan 11 10:00/,/Jan 11 11:00/p' /var/log/syslog
# Extract logs from specific date
sed -n '/2024-01-11/,/2024-01-12/p' /var/log/app.log
Mastering awk for Log Analysis
Basic awk Usage
Print specific columns:
# Print first column
awk '{print $1}' /var/log/syslog
# Print first and fifth columns
awk '{print $1, $5}' /var/log/syslog
# Print all columns except the first (note: leaves a leading space)
awk '{$1=""; print $0}' /var/log/syslog
# Print last column
awk '{print $NF}' /var/log/syslog
# Print second to last column
awk '{print $(NF-1)}' /var/log/syslog
Pattern matching:
# Print lines matching pattern
awk '/error/ {print}' /var/log/syslog
# Print lines NOT matching pattern
awk '!/debug/ {print}' /var/log/syslog
# Print if column matches
awk '$5 == "error" {print}' /var/log/syslog
# Print if column contains
awk '$5 ~ /error/ {print}' /var/log/syslog
Field separators:
# Custom field separator (colon)
awk -F':' '{print $1, $2}' /etc/passwd
# Multiple field separators
awk -F'[: ]' '{print $1}' /var/log/syslog
# Change output separator
awk -F':' 'BEGIN{OFS=","} {print $1, $2}' /etc/passwd
Advanced awk Operations
Arithmetic and statistics:
# Count lines
awk 'END {print NR}' /var/log/syslog
# Sum column values
awk '{sum+=$1} END {print sum}' numbers.log
# Calculate average
awk '{sum+=$1; count++} END {print sum/count}' numbers.log
# Find minimum and maximum
awk 'NR==1{max=$1; min=$1} $1>max{max=$1} $1<min{min=$1} END {print "Min:", min, "Max:", max}' numbers.log
Conditional processing:
# If-else statements
awk '{if ($1 > 100) print "HIGH:", $0; else print "LOW:", $0}' numbers.log
# Multiple conditions
awk '{if ($1 > 100 && $2 == "error") print $0}' /var/log/app.log
# Ternary operator (parenthesize the whole expression so print parses it cleanly)
awk '{print ($1 > 100 ? "HIGH" : "LOW")}' numbers.log
Arrays and aggregation:
# Count occurrences
awk '{count[$1]++} END {for (ip in count) print ip, count[ip]}' /var/log/nginx/access.log
# Sum by key
awk '{sum[$1]+=$2} END {for (key in sum) print key, sum[key]}' data.log
# Track unique values
awk '{if (!seen[$1]++) print $1}' /var/log/access.log
BEGIN and END blocks:
# Print header and footer
awk 'BEGIN {print "=== Log Analysis ==="} {print $0} END {print "=== Total Lines:", NR, "==="}' /var/log/syslog
# Initialize variables
awk 'BEGIN {count=0} /error/ {count++} END {print "Errors:", count}' /var/log/syslog
Practical awk Examples
Apache/Nginx log analysis:
# Count requests by IP
awk '{print $1}' /var/log/nginx/access.log | sort | uniq -c | sort -rn | head -10
# Count by HTTP status code
awk '{print $9}' /var/log/nginx/access.log | sort | uniq -c | sort -rn
# Calculate total bandwidth
awk '{sum+=$10} END {print "Total MB:", sum/1024/1024}' /var/log/nginx/access.log
# Average response time (assumes response time is logged as the last field)
awk '{sum+=$NF; count++} END {print "Avg:", sum/count}' /var/log/nginx/access.log
# Requests per hour
awk '{print substr($4,14,2)}' /var/log/nginx/access.log | sort | uniq -c
# Top requested URLs
awk '{print $7}' /var/log/nginx/access.log | sort | uniq -c | sort -rn | head -20
# 404 errors with URLs
awk '$9 == 404 {print $7}' /var/log/nginx/access.log | sort | uniq -c | sort -rn
# Response time percentiles (same last-field assumption)
awk '{print $NF}' /var/log/nginx/access.log | sort -n | awk 'BEGIN{c=0} {a[c++]=$1} END {print "50th:", a[int(c*0.5)], "90th:", a[int(c*0.9)], "99th:", a[int(c*0.99)]}'
Syslog analysis:
# Count messages by hour
awk '{print $3}' /var/log/syslog | cut -d: -f1 | sort | uniq -c
# Count by program/service
awk '{print $5}' /var/log/syslog | sed 's/\[.*\]://' | sort | uniq -c | sort -rn
# Failed services
awk '/failed|error/ {print $5}' /var/log/syslog | sort | uniq -c | sort -rn
# Extract just error messages
awk '/error|ERROR/ {for(i=6;i<=NF;i++) printf "%s ", $i; print ""}' /var/log/syslog
Authentication log analysis:
# Failed login attempts by IP
awk '/Failed password/ {print $(NF-3)}' /var/log/auth.log | sort | uniq -c | sort -rn
# Failed login attempts by user
awk '/Failed password/ {print $(NF-5)}' /var/log/auth.log | sort | uniq -c | sort -rn
# Successful logins
awk '/Accepted password/ {print $(NF-3), $(NF-5)}' /var/log/auth.log
# Count auth events by hour
awk '{print $3}' /var/log/auth.log | cut -d: -f1 | sort | uniq -c
# Sudo command usage
awk '/sudo/ && /COMMAND=/ {i=index($0,"COMMAND="); print substr($0,i+8)}' /var/log/auth.log
Application log analysis (JSON):
# Extract a specific JSON field by splitting on quotes (flat JSON only)
awk -F'"' '/timestamp/ {print $4}' /var/log/app.json
# Count by log level
awk -F'"' '{for(i=1;i<=NF;i++) if($i=="level") print $(i+2)}' /var/log/app.json | sort | uniq -c
# Errors with message
awk -F'"' '/"level":"ERROR"/ {for(i=1;i<=NF;i++) if($i=="message") print $(i+2)}' /var/log/app.json
Combining grep, sed, and awk
Powerful Pipeline Examples
Complete access log analysis:
# Top IPs accessing specific URL
grep "/api/login" /var/log/nginx/access.log | \
awk '{print $1}' | \
sort | uniq -c | sort -rn | head -10
# Extract and analyze 404 errors
grep " 404 " /var/log/nginx/access.log | \
awk '{print $7}' | \
sort | uniq -c | sort -rn | \
sed 's/^ *//' | \
awk '{print $2, ":", $1, "times"}'
# Analyze slow requests (response time > 1s, logged as the last field)
awk '$NF > 1.0 {print $0}' /var/log/nginx/access.log | \
sed 's/.*"\([A-Z]*\) \([^ ]*\) .*/\1 \2/' | \
sort | uniq -c | sort -rn
Security analysis:
# Identify brute force attempts
grep "Failed password" /var/log/auth.log | \
awk '{print $(NF-3)}' | \
sort | uniq -c | \
awk '$1 > 10 {print "WARNING:", $2, "attempted", $1, "times"}' | \
sed 's/^/[SECURITY] /'
# Analyze unauthorized access attempts
grep -E "unauthorized|forbidden|denied" /var/log/syslog | \
awk '{print $5}' | \
sed 's/\[.*\]://' | \
sort | uniq -c | sort -rn
# Extract suspicious commands
grep "sudo" /var/log/auth.log | \
awk '/COMMAND=/ {i=index($0,"COMMAND="); print substr($0,i+8)}' | \
grep -vE "^/usr/bin/(ls|cat|less|grep)( |$)" | \
sort | uniq -c
Performance analysis:
# Database query performance
grep "Query took" /var/log/app/app.log | \
sed 's/.*Query took \([0-9.]*\)ms.*/\1/' | \
awk '{sum+=$1; count++; if($1>max) max=$1} END {print "Avg:", sum/count, "ms, Max:", max, "ms, Total queries:", count}'
# Error rate over time
grep "ERROR" /var/log/app/app.log | \
awk '{print substr($1,12,5)}' | \
uniq -c | \
awk '{print "Time:", $2, "- Errors:", $1}'
Automated Log Analysis Scripts
Comprehensive Analysis Script
#!/bin/bash
# log-analyzer.sh - Automated log analysis
LOG_FILE="/var/log/nginx/access.log"
REPORT_FILE="/tmp/log-analysis-$(date +%Y%m%d-%H%M).txt"
{
echo "========================================="
echo "Log Analysis Report"
echo "Date: $(date)"
echo "Log File: $LOG_FILE"
echo "========================================="
echo ""
echo "--- Total Requests ---"
wc -l < "$LOG_FILE"
echo ""
echo "--- Top 10 IP Addresses ---"
awk '{print $1}' "$LOG_FILE" | sort | uniq -c | sort -rn | head -10
echo ""
echo "--- HTTP Status Code Distribution ---"
awk '{print $9}' "$LOG_FILE" | sort | uniq -c | sort -rn
echo ""
echo "--- Top 20 Requested URLs ---"
awk '{print $7}' "$LOG_FILE" | sort | uniq -c | sort -rn | head -20
echo ""
echo "--- 404 Errors ---"
grep " 404 " "$LOG_FILE" | awk '{print $7}' | sort | uniq -c | sort -rn | head -10
echo ""
echo "--- 5xx Errors ---"
grep -E " 5[0-9]{2} " "$LOG_FILE" | wc -l
if [ "$(grep -cE " 5[0-9]{2} " "$LOG_FILE")" -gt 0 ]; then
grep -E " 5[0-9]{2} " "$LOG_FILE" | awk '{print $7}' | sort | uniq -c | sort -rn | head -10
fi
echo ""
echo "--- Requests per Hour ---"
awk '{print substr($4,14,2)}' "$LOG_FILE" | sort | uniq -c
echo ""
echo "--- User Agents (Top 10) ---"
awk -F'"' '{print $6}' "$LOG_FILE" | sort | uniq -c | sort -rn | head -10
echo ""
echo "========================================="
echo "Report generated: $(date)"
echo "========================================="
} > "$REPORT_FILE"
echo "Analysis complete. Report saved to: $REPORT_FILE"
cat "$REPORT_FILE"
Real-Time Log Monitoring
#!/bin/bash
# realtime-monitor.sh - Real-time log monitoring with analysis
LOG_FILE="/var/log/syslog"
echo "Monitoring $LOG_FILE for errors..."
echo "Press Ctrl+C to stop"
echo ""
tail -f "$LOG_FILE" | while read line; do
# Check for errors
if echo "$line" | grep -qi "error"; then
echo "[ERROR] $line" | sed 's/error/\x1b[31mERROR\x1b[0m/i'
fi
# Check for warnings
if echo "$line" | grep -qi "warning"; then
echo "[WARN] $line" | sed 's/warning/\x1b[33mWARNING\x1b[0m/i'
fi
# Check for failed authentication
if echo "$line" | grep -q "Failed password"; then
IP=$(echo "$line" | awk '{print $(NF-3)}')
echo "[SECURITY] Failed login from $IP" | sed 's/SECURITY/\x1b[35mSECURITY\x1b[0m/'
fi
done
Security Audit Script
#!/bin/bash
# security-audit.sh - Automated security log analysis
AUTH_LOG="/var/log/auth.log"
REPORT="/tmp/security-audit-$(date +%Y%m%d).txt"
{
echo "Security Audit Report"
echo "====================="
echo "Date: $(date)"
echo ""
echo "--- Failed Login Attempts by IP ---"
grep "Failed password" "$AUTH_LOG" | \
awk '{print $(NF-3)}' | \
sort | uniq -c | sort -rn | \
awk '$1 > 5 {print "WARNING:", $2, "failed", $1, "times"}'
echo ""
echo "--- Failed Login Attempts by User ---"
grep "Failed password" "$AUTH_LOG" | \
awk '{print $(NF-5)}' | \
sort | uniq -c | sort -rn
echo ""
echo "--- Successful Root Logins ---"
grep "Accepted.*root" "$AUTH_LOG" | wc -l
if [ $(grep -c "Accepted.*root" "$AUTH_LOG") -gt 0 ]; then
grep "Accepted.*root" "$AUTH_LOG"
fi
echo ""
echo "--- Sudo Commands ---"
grep "sudo.*COMMAND" "$AUTH_LOG" | \
awk '{for(i=1;i<=NF;i++) if($i=="USER=") print $(i+1)}' | \
sort | uniq -c
echo ""
echo "--- New User Additions ---"
grep "useradd" "$AUTH_LOG"
echo ""
} > "$REPORT"
echo "Security audit complete. Report: $REPORT"
cat "$REPORT"
Best Practices
Performance Optimization
For large files:
# Use grep first to filter, then process
grep "ERROR" huge.log | awk '{print $5}'
# Process compressed files without decompressing
zgrep "pattern" file.log.gz
zcat file.log.gz | awk '{print $1}'
# mawk (if installed) is often faster than gawk on large datasets
mawk '{print $1}' huge.log
# Limit processing with head
grep "ERROR" huge.log | head -1000 | awk '{print $5}'
Memory-efficient processing:
# awk streams input line by line; print progress every 1000 lines
awk 'NR % 1000 == 0 {print "Processed", NR, "lines"}' huge.log
# Use stream processing
tail -f /var/log/app.log | grep --line-buffered "ERROR" | awk '{print $0}'
Regular Expression Tips
Common patterns:
# IP address: \b([0-9]{1,3}\.){3}[0-9]{1,3}\b
# Email: [A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}
# URL: https?://[^\s]+
# UUID: [0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}
# Date (YYYY-MM-DD): [0-9]{4}-[0-9]{2}-[0-9]{2}
# Time (HH:MM:SS): [0-9]{2}:[0-9]{2}:[0-9]{2}
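Before unleashing a pattern on a multi-gigabyte log, test it against a single representative line; printf piped into grep -E makes a quick scratchpad:
# Should print only the IP: 203.0.113.7
printf '%s\n' 'Jan 11 10:30:45 server1 sshd[1234]: Failed password from 203.0.113.7' | \
grep -oE '\b([0-9]{1,3}\.){3}[0-9]{1,3}\b'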
Conclusion
Mastering grep, sed, and awk for log analysis provides powerful, flexible, and efficient tools for extracting insights from system and application logs. These utilities are fast, universally available, and capable of processing gigabytes of log data with minimal resource overhead.
Key takeaways:
- grep - Fast pattern searching and filtering
- sed - Stream editing and text transformation
- awk - Data extraction, analysis, and reporting
- Pipelines - Combine all three for powerful analysis workflows
- Automation - Script common analysis tasks for regular execution
Best practices:
- Start with grep to filter large datasets
- Use awk for structured data extraction and statistics
- Apply sed for text transformation and cleanup
- Combine tools in pipelines for complex analysis
- Test patterns on small data samples first
- Document complex one-liners for future reference
- Consider performance with large log files
- Automate routine analysis with scripts
While modern log analysis platforms offer advanced features, command-line tools remain indispensable for quick troubleshooting, ad-hoc analysis, and situations where installing additional software isn't feasible. These fundamental skills translate across all Unix-like systems and will serve you throughout your career in system administration and DevOps.