Rate Limiting Strategies for APIs

Rate limiting controls request frequency from clients, protecting APIs from abuse, ensuring fair resource allocation, and preventing cascading failures. Different algorithms and implementations offer varying levels of precision, performance, and operational complexity. This guide covers token bucket and sliding window algorithms, Nginx and HAProxy rate limiting, request header analysis, per-client limits, and response header standards.

Table of Contents

  1. Rate Limiting Overview
  2. Token Bucket Algorithm
  3. Sliding Window Algorithm
  4. Leaky Bucket Algorithm
  5. Nginx Rate Limiting
  6. HAProxy Rate Limiting
  7. Request Header Analysis
  8. Per-Client Rate Limits
  9. Rate Limit Response Headers
  10. Distributed Rate Limiting
  11. Testing and Monitoring

Rate Limiting Overview

Rate limiting strategies:

  1. Per-IP: Limit by client IP address
  2. Per-User: Limit by authenticated user
  3. Per-API-Key: Limit by API key
  4. Per-Endpoint: Different limits for different endpoints
  5. Distributed: Shared state across multiple servers

Common algorithms:

  • Token Bucket: Accumulate tokens, consume one per request
  • Sliding Window: Count requests in time window
  • Leaky Bucket: Queue requests, leak at fixed rate
  • Fixed Window: Simple counter per time period
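Fixed window is the simplest of the four and makes a good baseline. A minimal single-process Python sketch (illustrative only) shows the basic shape; its known weakness is that up to twice the limit can slip through around a window boundary, which the sliding window algorithm below addresses.

```python
import time

class FixedWindowCounter:
    """Fixed window: a plain counter that resets every `window` seconds."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self.window_start = time.monotonic()
        self.count = 0

    def allow(self) -> bool:
        now = time.monotonic()
        # Start a fresh window once the current one has elapsed
        if now - self.window_start >= self.window:
            self.window_start = now
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False
```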

Benefits:

  • Prevents API abuse and DDoS attacks
  • Ensures fair resource usage
  • Protects backend infrastructure
  • Improves service stability

Costs:

  • Added latency (especially distributed)
  • Complexity in implementation
  • Storage overhead for tracking
  • Potential for false positives

Token Bucket Algorithm

Token bucket maintains a bucket of tokens, consuming one per request:

Algorithm:
1. Start with N tokens
2. Each second, add R tokens (up to max N)
3. Each request consumes 1 token
4. Reject requests when bucket empty
5. Unused tokens accumulate (burst capacity)

Advantages:

  • Allows burst traffic (buffer of tokens)
  • Simple to implement
  • Predictable behavior
  • Low CPU overhead

Example: 100 requests/sec refill with 50 tokens of burst headroom (capacity 150)

Initial: 150 tokens (bucket starts full)
Second 1: burst of 120 requests → 150 - 120 = 30 tokens left
Second 2: refill +100 → min(30 + 100, 150) = 130 → 100 requests → 30 tokens left
Second 3: refill +100 → min(30 + 100, 150) = 130 tokens available
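The refill-and-consume steps can be sketched in Python (a minimal single-process illustration; `rate` and `capacity` correspond to R and N in the algorithm above):

```python
import time

class TokenBucket:
    """Token bucket: refill at `rate` tokens/sec, capped at `capacity`."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity          # start full: burst available immediately
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill in proportion to elapsed time, never beyond capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1            # one token per request
            return True
        return False                    # bucket empty: reject
```

A limiter for 100 requests/sec with 50 tokens of burst headroom would be `TokenBucket(rate=100, capacity=150)`.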

Sliding Window Algorithm

Sliding window counts requests in a moving time window:

Algorithm:
1. Maintain count of requests in last T seconds
2. For new request, check if count < limit
3. Add request timestamp
4. Remove timestamps older than T seconds
5. Allow if count < limit

Advantages:

  • Fair rate limiting (not fixed windows)
  • Strict enforcement (no burst overshoot beyond the limit)
  • Prevents edge case behavior

Disadvantages:

  • Higher memory usage
  • More CPU intensive
  • Requires precise timestamp tracking

Example: 100 requests per 60 seconds

Time 0.00s:  Request 1 → [0.00] → count=1 ✓
Time 0.05s:  Request 2 → [0.00, 0.05] → count=2 ✓
...
Time 1.00s:  Request 100 → [0.00, 0.05, ..., 1.00] → count=100 ✓
Time 1.01s:  Request 101 → nothing has aged out yet → count=100 → rejected ✗
Time 60.01s: timestamp 0.00 leaves the window → count=99 → next request allowed ✓
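The timestamp-pruning steps above map directly onto a deque (a minimal single-process sketch):

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """Allow at most `limit` requests in any trailing `window` seconds."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self.timestamps = deque()       # one entry per accepted request

    def allow(self) -> bool:
        now = time.monotonic()
        # Drop timestamps that have aged out of the window
        while self.timestamps and self.timestamps[0] <= now - self.window:
            self.timestamps.popleft()
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True
        return False
```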

Leaky Bucket Algorithm

Leaky bucket queues requests and processes at fixed rate:

Algorithm:
1. Queue incoming requests
2. Process requests from queue at fixed rate R
3. If queue full, reject new requests
4. Smooths traffic bursts into constant stream

Advantages:

  • Smooth output rate
  • Predictable resource usage
  • Prevents burst overload

Disadvantages:

  • Higher latency for initial requests
  • Queue management overhead
  • Not suitable for bursty workloads
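The queue-and-drain behavior can be sketched as a "leaky bucket as meter" in Python (a single-process illustration that admits or rejects immediately; a true queueing implementation would also delay accepted requests until they drain):

```python
import time

class LeakyBucket:
    """Admit up to `capacity` queued requests; drain `leak_rate` per second."""

    def __init__(self, leak_rate: float, capacity: int):
        self.leak_rate = leak_rate
        self.capacity = capacity
        self.level = 0.0                # current queue depth
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Drain the queue at the fixed leak rate
        self.level = max(0.0, self.level - (now - self.last) * self.leak_rate)
        self.last = now
        if self.level + 1 <= self.capacity:
            self.level += 1             # enqueue this request
            return True
        return False                    # queue full: reject
```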

Nginx Rate Limiting

Nginx's limit_req module implements the leaky bucket algorithm, with burst acting as the queue depth:

Basic Rate Limiting

# Define rate limit zones
limit_req_zone $binary_remote_addr zone=api_limit:10m rate=100r/s;
limit_req_zone $http_x_api_key zone=api_key_limit:10m rate=1000r/s;

server {
    listen 80;
    server_name api.example.com;
    
    # Apply rate limit to entire API
    limit_req zone=api_limit burst=50 nodelay;
    
    location /api {
        proxy_pass http://backend;
    }
}

Parameters:

  • $binary_remote_addr: Client IP (binary representation)
  • zone=name:size: Zone name and size (10m = 10 megabytes)
  • rate=100r/s: 100 requests per second
  • burst=50: Allow 50 burst requests (queue)
  • nodelay: Don't delay burst requests

Per-Endpoint Rate Limiting

limit_req_zone $binary_remote_addr zone=strict_limit:10m rate=10r/s;
limit_req_zone $binary_remote_addr zone=normal_limit:10m rate=100r/s;

server {
    listen 80;
    server_name api.example.com;
    
    # Strict limit for expensive endpoints
    location /api/expensive {
        limit_req zone=strict_limit burst=5;
        proxy_pass http://backend;
    }
    
    # Normal limit for standard endpoints
    location /api/standard {
        limit_req zone=normal_limit burst=20;
        proxy_pass http://backend;
    }
    
    # High limit for fast endpoints
    location /api/fast {
        # No rate limit
        proxy_pass http://backend;
    }
}

Rate Limiting by API Key

# Extract API key from header
map $http_x_api_key $api_client {
    default "unknown";
    "~^key_(.+)$" $1;
}

limit_req_zone $api_client zone=api_key_limit:10m rate=1000r/s;

server {
    listen 80;
    server_name api.example.com;
    
    location /api {
        # Check for valid API key
        if ($http_x_api_key = "") {
            return 401 '{"error": "Missing API key"}';
        }
        
        limit_req zone=api_key_limit burst=100 nodelay;
        proxy_pass http://backend;
        proxy_set_header X-API-Key $http_x_api_key;
    }
}

Whitelist Exclusions

Map whitelisted IPs to an empty key; limit_req does not count requests
whose zone key is empty:

geo $whitelist {
    default 0;
    10.0.0.0/8 1;
    192.168.0.0/16 1;
    203.0.113.0/24 1;  # Partner network
}

map $whitelist $limit_key {
    0 $binary_remote_addr;
    1 "";  # Empty key = exempt from rate limiting
}

limit_req_zone $limit_key zone=api_limit:10m rate=100r/s;

server {
    listen 80;
    
    location /api {
        limit_req zone=api_limit burst=50;
        proxy_pass http://backend;
    }
}

Dynamic Rate Limits

limit_req cannot appear inside if blocks, so tiered limits are built
from maps that yield an empty key (exempt) for non-matching tiers:

# Each tier's map produces a key only for its own tier
map $http_x_user_tier $premium_key {
    "premium" $binary_remote_addr;
    default "";
}
map $http_x_user_tier $standard_key {
    "standard" $binary_remote_addr;
    default "";
}
map $http_x_user_tier $free_key {
    "premium" "";
    "standard" "";
    default $binary_remote_addr;
}

limit_req_zone $premium_key zone=premium_limit:10m rate=5000r/s;
limit_req_zone $standard_key zone=standard_limit:10m rate=500r/s;
limit_req_zone $free_key zone=free_limit:10m rate=50r/s;

server {
    listen 80;
    server_name api.example.com;
    
    location /api {
        # All three directives apply; only the zone whose key is
        # non-empty for this request actually counts it
        limit_req zone=premium_limit burst=500;
        limit_req zone=standard_limit burst=50;
        limit_req zone=free_limit burst=5;
        
        proxy_pass http://backend;
    }
}

Custom Rate Limit Status Response

limit_req_zone $binary_remote_addr zone=api_limit:10m rate=100r/s;

server {
    listen 80;
    server_name api.example.com;
    
    # Handle rate limit errors
    error_page 429 = @rate_limit_error;
    
    location /api {
        limit_req zone=api_limit burst=50 nodelay;
        limit_req_status 429;
        proxy_pass http://backend;
    }
    
    location @rate_limit_error {
        default_type application/json;
        return 429 '{"error": "Too many requests", "retry_after": 60}';
    }
}

HAProxy Rate Limiting

HAProxy uses stick tables for distributed rate limiting:

Basic Rate Limiting

global
    stats socket /run/haproxy/admin.sock

defaults
    mode http
    timeout client 30s
    timeout server 30s

frontend api_in
    bind *:80
    
    # Stick table for rate limiting
    stick-table type ip size 100k expire 1h store http_req_rate(10s)
    
    # Track client IP
    http-request track-sc0 src
    
    # Limit to 100 requests per 10 seconds
    http-request deny if { sc_http_req_rate(0) gt 100 }
    
    default_backend api_servers

backend api_servers
    balance roundrobin
    server srv1 192.168.1.100:8000 check
    server srv2 192.168.1.101:8000 check

Per-API-Key Rate Limiting

frontend api_in
    bind *:80
    
    # Extract API key from header
    http-request set-var(req.api_key) req.hdr(X-API-Key)
    
    # Stick table tracking by API key
    stick-table type string len 64 size 100k expire 1h store http_req_rate(10s)
    http-request track-sc0 var(req.api_key)
    
    # Enforce limit
    http-request deny if { sc_http_req_rate(0) gt 1000 }
    
    default_backend api_servers

Tiered Rate Limiting

frontend api_in
    bind *:80
    
    stick-table type string len 64 size 100k expire 1h store http_req_rate(10s)
    http-request track-sc0 req.hdr(X-API-Key)
    
    # Different limits based on user tier
    acl is_premium_tier req.hdr(X-Tier) -i "premium"
    acl is_standard_tier req.hdr(X-Tier) -i "standard"
    
    http-request deny if is_premium_tier !{ sc_http_req_rate(0) lt 5000 }
    http-request deny if is_standard_tier !{ sc_http_req_rate(0) lt 500 }
    http-request deny if !is_premium_tier !is_standard_tier !{ sc_http_req_rate(0) lt 50 }
    
    default_backend api_servers

Rate Limiting with Escalating Penalties

frontend api_in
    bind *:80
    
    stick-table type ip size 100k expire 1h store http_req_rate(10s),gpc0
    http-request track-sc0 src
    
    # Count each over-limit request before the deny rules run
    # (deny stops rule evaluation, so the increment must come first)
    http-request sc-inc-gpc0(0) if { sc_http_req_rate(0) gt 100 }
    
    # Warn clients that are accumulating strikes
    http-request set-header X-Client-Warning "Rate-limited" if { sc0_gpc0 gt 3 }
    
    # First tier: deny while over 100 requests per 10 seconds
    http-request deny if { sc_http_req_rate(0) gt 100 }
    
    # Repeat offenders stay blocked until their table entry expires
    http-request deny if { sc0_gpc0 gt 10 }
    
    default_backend api_servers

Request Header Analysis

Analyze request headers for intelligent rate limiting:

# Rate limit based on User-Agent
map $http_user_agent $is_bot {
    default 0;
    ~*bot 1;
    ~*crawler 1;
    ~*spider 1;
}

# limit_req cannot sit inside if blocks; empty-key maps route each
# client to exactly one zone instead
map $is_bot $human_key {
    0 $binary_remote_addr;
    default "";
}
map $is_bot $bot_key {
    1 $binary_remote_addr;
    default "";
}

limit_req_zone $human_key zone=normal:10m rate=100r/s;
limit_req_zone $bot_key zone=bot:10m rate=10r/s;

server {
    listen 80;
    
    location /api {
        limit_req zone=normal burst=50;
        limit_req zone=bot burst=5;
        
        proxy_pass http://backend;
    }
}

Rate limit by hostname:

limit_req_zone $host zone=per_host:10m rate=100r/s;

server {
    listen 80;
    server_name api.example.com other.example.com;
    
    location / {
        limit_req zone=per_host burst=50;
        proxy_pass http://backend;
    }
}

Rate limit by request path (the zone name in limit_req cannot be a
variable, so each zone is keyed through an empty-key map):

map $request_uri $strict_key {
    ~*/api/expensive $binary_remote_addr;
    default "";
}
map $request_uri $normal_key {
    ~*/api/standard $binary_remote_addr;
    default "";
}

limit_req_zone $strict_key zone=strict_limit:10m rate=10r/s;
limit_req_zone $normal_key zone=normal_limit:10m rate=100r/s;

server {
    listen 80;
    location / {
        limit_req zone=strict_limit burst=5;
        limit_req zone=normal_limit burst=20;
        proxy_pass http://backend;
    }
}

Per-Client Rate Limits

Implement sophisticated per-client limits:

# Prefer API key, then user ID, then client IP
map $http_x_user_id $user_or_ip_key {
    "" $binary_remote_addr;
    default $http_x_user_id;
}
map $http_x_api_key $rate_limit_key {
    "" $user_or_ip_key;
    default $http_x_api_key;
}

limit_req_zone $rate_limit_key zone=client_limit:10m rate=500r/s;

server {
    listen 80;
    
    location /api {
        limit_req zone=client_limit burst=100 nodelay;
        proxy_pass http://backend;
        
        # Pass identified client to backend
        proxy_set_header X-Client-ID $rate_limit_key;
    }
}

Rate Limit Response Headers

Include standard rate limit headers in responses. Nginx does not expose
remaining-token or reset-time variables, so only the static limit can
be advertised on successful responses; exact values must come from the
backend:

# Advertise the configured limit on every response
add_header X-RateLimit-Limit 100 always;

# Custom response when rate limited
error_page 429 = @rate_limited;

location @rate_limited {
    default_type application/json;
    add_header X-RateLimit-Limit 100 always;
    add_header X-RateLimit-Remaining 0 always;
    add_header X-RateLimit-Reset 60 always;
    add_header Retry-After 60 always;
    
    return 429 '{
        "error": "Too Many Requests",
        "message": "Rate limit exceeded. Retry after 60 seconds.",
        "status": 429
    }';
}
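When the backend tracks the window itself, it can compute the full header trio. A small Python helper (the function name and parameters are hypothetical, for illustration) for a fixed-window limiter:

```python
import time

def rate_limit_headers(limit: int, used: int,
                       window_start: float, window: float) -> dict:
    """Build the conventional X-RateLimit-* trio for a fixed window."""
    remaining = max(0, limit - used)
    # Seconds until the current window resets (floored at zero)
    reset = max(0, int(window_start + window - time.time()))
    return {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(remaining),
        "X-RateLimit-Reset": str(reset),
    }
```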

HAProxy rate limit headers (assumes the stick table from the earlier
examples; attaching headers to a deny requires HAProxy 2.2+):

acl rate_limited sc_http_req_rate(0) gt 100

# Advertise the static limit on normal responses
http-response set-header X-RateLimit-Limit "100"

# Denied requests get a 429 with a Retry-After header
http-request deny status 429 hdr Retry-After "60" if rate_limited

Distributed Rate Limiting

Share rate limit state across multiple servers with Redis. The sketch below is a fixed-window counter: INCR plus a TTL gives every server a consistent count per client per window:

import redis
import time
from flask import Flask, request, jsonify

app = Flask(__name__)
r = redis.Redis(host='localhost', port=6379, db=0)

def check_rate_limit(client_id, limit=100, window=60):
    key = f"rate_limit:{client_id}"
    
    # Increment counter
    current = r.incr(key)
    
    # Set expiration on first request
    if current == 1:
        r.expire(key, window)
    
    # Check if exceeded
    if current > limit:
        return False, 0
    
    return True, limit - current

@app.route('/api/resource')
def get_resource():
    client_id = request.remote_addr
    allowed, remaining = check_rate_limit(client_id)
    
    if not allowed:
        resp = jsonify({'error': 'Rate limit exceeded'})
        resp.headers['Retry-After'] = '60'
        return resp, 429
    
    resp = jsonify({'data': 'resource data'})
    resp.headers['X-RateLimit-Remaining'] = str(remaining)
    return resp

Testing and Monitoring

Test rate limiting:

# Simple rate limit test
for i in {1..150}; do
    curl -s -o /dev/null -w "%{http_code} " http://api.example.com/resource
    echo -n "$i "
done
echo

# Measure response time under load
ab -n 1000 -c 100 http://api.example.com/api/resource

# Check rate limit headers
curl -i http://api.example.com/api/resource | grep X-RateLimit

# Test with custom headers
for i in {1..5}; do
    curl -s -H "X-API-Key: mykey" http://api.example.com/api/resource | head -1
done

Monitor rate limiting:

# Check Nginx rate limit stats
grep "limiting requests" /var/log/nginx/error.log | wc -l

# Monitor HAProxy stick table (tables are named after their proxy)
echo "show table api_in" | socat stdio /run/haproxy/admin.sock

# Monitor Redis rate limits
redis-cli --scan --pattern "rate_limit:*"
redis-cli DBSIZE

Conclusion

Rate limiting is essential for API protection and fair resource allocation. Choose algorithms based on requirements: token bucket for bursty traffic, sliding window for strict limits, leaky bucket for smooth throughput. Implement per-client, per-endpoint, and tiered strategies with clear response headers. Distributed systems benefit from Redis-backed rate limiting for consistency. Monitor and adjust limits continuously based on actual usage patterns and system capacity.