Rate Limiting Strategies for APIs
Rate limiting controls request frequency from clients, protecting APIs from abuse, ensuring fair resource allocation, and preventing cascading failures. Different algorithms and implementations offer varying levels of precision, performance, and operational complexity. This guide covers token bucket and sliding window algorithms, Nginx and HAProxy rate limiting, request header analysis, per-client limits, and response header standards.
Table of Contents
- Rate Limiting Overview
- Token Bucket Algorithm
- Sliding Window Algorithm
- Leaky Bucket Algorithm
- Nginx Rate Limiting
- HAProxy Rate Limiting
- Request Header Analysis
- Per-Client Rate Limits
- Rate Limit Response Headers
- Distributed Rate Limiting
- Testing and Monitoring
Rate Limiting Overview
Rate limiting strategies:
- Per-IP: Limit by client IP address
- Per-User: Limit by authenticated user
- Per-API-Key: Limit by API key
- Per-Endpoint: Different limits for different endpoints
- Distributed: Shared state across multiple servers
Common algorithms:
- Token Bucket: Accumulate tokens, consume one per request
- Sliding Window: Count requests in time window
- Leaky Bucket: Queue requests, leak at fixed rate
- Fixed Window: Simple counter per time period
Benefits:
- Prevents API abuse and DDoS attacks
- Ensures fair resource usage
- Protects backend infrastructure
- Improves service stability
Costs:
- Added latency (especially distributed)
- Complexity in implementation
- Storage overhead for tracking
- Potential for false positives
Token Bucket Algorithm
Token bucket maintains a bucket of tokens, consuming one per request:
Algorithm:
1. Start with N tokens
2. Each second, add R tokens (up to max N)
3. Each request consumes 1 token
4. Reject requests when bucket empty
5. Unused tokens accumulate (burst capacity)
Advantages:
- Allows burst traffic (buffer of tokens)
- Simple to implement
- Predictable behavior
- Low CPU overhead
Example: refill 100 tokens/sec, capacity 150 (sustained rate plus 50 burst)
Initial: 150 tokens (bucket starts full)
Burst of 120 requests → 150 - 120 = 30 tokens remain
One second later: 30 + 100 = 130 tokens
Another idle second: 130 + 100 = 230 → capped at 150
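The refill-and-consume loop above can be sketched in Python. This is a minimal single-process sketch; the TokenBucket class, its parameter names, and the optional deterministic `now` argument are illustrative, not from any library:

```python
import time

class TokenBucket:
    """Refill at `rate` tokens/sec up to `capacity`; one token per request."""

    def __init__(self, rate, capacity):
        self.rate = rate               # tokens added per second (R)
        self.capacity = capacity       # maximum tokens (N, the burst ceiling)
        self.tokens = float(capacity)  # bucket starts full
        self.last_refill = 0.0

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Refill in proportion to elapsed time, capped at capacity
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# 100 tokens/sec, capacity 150: a burst of 200 at t=0 gets 150 through
bucket = TokenBucket(rate=100, capacity=150)
print(sum(bucket.allow(now=0) for _ in range(200)))  # 150
print(bucket.allow(now=0.5))  # True: half a second refills 50 tokens
```

Passing `now` explicitly makes the behavior deterministic for testing; in production the `time.monotonic()` default is used.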
Sliding Window Algorithm
Sliding window counts requests in a moving time window:
Algorithm:
1. Keep a log of request timestamps
2. On each request, drop timestamps older than T seconds
3. If the remaining count < limit, record the new timestamp and allow
4. Otherwise reject the request
Advantages:
- Fair limiting across window boundaries (no fixed-window reset)
- Strict enforcement with no burst allowance
- Avoids the double-burst edge case at fixed-window rollover
Disadvantages:
- Higher memory usage
- More CPU intensive
- Requires precise timestamp tracking
Example: 100 requests per 60 seconds
Time 0.00s: Request 1 → [0.00] → count=1 ✓
Time 0.05s: Request 2 → [0.00, 0.05] → count=2 ✓
...
Time 1.00s: Request 100 → [0.00, 0.05, ..., 1.00] → count=100 ✓
Time 1.01s: Request 101 → nothing has aged out yet (window is 60s) → count=100 ✗ rejected
Time 60.01s: timestamps ≤ 0.01s fall outside the window → count drops below 100 → next request ✓
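The timestamp-log approach can be sketched as follows, assuming a single process (the class name and the deterministic `now` parameter are illustrative):

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """Sliding-window log: keep one timestamp per accepted request."""

    def __init__(self, limit, window):
        self.limit = limit        # max requests per window
        self.window = window      # window length in seconds
        self.timestamps = deque()

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Drop timestamps that have aged out of the window
        while self.timestamps and self.timestamps[0] <= now - self.window:
            self.timestamps.popleft()
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True
        return False

limiter = SlidingWindowLimiter(limit=100, window=60)
print(all(limiter.allow(now=t * 0.01) for t in range(100)))  # True: first 100 fit
print(limiter.allow(now=1.01))   # False: the window still holds 100 entries
print(limiter.allow(now=60.01))  # True: the earliest entries have aged out
```

The deque makes eviction of old timestamps O(1) per removal, but memory still grows with the limit, which is the cost noted above.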
Leaky Bucket Algorithm
Leaky bucket queues requests and processes at fixed rate:
Algorithm:
1. Queue incoming requests
2. Process requests from queue at fixed rate R
3. If queue full, reject new requests
4. Smooths traffic bursts into constant stream
Advantages:
- Smooth output rate
- Predictable resource usage
- Prevents burst overload
Disadvantages:
- Higher latency for initial requests
- Queue management overhead
- Not suitable for bursty workloads
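The leaky bucket is often implemented as a meter rather than an actual queue: track the pending depth as a number that drains continuously at the leak rate. A minimal single-process sketch (class and parameter names are illustrative):

```python
import time

class LeakyBucket:
    """Leaky bucket as a meter: pending work drains at `rate` per second."""

    def __init__(self, rate, capacity):
        self.rate = rate          # leak (processing) rate per second
        self.capacity = capacity  # maximum queue depth before rejecting
        self.level = 0.0          # current queue depth
        self.last_leak = 0.0

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Drain the queue continuously at the fixed leak rate
        elapsed = max(0.0, now - self.last_leak)
        self.level = max(0.0, self.level - elapsed * self.rate)
        self.last_leak = now
        if self.level < self.capacity:
            self.level += 1
            return True
        return False

bucket = LeakyBucket(rate=10, capacity=5)
print([bucket.allow(now=0) for _ in range(7)])  # five accepted, two rejected
print(bucket.allow(now=0.5))  # True: 0.5s drains 5 slots
```

A true queueing implementation would hold the requests and release them at the leak rate, which adds the latency noted in the disadvantages above; the meter variant only accepts or rejects.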
Nginx Rate Limiting
Nginx's limit_req module implements a leaky-bucket limiter; the burst parameter sets the bucket (queue) depth:
Basic Rate Limiting
# Define rate limit zones
limit_req_zone $binary_remote_addr zone=api_limit:10m rate=100r/s;
limit_req_zone $http_x_api_key zone=api_key_limit:10m rate=1000r/s;

server {
    listen 80;
    server_name api.example.com;

    # Apply rate limit to entire API
    limit_req zone=api_limit burst=50 nodelay;

    location /api {
        proxy_pass http://backend;
    }
}
Parameters:
- $binary_remote_addr: client IP in compact binary form (uses less zone memory than $remote_addr)
- zone=name:size: shared-memory zone name and size (10m = 10 megabytes, roughly 160,000 IPv4 states)
- rate=100r/s: sustained rate of 100 requests per second
- burst=50: up to 50 requests may queue above the sustained rate
- nodelay: serve queued burst requests immediately instead of pacing them out
Per-Endpoint Rate Limiting
limit_req_zone $binary_remote_addr zone=strict_limit:10m rate=10r/s;
limit_req_zone $binary_remote_addr zone=normal_limit:10m rate=100r/s;

server {
    listen 80;
    server_name api.example.com;

    # Strict limit for expensive endpoints
    location /api/expensive {
        limit_req zone=strict_limit burst=5;
        proxy_pass http://backend;
    }

    # Normal limit for standard endpoints
    location /api/standard {
        limit_req zone=normal_limit burst=20;
        proxy_pass http://backend;
    }

    # No rate limit for fast, cheap endpoints
    location /api/fast {
        proxy_pass http://backend;
    }
}
Rate Limiting by API Key
# Extract API key from header
map $http_x_api_key $api_client {
    default "unknown";
    "~^key_(.+)$" $1;
}

limit_req_zone $api_client zone=api_key_limit:10m rate=1000r/s;

server {
    listen 80;
    server_name api.example.com;

    location /api {
        # Reject requests without an API key
        if ($http_x_api_key = "") {
            return 401 '{"error": "Missing API key"}';
        }
        limit_req zone=api_key_limit burst=100 nodelay;
        proxy_pass http://backend;
        proxy_set_header X-API-Key $http_x_api_key;
    }
}
Whitelist Exclusions
geo $whitelist {
    default 0;
    10.0.0.0/8 1;
    192.168.0.0/16 1;
    203.0.113.0/24 1; # Partner network
}

# Requests with an empty key are not counted against the zone,
# so whitelisted IPs bypass the limit cleanly
map $whitelist $limit_key {
    0 $binary_remote_addr;
    1 "";
}

limit_req_zone $limit_key zone=api_limit:10m rate=100r/s;

server {
    listen 80;

    location /api {
        limit_req zone=api_limit burst=50;
        proxy_pass http://backend;
    }
}
Dynamic Rate Limits
# limit_req is not valid inside if blocks, so select the tier through
# per-tier keys instead: an empty key is not counted against its zone
map $http_x_user_tier $premium_key {
    "premium" $binary_remote_addr;
    default "";
}
map $http_x_user_tier $standard_key {
    "standard" $binary_remote_addr;
    default "";
}
map $http_x_user_tier $free_key {
    "premium" "";
    "standard" "";
    default $binary_remote_addr;
}

limit_req_zone $premium_key zone=premium_limit:10m rate=5000r/s;
limit_req_zone $standard_key zone=standard_limit:10m rate=500r/s;
limit_req_zone $free_key zone=free_limit:10m rate=50r/s;

server {
    listen 80;
    server_name api.example.com;

    location /api {
        # Only the zone whose key is non-empty applies to a given request
        limit_req zone=premium_limit burst=500;
        limit_req zone=standard_limit burst=50;
        limit_req zone=free_limit burst=5;
        proxy_pass http://backend;
    }
}
Custom Rate Limit Status Response
limit_req_zone $binary_remote_addr zone=api_limit:10m rate=100r/s;

server {
    listen 80;
    server_name api.example.com;

    # Route rate limit errors to a named location
    error_page 429 = @rate_limit_error;

    location /api {
        limit_req zone=api_limit burst=50 nodelay;
        limit_req_status 429;
        proxy_pass http://backend;
    }

    location @rate_limit_error {
        default_type application/json;
        return 429 '{"error": "Too many requests", "retry_after": 60}';
    }
}
HAProxy Rate Limiting
HAProxy tracks client state in stick tables, which can be synchronized between instances over the peers protocol for distributed rate limiting:
Basic Rate Limiting
global
    stats socket /run/haproxy/admin.sock

defaults
    mode http
    timeout client 30s
    timeout server 30s

frontend api_in
    bind *:80
    # Stick table for rate limiting
    stick-table type ip size 100k expire 1h store http_req_rate(10s)
    # Track client IP
    http-request track-sc0 src
    # Limit to 100 requests per 10 seconds
    http-request deny if { sc_http_req_rate(0) gt 100 }
    default_backend api_servers

backend api_servers
    balance roundrobin
    server srv1 192.168.1.100:8000 check
    server srv2 192.168.1.101:8000 check
Per-API-Key Rate Limiting
frontend api_in
    bind *:80
    # Extract API key from header
    http-request set-var(req.api_key) req.hdr(X-API-Key)
    # Stick table keyed by API key
    stick-table type string len 64 size 100k expire 1h store http_req_rate(10s)
    http-request track-sc0 var(req.api_key)
    # Enforce limit
    http-request deny if { sc_http_req_rate(0) gt 1000 }
    default_backend api_servers
Tiered Rate Limiting
frontend api_in
    bind *:80
    stick-table type string len 64 size 100k expire 1h store http_req_rate(10s)
    http-request track-sc0 req.hdr(X-API-Key)
    # Different limits based on user tier
    acl is_premium_tier req.hdr(X-Tier) -i "premium"
    acl is_standard_tier req.hdr(X-Tier) -i "standard"
    http-request deny if is_premium_tier { sc_http_req_rate(0) gt 5000 }
    http-request deny if is_standard_tier { sc_http_req_rate(0) gt 500 }
    http-request deny if !is_premium_tier !is_standard_tier { sc_http_req_rate(0) gt 50 }
    default_backend api_servers
Escalating Penalties for Repeat Offenders
frontend api_in
    bind *:80
    stick-table type ip size 100k expire 1h store http_req_rate(10s),gpc0
    http-request track-sc0 src
    # Count each over-limit request before denying it (rules stop at the
    # first deny, so the counter must be incremented first)
    http-request sc-inc-gpc0(0) if { sc_http_req_rate(0) gt 100 }
    # First tier: deny while over 100 requests per 10s
    http-request deny if { sc_http_req_rate(0) gt 100 }
    # Warn clients that have repeatedly tripped the limit
    http-request set-header X-Client-Warning "Rate-limited" if { sc0_gpc0 gt 3 }
    # Persistent offenders are denied outright until the table entry expires
    http-request deny if { sc0_gpc0 gt 10 }
    default_backend api_servers
Request Header Analysis
Analyze request headers for intelligent rate limiting:
# Rate limit based on User-Agent
map $http_user_agent $is_bot {
    default 0;
    ~*bot 1;
    ~*crawler 1;
    ~*spider 1;
}

# limit_req cannot appear inside if blocks, so route each request into
# exactly one zone via its key (an empty key is not counted)
map $is_bot $human_key {
    0 $binary_remote_addr;
    1 "";
}
map $is_bot $bot_key {
    0 "";
    1 $binary_remote_addr;
}

limit_req_zone $human_key zone=normal:10m rate=100r/s;
limit_req_zone $bot_key zone=bot:10m rate=10r/s;

server {
    listen 80;

    location /api {
        limit_req zone=normal burst=50;
        limit_req zone=bot burst=5;
        proxy_pass http://backend;
    }
}
Rate limit by hostname:
limit_req_zone $host zone=per_host:10m rate=100r/s;

server {
    listen 80;
    server_name api.example.com other.example.com;

    location / {
        limit_req zone=per_host burst=50;
        proxy_pass http://backend;
    }
}
Rate limit by request path:
# The zone name in limit_req must be static (it cannot be a variable),
# so per-path selection happens through empty/non-empty keys
map $request_uri $strict_key {
    ~*/api/expensive $binary_remote_addr;
    default "";
}
map $request_uri $normal_key {
    ~*/api/standard $binary_remote_addr;
    default "";
}

limit_req_zone $strict_key zone=strict_limit:10m rate=10r/s;
limit_req_zone $normal_key zone=normal_limit:10m rate=100r/s;

server {
    listen 80;

    location / {
        limit_req zone=strict_limit burst=5;
        limit_req zone=normal_limit burst=20;
        proxy_pass http://backend;
    }
}
Per-Client Rate Limits
Implement sophisticated per-client limits:
# Prefer API key, then user ID, then fall back to client IP
map $http_x_api_key $client_or_user {
    "" $http_x_user_id;
    default $http_x_api_key;
}
map $client_or_user $rate_limit_key {
    "" $binary_remote_addr;
    default $client_or_user;
}
limit_req_zone $rate_limit_key zone=client_limit:10m rate=500r/s;

server {
    listen 80;

    location /api {
        limit_req zone=client_limit burst=100 nodelay;
        proxy_pass http://backend;
        # Pass identified client to backend
        proxy_set_header X-Client-ID $rate_limit_key;
    }
}
Rate Limit Response Headers
Include standard rate limit headers in responses:
# Add rate limit headers to responses. Nginx cannot report remaining
# quota natively ($limit_req_status holds PASSED/REJECTED, not a count),
# so dynamic Remaining/Reset values must come from the application or a
# Lua/njs layer.
add_header X-RateLimit-Limit 100 always;
# Custom response when rate limited
error_page 429 = @rate_limited;

location @rate_limited {
    default_type application/json;
    add_header X-RateLimit-Limit 100 always;
    add_header X-RateLimit-Remaining 0 always;
    add_header X-RateLimit-Reset 60 always;
    add_header Retry-After 60 always;
    return 429 '{"error": "Too Many Requests", "message": "Rate limit exceeded. Retry after 60 seconds.", "status": 429}';
}
HAProxy rate limit headers (http-response rules do not apply to responses HAProxy generates itself, so the 429 produced by a deny needs http-after-response, available since HAProxy 2.2):
frontend api_in
    bind *:80
    stick-table type ip size 100k expire 1h store http_req_rate(10s)
    http-request track-sc0 src
    acl over_limit sc_http_req_rate(0) gt 100
    http-request deny deny_status 429 if over_limit
    http-after-response set-header X-RateLimit-Limit "100"
    http-after-response set-header Retry-After "10" if { status 429 }
    default_backend api_servers
Distributed Rate Limiting
Share rate limit state across multiple servers using Redis:
import redis
import time
from flask import Flask, request, jsonify

app = Flask(__name__)
r = redis.Redis(host='localhost', port=6379, db=0)

def check_rate_limit(client_id, limit=100, window=60):
    """Fixed-window counter shared by all app servers through Redis."""
    key = f"rate_limit:{client_id}"
    current = r.incr(key)
    # Set the TTL when the key is created. If the process dies between
    # INCR and EXPIRE the key lingers, so production code often wraps
    # both commands in a Lua script or pipeline.
    if current == 1:
        r.expire(key, window)
    if current > limit:
        return False, 0
    return True, limit - current

@app.route('/api/resource')
def get_resource():
    client_id = request.remote_addr
    allowed, remaining = check_rate_limit(client_id)
    if not allowed:
        response = jsonify({'error': 'Rate limit exceeded'})
        response.status_code = 429
        response.headers['Retry-After'] = '60'
        return response
    # jsonify returns a Response object, so headers can be set on it
    response = jsonify({'data': 'resource data'})
    response.headers['X-RateLimit-Remaining'] = str(remaining)
    return response
Testing and Monitoring
Test rate limiting:
# Simple rate limit test: print request number and status code
for i in {1..150}; do
    curl -s -o /dev/null -w "$i:%{http_code} " http://api.example.com/api/resource
done
echo
# Measure response time under load
ab -n 1000 -c 100 http://api.example.com/api/resource
# Check rate limit headers
curl -i http://api.example.com/api/resource | grep X-RateLimit
# Test with custom headers
for i in {1..5}; do
    curl -s -H "X-API-Key: mykey" http://api.example.com/api/resource | head -1
done
Monitor rate limiting:
# Check Nginx rate limit stats
grep "limiting requests" /var/log/nginx/error.log | wc -l
# Monitor the HAProxy stick table (named after the proxy that declares it)
echo "show table api_in" | socat - /run/haproxy/admin.sock
# Monitor Redis rate limits
redis-cli --scan --pattern "rate_limit:*"
redis-cli DBSIZE
Conclusion
Rate limiting is essential for API protection and fair resource allocation. Choose algorithms based on requirements: token bucket for bursty traffic, sliding window for strict limits, leaky bucket for smooth throughput. Implement per-client, per-endpoint, and tiered strategies with clear response headers. Distributed systems benefit from Redis-backed rate limiting for consistency. Monitor and adjust limits continuously based on actual usage patterns and system capacity.


