Kubernetes Resource Limits and Requests Optimization

Proper resource configuration ensures efficient cluster utilization, reduces the risk of pod eviction, and maintains application performance. This guide covers Quality of Service (QoS) classes, LimitRanges, ResourceQuotas, right-sizing strategies, out-of-memory handling, and eviction behavior for your VPS and bare-metal Kubernetes infrastructure.

Resource Management Fundamentals

Requests vs Limits

Request: Minimum guaranteed resources

  • Used for scheduling decisions
  • A pod stays Pending if no node can satisfy its requests
  • Reserved on the node, so the requested amount is guaranteed to be available

Limit: Maximum resources allowed

  • Container is OOM-killed if it exceeds its memory limit
  • CPU usage is throttled, not killed, when it reaches the limit
  • Prevents resource hogging

Resource Types

CPU:

  • Measured in cores
  • 1000m = 1 core
  • Can be fractional (500m = 0.5 cores)

Memory:

  • Measured in bytes
  • Suffixes: Ki, Mi, Gi, etc.
  • 1Gi = 1024Mi = 1048576Ki
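These conversions are plain powers of two and can be checked with shell arithmetic:

```shell
# Kubernetes binary suffixes: Ki = 2^10, Mi = 2^20, Gi = 2^30 bytes
bytes_per_gi=$(( 1024 * 1024 * 1024 ))
mi_per_gi=$(( bytes_per_gi / (1024 * 1024) ))
ki_per_gi=$(( bytes_per_gi / 1024 ))
echo "1Gi = ${mi_per_gi}Mi = ${ki_per_gi}Ki = ${bytes_per_gi} bytes"
# CPU uses a different scheme: 1000m (millicores) = 1 core, so 500m = 0.5 cores
```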

Requests and Limits

Basic Configuration

apiVersion: v1
kind: Pod
metadata:
  name: resource-pod
spec:
  containers:
  - name: app
    image: myapp:1.0
    resources:
      requests:
        cpu: 250m
        memory: 512Mi
      limits:
        cpu: 500m
        memory: 1Gi

Per-Container Resources

Multiple containers with different resource needs:

apiVersion: v1
kind: Pod
metadata:
  name: multi-container
spec:
  containers:
  - name: app
    image: myapp:1.0
    resources:
      requests:
        cpu: 500m
        memory: 512Mi
      limits:
        cpu: 1
        memory: 1Gi
  - name: sidecar
    image: sidecar:1.0
    resources:
      requests:
        cpu: 100m
        memory: 128Mi
      limits:
        cpu: 200m
        memory: 256Mi

Resource Allocation in Deployments

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
      - name: web
        image: web:1.0
        resources:
          requests:
            cpu: 200m
            memory: 256Mi
          limits:
            cpu: 500m
            memory: 512Mi

Quality of Service Classes

QoS Classes

Kubernetes automatically assigns QoS classes based on requests/limits:

Guaranteed: every container sets requests equal to limits for both CPU and memory (highest priority)

resources:
  requests:
    cpu: 500m
    memory: 512Mi
  limits:
    cpu: 500m
    memory: 512Mi

Burstable: at least one container sets a request or limit, but the pod does not meet the Guaranteed criteria (medium priority)

resources:
  requests:
    cpu: 250m
    memory: 256Mi
  limits:
    cpu: 500m
    memory: 512Mi

BestEffort: no container sets any requests or limits (lowest priority)

resources: {}

Eviction Order

When a node comes under resource pressure, the kubelet evicts pods in roughly this order:

  1. BestEffort pods
  2. Burstable pods using more than their requests
  3. Guaranteed pods (and Burstable pods within their requests), only as a last resort

Checking QoS Class

kubectl get pods -o custom-columns=NAME:.metadata.name,QOS:.status.qosClass
kubectl describe pod myapp | grep QoS

LimitRange

LimitRange Fundamentals

LimitRange enforces per-pod and per-container resource constraints, and supplies defaults, at the namespace level.

Pod-Level LimitRange

apiVersion: v1
kind: LimitRange
metadata:
  name: pod-limits
  namespace: production
spec:
  limits:
  - type: Pod
    max:
      cpu: "2"
      memory: "2Gi"
    min:
      cpu: "100m"
      memory: "128Mi"
    maxLimitRequestRatio:
      cpu: "4"
      memory: "2"

Container-Level LimitRange

apiVersion: v1
kind: LimitRange
metadata:
  name: container-limits
  namespace: production
spec:
  limits:
  - type: Container
    max:
      cpu: "1"
      memory: "1Gi"
    min:
      cpu: "50m"
      memory: "64Mi"
    default:
      cpu: "500m"
      memory: "512Mi"
    defaultRequest:
      cpu: "100m"
      memory: "128Mi"
    maxLimitRequestRatio:
      cpu: "2"
      memory: "1.5"
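With this LimitRange in place, a container created without a resources stanza is admitted as if it had specified the defaults, effectively:

```yaml
# Injected by LimitRange admission when the container omits resources
resources:
  requests:
    cpu: "100m"      # from defaultRequest
    memory: "128Mi"
  limits:
    cpu: "500m"      # from default
    memory: "512Mi"
```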

Viewing LimitRange

kubectl get limitrange -n production
kubectl describe limitrange container-limits -n production

ResourceQuota

ResourceQuota Fundamentals

ResourceQuota caps the total resources and object counts consumed across a namespace.

Basic ResourceQuota

apiVersion: v1
kind: ResourceQuota
metadata:
  name: production-quota
  namespace: production
spec:
  hard:
    requests.cpu: "10"
    requests.memory: "20Gi"
    limits.cpu: "20"
    limits.memory: "40Gi"
    pods: "100"

Advanced ResourceQuota

apiVersion: v1
kind: ResourceQuota
metadata:
  name: comprehensive-quota
  namespace: production
spec:
  hard:
    # Compute resources
    requests.cpu: "50"
    requests.memory: "100Gi"
    limits.cpu: "100"
    limits.memory: "200Gi"
    
    # Pod count
    pods: "200"
    
    # Object counts
    services: "10"
    services.loadbalancers: "2"
    services.nodeports: "5"
    persistentvolumeclaims: "10"
    configmaps: "50"
    secrets: "50"
    
    # Storage
    requests.storage: "500Gi"

Quota with Priority Classes

apiVersion: v1
kind: ResourceQuota
metadata:
  name: priority-quota
  namespace: production
spec:
  hard:
    requests.cpu: "10"
    limits.cpu: "20"
  scopeSelector:
    matchExpressions:
    - operator: In
      scopeName: PriorityClass
      values: ["production"]
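For the scope selector to match anything, a PriorityClass with that name must exist in the cluster. A minimal sketch (the value shown is an assumption; pick one that fits your cluster's priority scheme):

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: production
value: 1000000            # relative priority; higher wins during scheduling/preemption
globalDefault: false
description: "Pods in this class count against priority-quota"
```

Pods opt in with priorityClassName: production in their spec.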

Viewing ResourceQuota

kubectl get resourcequota -n production
kubectl describe resourcequota production-quota -n production
kubectl top nodes
kubectl top pods -n production

Right-Sizing

Identifying Resource Usage

Get current usage:

# Node usage
kubectl top nodes

# Pod usage
kubectl top pods -n production

# Configured requests/limits per container (spec, not live usage)
kubectl get pods -n production -o json | jq '.items[] | {name: .metadata.name, containers: .spec.containers[] | {name, resources}}'

Using Metrics

Prometheus queries for usage analysis:

# CPU usage rate
sum(rate(container_cpu_usage_seconds_total[5m])) by (pod)

# Memory usage (working set, the value the OOM killer acts on)
sum(container_memory_working_set_bytes) by (pod)

# CPU request utilization (kube-state-metrics v2 metric names)
sum(rate(container_cpu_usage_seconds_total[5m])) by (pod) /
sum(kube_pod_container_resource_requests{resource="cpu"}) by (pod)

Right-Sizing Strategy

  1. Monitor: Collect metrics for 2-4 weeks
  2. Analyze: Identify P95/P99 usage
  3. Set Request: To P50-P75 of usage
  4. Set Limit: To P99 or 2x request
  5. Validate: Monitor and adjust
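Steps 3 and 4 reduce to simple arithmetic; a sketch in shell, with percentile values assumed for illustration:

```shell
# Rule of thumb: request between P50 and P75, limit = max(P99, 2 x request)
p99_cpu_m=500          # observed P99 CPU in millicores (assumed sample value)
request_cpu_m=200      # chosen between P50 (100m) and P95 (300m)
limit_cpu_m=$(( p99_cpu_m > 2 * request_cpu_m ? p99_cpu_m : 2 * request_cpu_m ))
echo "request: ${request_cpu_m}m, limit: ${limit_cpu_m}m"
```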

Example Right-Sizing

Based on metrics:

  • P50: 100m CPU, 256Mi memory
  • P95: 300m CPU, 512Mi memory
  • P99: 500m CPU, 800Mi memory

Set as:

  • Request: 200m / 300Mi
  • Limit: 500m / 800Mi

resources:
  requests:
    cpu: 200m
    memory: 300Mi
  limits:
    cpu: 500m
    memory: 800Mi
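To automate this analysis, the Vertical Pod Autoscaler add-on (assumed installed separately; the object name below is hypothetical) can generate recommendations without applying them:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  updatePolicy:
    updateMode: "Off"    # recommendation-only: no pods are evicted or patched
```

Recommendations appear under status.recommendation and can be read with kubectl describe vpa web-app-vpa.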

Troubleshooting

Out of Memory (OOM) Errors

# Check if pod was OOMKilled
kubectl describe pod myapp | grep -A 5 "Last State"

# View events
kubectl get events -A --sort-by='.lastTimestamp' | grep OOMKilled

# Check memory limit in pod
kubectl get pod myapp -o yaml | grep -A 5 "memory:"

Solutions:

  1. Increase limit
  2. Optimize application
  3. Scale horizontally
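When diagnosing, note that an OOM-killed container reports exit code 137, following the Unix convention of 128 plus the signal number:

```shell
# exit code = 128 + signal number; SIGKILL is signal 9
oom_exit_code=$(( 128 + 9 ))
echo "$oom_exit_code"   # 137
```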

Pending Pods

# Check why pod can't be scheduled
kubectl describe pod pending-pod

# View node resources
kubectl describe nodes

# Check ResourceQuota usage
kubectl describe resourcequota -n production

CPU Throttling

# List CPU requests and limits (throttling begins when usage reaches the limit)
kubectl get pods -o custom-columns=NAME:.metadata.name,CPU-REQUESTS:.spec.containers[*].resources.requests.cpu,CPU-LIMITS:.spec.containers[*].resources.limits.cpu

# Prometheus query for throttled CPU time
increase(container_cpu_cfs_throttled_seconds_total[5m])

# Fraction of scheduling periods in which the container was throttled
rate(container_cpu_cfs_throttled_periods_total[5m]) / rate(container_cpu_cfs_periods_total[5m])

Practical Examples

Example: Web Application Right-Sizing

---
# LimitRange for web namespace
apiVersion: v1
kind: LimitRange
metadata:
  name: web-limits
  namespace: web
spec:
  limits:
  - type: Container
    max:
      cpu: "2"
      memory: "2Gi"
    min:
      cpu: "100m"
      memory: "128Mi"
    defaultRequest:
      cpu: "200m"
      memory: "256Mi"
    default:
      cpu: "500m"
      memory: "512Mi"
---
# ResourceQuota for web namespace
apiVersion: v1
kind: ResourceQuota
metadata:
  name: web-quota
  namespace: web
spec:
  hard:
    requests.cpu: "20"
    requests.memory: "40Gi"
    limits.cpu: "40"
    limits.memory: "80Gi"
    pods: "50"
---
# Web application with optimized resources
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
  namespace: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: web:1.0
        resources:
          requests:
            cpu: 200m
            memory: 256Mi
          limits:
            cpu: 500m
            memory: 512Mi
      - name: cache
        image: redis:7
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 200m
            memory: 256Mi

Example: Database Pod with Guaranteed QoS

apiVersion: v1
kind: Pod
metadata:
  name: database
  namespace: databases
spec:
  containers:
  - name: postgres
    image: postgres:15
    resources:
      requests:
        cpu: 2
        memory: 4Gi
      limits:
        cpu: 2
        memory: 4Gi
  - name: backup
    image: backup-tool:1.0
    resources:
      requests:
        cpu: 500m
        memory: 512Mi
      limits:
        cpu: 500m
        memory: 512Mi

Example: Multi-Tier ResourceQuota

---
# CPU-intensive workloads quota
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-intensive-quota
  namespace: batch
spec:
  hard:
    requests.cpu: "100"
    limits.cpu: "200"
    requests.memory: "100Gi"
    limits.memory: "200Gi"
  scopeSelector:
    matchExpressions:
    - operator: In
      scopeName: PriorityClass
      values: ["compute-intensive"]
---
# Standard workloads quota
apiVersion: v1
kind: ResourceQuota
metadata:
  name: standard-quota
  namespace: production
spec:
  hard:
    requests.cpu: "50"
    limits.cpu: "100"
    requests.memory: "50Gi"
    limits.memory: "100Gi"
  scopeSelector:
    matchExpressions:
    - operator: NotIn
      scopeName: PriorityClass
      values: ["compute-intensive"]

Conclusion

Proper resource management is critical for stable, efficient Kubernetes clusters. By setting appropriate requests and limits, using QoS classes strategically, enforcing LimitRanges and ResourceQuotas, and regularly right-sizing based on actual usage, you optimize cluster utilization and prevent resource contention. Start with conservative estimates, monitor actual usage, and adjust based on metrics. Regular review of resource allocation keeps your applications running reliably on your VPS and bare-metal Kubernetes infrastructure without unnecessary waste or performance degradation.