Kubernetes Horizontal Pod Autoscaler

The Horizontal Pod Autoscaler (HPA) automatically scales the number of pod replicas based on observed metrics like CPU usage and memory consumption. This guide covers installing metrics-server, configuring CPU- and memory-based scaling, using custom metrics, implementing sophisticated scaling behaviors, and comparing HPA with the Vertical Pod Autoscaler (VPA).

HPA Fundamentals

How HPA Works

HPA continuously monitors metrics and adjusts replica count:

  1. Metrics Server collects resource metrics from Kubelet
  2. HPA controller queries metrics
  3. Calculates desired replica count based on metric targets
  4. Adjusts Deployment/StatefulSet replica count
  5. Waits before scaling down to prevent oscillation

Scaling Formula

desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)]
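As a worked example with hypothetical numbers: 4 replicas averaging 90% CPU against a 70% target gives ceil(4 × 90/70) = ceil(5.14) = 6 replicas. The same calculation in shell integer arithmetic:

```shell
# Hypothetical inputs: 4 replicas, 90% current CPU, 70% target utilization
current_replicas=4
current_metric=90
desired_metric=70

# ceil(a*b/c) via integer arithmetic: (a*b + c - 1) / c
desired_replicas=$(( (current_replicas * current_metric + desired_metric - 1) / desired_metric ))

echo "desiredReplicas = $desired_replicas"   # 6
```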

Prerequisites

  • Kubernetes cluster v1.16+ (v1.18+ recommended)
  • Resource requests defined for workloads
  • Metrics Server installed
  • Sufficient cluster resources for scaling

Metrics Server Installation

Installing Metrics Server

Check whether metrics-server is already installed:

kubectl get deployment -n kube-system metrics-server

Install Metrics Server:

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

For kubeadm or self-hosted clusters with self-signed certificates:

kubectl patch deployment metrics-server -n kube-system --type="json" -p='[{"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--kubelet-insecure-tls"}]'

Wait for metrics-server to be ready:

kubectl wait --for=condition=ready pod -l k8s-app=metrics-server -n kube-system --timeout=300s

Verify that metrics are available:

kubectl top nodes
kubectl top pods -A

Metrics Server Configuration

Custom values for different environments:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: metrics-server
  namespace: kube-system
spec:
  template:
    spec:
      containers:
      - name: metrics-server
        image: registry.k8s.io/metrics-server/metrics-server:v0.6.4
        args:
        - --cert-dir=/tmp
        - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
        - --kubelet-insecure-tls=true
        - --metric-resolution=15s
        - --v=2
        resources:
          requests:
            cpu: 100m
            memory: 200Mi
          limits:
            cpu: 200m
            memory: 500Mi

CPU and Memory Scaling

Basic CPU-Based HPA

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: simple-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70

Memory-Based Scaling

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: memory-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: cache-server
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80

Combined CPU and Memory Scaling

The HPA computes a desired replica count for each metric and acts on the largest, so exceeding either target triggers a scale-up:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: combined-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app-server
  minReplicas: 3
  maxReplicas: 30
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 75
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 85
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 50
        periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
      - type: Percent
        value: 100
        periodSeconds: 30
      - type: Pods
        value: 2
        periodSeconds: 60
      selectPolicy: Max

Absolute Value Scaling

Scale based on absolute resource amounts:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: absolute-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: worker
  minReplicas: 1
  maxReplicas: 50
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: AverageValue
        averageValue: "500m"
  - type: Resource
    resource:
      name: memory
      target:
        type: AverageValue
        averageValue: "512Mi"

Custom Metrics

Prometheus Custom Metrics

Install Prometheus and the custom metrics adapter:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

helm install kube-prometheus prometheus-community/kube-prometheus-stack \
  -n monitoring \
  --create-namespace

Install the Prometheus adapter, which serves the custom metrics API:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus-adapter prometheus-community/prometheus-adapter \
  -n monitoring \
  -f adapter-values.yaml

Adapter configuration:

# adapter-values.yaml
prometheus:
  url: http://prometheus-operated:9090

rules:
  custom:
  - seriesQuery: 'http_requests_per_second'
    resources:
      template: <<.Resource>>
    name:
      matches: "^(.*)_per_second"
      as: "${1}_rate"
    metricsQuery: 'rate(<<.Series>>{<<.LabelMatchers>>}[1m])'
  
  - seriesQuery: 'custom_app_metric'
    resources:
      template: <<.Resource>>
    name:
      matches: "^custom_(.+)_metric"
      as: "custom_${1}"
    metricsQuery: '<<.Series>>{<<.LabelMatchers>>}'
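Once the adapter is running, you can confirm that the custom metrics API is registered and see which metrics it exposes (API group and version as registered by upstream prometheus-adapter):

```shell
# Confirm the custom metrics APIService is registered and available
kubectl get apiservice v1beta1.custom.metrics.k8s.io

# List every custom metric the adapter currently exposes
kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1 | jq '.resources[].name'
```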

HPA with Custom Metrics

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: custom-metric-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "1k"
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80

Application Metrics Example

Custom metric from application:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: queue-depth-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: message-processor
  minReplicas: 1
  maxReplicas: 50
  metrics:
  - type: Pods
    pods:
      metric:
        name: queue_depth
        selector:
          matchLabels:
            metric_type: queue
      target:
        type: AverageValue
        averageValue: "30"
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 30
      policies:
      - type: Percent
        value: 200
        periodSeconds: 15
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 50
        periodSeconds: 60

Scaling Behavior

Scale-Up Behavior

Control how quickly HPA scales up:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: aggressive-scale-up
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 2
  maxReplicas: 100
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 30
      policies:
      - type: Percent
        value: 100
        periodSeconds: 30
      - type: Pods
        value: 5
        periodSeconds: 30
      selectPolicy: Max

Scale-Down Behavior

Prevent flapping with scale-down stabilization:

behavior:
  scaleDown:
    stabilizationWindowSeconds: 300
    policies:
    - type: Percent
      value: 50
      periodSeconds: 60
    - type: Pods
      value: 2
      periodSeconds: 120
    selectPolicy: Min

Policy Descriptions

behavior:
  scaleUp:
    stabilizationWindowSeconds: 60
    policies:
    # Scale up by 100% (double replicas) every 30 seconds
    - type: Percent
      value: 100
      periodSeconds: 30
    # Or add 4 pods every 60 seconds
    - type: Pods
      value: 4
      periodSeconds: 60
    # Choose the policy that scales up the fastest
    selectPolicy: Max
  scaleDown:
    stabilizationWindowSeconds: 300
    policies:
    # Scale down by 25% every 60 seconds
    - type: Percent
      value: 25
      periodSeconds: 60
    # Or remove 2 pods every 120 seconds
    - type: Pods
      value: 2
      periodSeconds: 120
    # Choose the policy that scales down the slowest
    selectPolicy: Min
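To see how selectPolicy: Max combines the scale-up policies above, take a hypothetical fleet of 10 replicas: the Percent policy permits adding 100% (10 pods) per 30 seconds, the Pods policy permits 4 per 60 seconds, and Max picks the more permissive of the two:

```shell
# Hypothetical: 10 current replicas, policies from the example above
replicas=10

percent_allowed=$(( replicas * 100 / 100 ))  # Percent policy: 100% -> +10 pods
pods_allowed=4                               # Pods policy: +4 pods

# selectPolicy: Max -> the more permissive policy wins
if [ "$percent_allowed" -gt "$pods_allowed" ]; then
  max_change=$percent_allowed
else
  max_change=$pods_allowed
fi

echo "scale-up allowed this period: +$max_change pods"   # +10
```

With selectPolicy: Min (as in the scale-down example), the less permissive policy would win instead.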

VPA Comparison

Vertical Pod Autoscaler

VPA adjusts CPU/memory requests and limits rather than replica count.

Install the VPA:

git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh

Or using Helm:

helm repo add fairwinds-stable https://charts.fairwinds.com/stable
helm install vpa fairwinds-stable/vpa \
  -n kube-system \
  --create-namespace

VPA vs HPA

Aspect         | HPA                        | VPA
Scaling Type   | Horizontal (replicas)      | Vertical (resources)
Use Case       | Load balancing             | Right-sizing
Latency Impact | New pods added on scale-up | Pod restart on update
Best For       | Stateless services         | Stateful services
Combined Use   | Works well together        | Works well together

Using VPA for Right-Sizing

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: app-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: web-app
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: "app"
      minAllowed:
        cpu: 100m
        memory: 128Mi
      maxAllowed:
        cpu: 2
        memory: 4Gi
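After the VPA has observed the workload for a while, its sizing recommendations can be inspected via its status (field paths per the VPA API):

```shell
# Show the VPA's current resource recommendations
kubectl describe vpa app-vpa -n production

# Or extract just the recommendation block as JSON
kubectl get vpa app-vpa -n production \
  -o jsonpath='{.status.recommendation.containerRecommendations}'
```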

Combining HPA and VPA

Use both for optimal scaling:

---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 75
---
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: "app"
      minAllowed:
        cpu: 250m
        memory: 512Mi
      maxAllowed:
        cpu: 2
        memory: 2Gi

Practical Examples

Example: Web Server HPA

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  minReplicas: 3
  maxReplicas: 50
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 85
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 30
      policies:
      - type: Percent
        value: 100
        periodSeconds: 30
      - type: Pods
        value: 5
        periodSeconds: 30
      selectPolicy: Max
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 50
        periodSeconds: 60
      selectPolicy: Min

Example: Batch Job Scaler

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: batch-worker-hpa
  namespace: jobs
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: batch-worker
  minReplicas: 1
  maxReplicas: 100
  metrics:
  - type: Pods
    pods:
      metric:
        name: pending_jobs
      target:
        type: AverageValue
        averageValue: "5"
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 200
        periodSeconds: 15
    scaleDown:
      stabilizationWindowSeconds: 600
      policies:
      - type: Pods
        value: 1
        periodSeconds: 120

Troubleshooting

Check HPA Status

# View HPA status
kubectl get hpa -n production
kubectl describe hpa web-hpa -n production

# Check events
kubectl get events -n production --sort-by='.lastTimestamp'

View Current Metrics

kubectl get hpa -n production -o custom-columns=NAME:.metadata.name,REFERENCE:.spec.scaleTargetRef.kind,TARGETS:.status.currentMetrics,MINPODS:.spec.minReplicas,MAXPODS:.spec.maxReplicas,REPLICAS:.status.currentReplicas

Debugging

# Check if metrics are available
kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1 | jq .

# View HPA decision history
kubectl describe hpa web-hpa -n production

# Check controller logs
kubectl logs -n kube-system deployment/metrics-server
# The HPA controller runs inside kube-controller-manager, not a separate deployment
kubectl logs -n kube-system -l component=kube-controller-manager | grep -i horizontal

Conclusion

The Horizontal Pod Autoscaler is essential for running efficient, cost-effective Kubernetes deployments on VPS and bare-metal infrastructure. By properly configuring resource requests, implementing appropriate metrics and scaling policies, and combining HPA with VPA for comprehensive scaling, you create workloads that automatically adapt to demand. Start with simple CPU-based scaling and gradually add custom metrics and sophisticated behavior policies as your needs evolve. Regular monitoring and tuning of HPA parameters ensure optimal performance and cost efficiency.