Kubernetes Horizontal Pod Autoscaler

The Horizontal Pod Autoscaler (HPA) automatically scales the number of pod replicas based on observed metrics such as CPU and memory utilization. This guide covers installing metrics-server, configuring CPU- and memory-based scaling, using custom metrics, implementing sophisticated scaling behaviors, and how HPA compares with the Vertical Pod Autoscaler (VPA).

HPA Fundamentals

How HPA Works

HPA continuously monitors metrics and adjusts replica count:

  1. Metrics Server collects resource metrics from each node's kubelet
  2. HPA controller queries metrics
  3. Calculates desired replica count based on metric targets
  4. Adjusts deployment/statefulset replica count
  5. Waits before scaling down to prevent oscillation

Scaling Formula

desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)]
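A quick worked example makes the formula concrete: with 4 replicas averaging 90% CPU against a 70% target, the HPA scales to ceil(4 × 90 / 70) = 6.

```shell
# Worked example of the HPA scaling formula (pure arithmetic, no cluster needed)
current_replicas=4
current_metric=90   # observed average CPU utilization (%)
desired_metric=70   # target average CPU utilization (%)

# Integer ceiling of (current_replicas * current_metric / desired_metric)
desired_replicas=$(( (current_replicas * current_metric + desired_metric - 1) / desired_metric ))
echo "$desired_replicas"   # prints 6
```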

Prerequisites

  • Kubernetes cluster v1.23+ (the stable autoscaling/v2 API used throughout this guide)
  • Resource requests defined for workloads
  • Metrics Server installed
  • Sufficient cluster resources for scaling

Metrics Server Installation

Installing Metrics Server

Check if metrics-server is already installed:

kubectl get deployment -n kube-system metrics-server

Install Metrics Server:

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

For kubeadm or other self-hosted clusters whose kubelets use self-signed certificates:

kubectl patch deployment metrics-server -n kube-system --type="json" -p='[{"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--kubelet-insecure-tls"}]'

Wait for metrics-server to be ready:

kubectl wait --for=condition=ready pod -l k8s-app=metrics-server -n kube-system --timeout=300s

Verify metrics are available:

kubectl top nodes
kubectl top pods -A
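If `kubectl top` returns errors, you can query the Resource Metrics API directly to see exactly what metrics-server is serving:

```shell
# List node metrics straight from the metrics.k8s.io API (requires jq)
kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes | jq '.items[].metadata.name'

# Inspect per-pod metrics for a namespace
kubectl get --raw /apis/metrics.k8s.io/v1beta1/namespaces/kube-system/pods | jq '.items[0]'
```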

Metrics Server Configuration

Custom values for different environments:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: metrics-server
  namespace: kube-system
spec:
  template:
    spec:
      containers:
      - name: metrics-server
        image: registry.k8s.io/metrics-server/metrics-server:v0.6.4
        args:
        - --cert-dir=/tmp
        - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
        - --kubelet-insecure-tls=true
        - --metric-resolution=15s
        - --v=2
        resources:
          requests:
            cpu: 100m
            memory: 200Mi
          limits:
            cpu: 200m
            memory: 500Mi
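After patching or editing the Deployment, it is worth confirming the flags actually landed on the running container:

```shell
# Show the arguments the metrics-server container is actually running with
kubectl get deployment metrics-server -n kube-system \
  -o jsonpath='{.spec.template.spec.containers[0].args}'
```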

CPU and Memory Scaling

Basic CPU-Based HPA

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: simple-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
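The same autoscaler can be created imperatively, which is handy for quick experiments (the manifest form above is preferable for anything you keep, and note the imperative command names the HPA after the Deployment rather than `simple-hpa`):

```shell
# Imperative equivalent of the CPU-based HPA above
kubectl autoscale deployment web-app -n production \
  --cpu-percent=70 --min=2 --max=10
```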

Memory-Based Scaling

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: memory-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: cache-server
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
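Utilization targets are percentages of the pods' resource *requests*, so the target Deployment must declare them; without requests, the HPA reports the metric as `<unknown>` and cannot scale. A minimal excerpt (the image and request values here are illustrative):

```yaml
# Excerpt from the cache-server Deployment pod spec (illustrative values)
containers:
- name: cache-server
  image: redis:7
  resources:
    requests:
      cpu: 250m
      memory: 1Gi
```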

Combined CPU and Memory Scaling

When multiple metrics are defined, the HPA computes a desired replica count for each metric and uses the highest, so the workload scales up when either target is exceeded:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: combined-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app-server
  minReplicas: 3
  maxReplicas: 30
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 75
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 85
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 50
        periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
      - type: Percent
        value: 100
        periodSeconds: 30
      - type: Pods
        value: 2
        periodSeconds: 60
      selectPolicy: Max

Absolute Value Scaling

Scale based on absolute resource amounts:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: absolute-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: worker
  minReplicas: 1
  maxReplicas: 50
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: AverageValue
        averageValue: "500m"
  - type: Resource
    resource:
      name: memory
      target:
        type: AverageValue
        averageValue: "512Mi"

Custom Metrics

Prometheus Custom Metrics

Install Prometheus and custom metrics adapter:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

helm install kube-prometheus prometheus-community/kube-prometheus-stack \
  -n monitoring \
  --create-namespace

Install custom metrics API:

helm install prometheus-adapter prometheus-community/prometheus-adapter \
  -n monitoring \
  -f adapter-values.yaml

Adapter configuration:

# adapter-values.yaml
prometheus:
  url: http://prometheus-operated.monitoring.svc
  port: 9090

rules:
  custom:
  - seriesQuery: 'http_requests_per_second'
    resources:
      template: <<.Resource>>
    name:
      matches: "^(.*)_per_second"
      as: "${1}_rate"
    metricsQuery: 'rate(<<.Series>>{<<.LabelMatchers>>}[1m])'
  
  - seriesQuery: 'custom_app_metric'
    resources:
      template: <<.Resource>>
    name:
      matches: "^custom_(.+)_metric"
      as: "custom_${1}"
    metricsQuery: '<<.Series>>{<<.LabelMatchers>>}'
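Once the adapter is running, confirm it has registered the custom metrics API and is exposing your series:

```shell
# The APIService should report Available=True
kubectl get apiservice v1beta1.custom.metrics.k8s.io

# List every metric the adapter currently exposes (requires jq)
kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1 | jq '.resources[].name'
```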

HPA with Custom Metrics

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: custom-metric-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "1k"
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80

Application Metrics Example

Custom metric from application:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: queue-depth-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: message-processor
  minReplicas: 1
  maxReplicas: 50
  metrics:
  - type: Pods
    pods:
      metric:
        name: queue_depth
        selector:
          matchLabels:
            metric_type: queue
      target:
        type: AverageValue
        averageValue: "30"
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 30
      policies:
      - type: Percent
        value: 200
        periodSeconds: 15
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 50
        periodSeconds: 60

Scaling Behavior

Scale-Up Behavior

Control how quickly HPA scales up:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: aggressive-scale-up
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 2
  maxReplicas: 100
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 30
      policies:
      - type: Percent
        value: 100
        periodSeconds: 30
      - type: Pods
        value: 5
        periodSeconds: 30
      selectPolicy: Max

Scale-Down Behavior

Prevent flapping with scale-down stabilization:

behavior:
  scaleDown:
    stabilizationWindowSeconds: 300
    policies:
    - type: Percent
      value: 50
      periodSeconds: 60
    - type: Pods
      value: 2
      periodSeconds: 120
    selectPolicy: Min

Policy Descriptions

behavior:
  scaleUp:
    stabilizationWindowSeconds: 60
    policies:
    # Scale up by 100% (double replicas) every 30 seconds
    - type: Percent
      value: 100
      periodSeconds: 30
    # Or add 4 pods every 60 seconds
    - type: Pods
      value: 4
      periodSeconds: 60
    # Choose the policy that scales up the fastest
    selectPolicy: Max
  scaleDown:
    stabilizationWindowSeconds: 300
    policies:
    # Scale down by 25% every 60 seconds
    - type: Percent
      value: 25
      periodSeconds: 60
    # Or remove 2 pods every 120 seconds
    - type: Pods
      value: 2
      periodSeconds: 120
    # Choose the policy that scales down the slowest
    selectPolicy: Min
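A quick worked example of how `selectPolicy: Max` resolves the scale-up policies above, assuming 10 current replicas:

```shell
# How the two scale-up policies combine at 10 replicas (pure arithmetic)
replicas=10
by_percent=$(( replicas * 100 / 100 ))  # Percent policy: up to 10 additional pods per 30s
by_pods=4                               # Pods policy: up to 4 additional pods per 60s

# selectPolicy: Max takes the more permissive allowance
max_step=$(( by_percent > by_pods ? by_percent : by_pods ))
echo "$max_step"   # prints 10
```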

VPA Comparison

Vertical Pod Autoscaler

VPA adjusts CPU/memory requests and limits rather than replica count.

Install VPA:

git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh

Or using Helm:

helm repo add fairwinds-stable https://charts.fairwinds.com/stable
helm install vpa fairwinds-stable/vpa \
  -n kube-system \
  --create-namespace
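Either install method should leave the VPA CRDs and the recommender, updater, and admission-controller components in place; verify before creating VPA objects:

```shell
# The VerticalPodAutoscaler CRD must exist
kubectl get crd verticalpodautoscalers.autoscaling.k8s.io

# Recommender, updater, and admission controller pods should be Running
kubectl get pods -n kube-system | grep vpa
```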

VPA vs HPA

Aspect            HPA                                    VPA
Scaling Type      Horizontal (replicas)                  Vertical (resources)
Use Case          Load balancing                         Right-sizing
Latency Impact    No restarts; new pods are added        Pods restarted to apply new requests
Best For          Stateless services                     Stateful services
Combined Use      Safe when driven by different metrics  Conflicts if both act on CPU/memory

Using VPA for Right-Sizing

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: app-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: web-app
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: "app"
      minAllowed:
        cpu: 100m
        memory: 128Mi
      maxAllowed:
        cpu: 2
        memory: 4Gi
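After the recommender has collected a few minutes of data, the recommendation appears in the VPA's status:

```shell
# Human-readable recommendation (target, lower/upper bounds)
kubectl describe vpa app-vpa -n production

# Just the recommendation block
kubectl get vpa app-vpa -n production -o jsonpath='{.status.recommendation}'
```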

Combining HPA and VPA

Use both for complementary scaling, with one caveat: the VPA project advises against letting VPA in "Auto" mode and an HPA both act on CPU or memory for the same workload, since the two controllers will fight each other. Safe patterns are an HPA driven by custom metrics alongside VPA right-sizing, or a CPU-based HPA with VPA in recommendation-only ("Off") mode:

---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 75
---
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  updatePolicy:
    updateMode: "Off"  # recommendation-only; avoids fighting the CPU-based HPA
  resourcePolicy:
    containerPolicies:
    - containerName: "app"
      minAllowed:
        cpu: 250m
        memory: 512Mi
      maxAllowed:
        cpu: 2
        memory: 2Gi

Practical Examples

Example: Web Server HPA

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  minReplicas: 3
  maxReplicas: 50
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 85
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 30
      policies:
      - type: Percent
        value: 100
        periodSeconds: 30
      - type: Pods
        value: 5
        periodSeconds: 30
      selectPolicy: Max
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 50
        periodSeconds: 60
      selectPolicy: Min
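To see this HPA in action, generate artificial load against the Deployment's Service (this assumes a Service named `nginx` exists in `production`):

```shell
# Hammer the service from a throwaway pod, then watch the HPA react
kubectl run load-generator -n production --image=busybox:1.36 --restart=Never -- \
  /bin/sh -c "while true; do wget -q -O- http://nginx >/dev/null; done"

# In a second terminal, watch replicas climb and then settle
kubectl get hpa web-hpa -n production --watch

# Clean up when done
kubectl delete pod load-generator -n production
```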

Example: Batch Job Scaler

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: batch-worker-hpa
  namespace: jobs
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: batch-worker
  minReplicas: 1
  maxReplicas: 100
  metrics:
  - type: Pods
    pods:
      metric:
        name: pending_jobs
      target:
        type: AverageValue
        averageValue: "5"
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 200
        periodSeconds: 15
    scaleDown:
      stabilizationWindowSeconds: 600
      policies:
      - type: Pods
        value: 1
        periodSeconds: 120

Troubleshooting

Check HPA Status

# View HPA status
kubectl get hpa -n production
kubectl describe hpa web-hpa -n production

# Check events
kubectl get events -n production --sort-by='.lastTimestamp'

View Current Metrics

kubectl get hpa -n production -o custom-columns=\
NAME:.metadata.name,REFERENCE:.spec.scaleTargetRef.kind,\
TARGETS:.status.currentMetrics,MINPODS:.spec.minReplicas,\
MAXPODS:.spec.maxReplicas,REPLICAS:.status.currentReplicas

Debugging

# Check if metrics are available
kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1 | jq .

# View HPA decision history
kubectl describe hpa web-hpa -n production

# Check metrics-server logs
kubectl logs -n kube-system deployment/metrics-server

# The HPA controller runs inside kube-controller-manager, not as its own
# deployment; on self-hosted clusters, check its logs directly
kubectl logs -n kube-system -l component=kube-controller-manager

Conclusion

The Horizontal Pod Autoscaler is essential for running efficient, cost-effective Kubernetes deployments on VPS and bare-metal infrastructure. By defining accurate resource requests, choosing appropriate metrics and scaling policies, and pairing HPA with VPA where the two do not conflict, you create workloads that adapt automatically to demand. Start with simple CPU-based scaling and add custom metrics and sophisticated behavior policies as your needs evolve; regular monitoring and tuning of HPA parameters keeps performance and cost in balance.