Kubernetes Horizontal Pod Autoscaler
The Horizontal Pod Autoscaler (HPA) automatically scales the number of pod replicas based on observed metrics like CPU usage and memory consumption. This guide covers installing metrics-server, configuring CPU and memory-based scaling, using custom metrics, implementing sophisticated scaling behaviors, and comparing with Vertical Pod Autoscaler (VPA).
Table of Contents
- HPA Fundamentals
- Metrics Server Installation
- CPU and Memory Scaling
- Custom Metrics
- Scaling Behavior
- VPA Comparison
- Practical Examples
- Troubleshooting
- Conclusion
HPA Fundamentals
How HPA Works
HPA continuously monitors metrics and adjusts replica count:
- Metrics Server collects resource metrics from Kubelet
- HPA controller queries metrics
- Calculates desired replica count based on metric targets
- Adjusts deployment/statefulset replica count
- Waits before scaling down to prevent oscillation
Scaling Formula
```
desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)]
```
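A quick worked example of the formula (hypothetical numbers): with 4 replicas at 90% average CPU utilization against a 70% target, HPA wants ceil(4 × 90/70) = 6 replicas. The integer arithmetic can be sketched in shell:

```shell
# Hypothetical inputs: 4 replicas, 90% current average utilization, 70% target
current_replicas=4
current_metric=90
desired_metric=70

# ceil(a / b) for positive integers == (a + b - 1) / b
desired_replicas=$(( (current_replicas * current_metric + desired_metric - 1) / desired_metric ))
echo "$desired_replicas"   # prints 6
```

HPA also applies a tolerance (10% by default) around the ratio, so values close to 1.0 produce no scaling action.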
Prerequisites
- Kubernetes cluster v1.23+ (for the `autoscaling/v2` API used throughout this guide)
- Resource requests defined for workloads
- Metrics Server installed
- Sufficient cluster resources for scaling
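The resource-request prerequisite matters because `Utilization` targets are computed as a percentage of each container's requests; pods without requests make the HPA report unknown metrics. A minimal sketch of a workload with requests defined (names and values hypothetical):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app              # hypothetical workload name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
      - name: app
        image: nginx:1.25
        resources:
          requests:          # HPA Utilization targets are percentages of these
            cpu: 250m
            memory: 256Mi
          limits:
            cpu: 500m
            memory: 512Mi
```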
Metrics Server Installation
Installing Metrics Server
Check if metrics-server is already installed:
```bash
kubectl get deployment -n kube-system metrics-server
```
Install Metrics Server:
```bash
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
```
For kubeadm or self-hosted clusters with self-signed certificates:
```bash
kubectl patch deployment metrics-server -n kube-system --type="json" \
  -p='[{"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--kubelet-insecure-tls"}]'
```
Wait for metrics-server to be ready:
```bash
kubectl wait --for=condition=ready pod -l k8s-app=metrics-server -n kube-system --timeout=300s
```
Verify metrics are available:
```bash
kubectl top nodes
kubectl top pods -A
```
Metrics Server Configuration
Custom values for different environments:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: metrics-server
  namespace: kube-system
spec:
  template:
    spec:
      containers:
      - name: metrics-server
        image: registry.k8s.io/metrics-server/metrics-server:v0.6.4
        args:
        - --cert-dir=/tmp
        - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
        - --kubelet-insecure-tls=true
        - --metric-resolution=15s
        - --v=2
        resources:
          requests:
            cpu: 100m
            memory: 200Mi
          limits:
            cpu: 200m
            memory: 500Mi
```
CPU and Memory Scaling
Basic CPU-Based HPA
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: simple-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
```
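For simple CPU-only cases like this one, the same HPA can also be created imperatively with `kubectl autoscale` (edit the resulting object for anything beyond a CPU target):

```shell
# Imperative equivalent of the manifest above
kubectl autoscale deployment web-app \
  --cpu-percent=70 --min=2 --max=10 \
  -n production
```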
Memory-Based Scaling
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: memory-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: cache-server
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
```
Combined CPU and Memory Scaling
Scale on multiple metrics at once. HPA computes a desired replica count for each metric and uses the highest, so in effect the workload scales when either threshold is exceeded:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: combined-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app-server
  minReplicas: 3
  maxReplicas: 30
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 75
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 85
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 50
        periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
      - type: Percent
        value: 100
        periodSeconds: 30
      - type: Pods
        value: 2
        periodSeconds: 60
      selectPolicy: Max
```
Absolute Value Scaling
Scale based on absolute resource amounts:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: absolute-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: worker
  minReplicas: 1
  maxReplicas: 50
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: AverageValue
        averageValue: "500m"
  - type: Resource
    resource:
      name: memory
      target:
        type: AverageValue
        averageValue: "512Mi"
```
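With `AverageValue` targets, HPA divides total current usage across the workload's pods by the per-pod target and takes the ceiling, independent of resource requests. A worked example with hypothetical numbers: 2600m of CPU in use across all pods against a 500m-per-pod target yields ceil(2600/500) = 6 replicas:

```shell
# Hypothetical totals: 2600m CPU used across all pods, target 500m per pod
total_usage_milli=2600
target_per_pod_milli=500

# ceil division: (a + b - 1) / b
desired=$(( (total_usage_milli + target_per_pod_milli - 1) / target_per_pod_milli ))
echo "$desired"   # prints 6
```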
Custom Metrics
Prometheus Custom Metrics
Install Prometheus and custom metrics adapter:
```bash
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install kube-prometheus prometheus-community/kube-prometheus-stack \
  -n monitoring \
  --create-namespace
```
Install custom metrics API:
```bash
helm install prometheus-adapter prometheus-community/prometheus-adapter \
  -n monitoring \
  -f adapter-values.yaml
```
Adapter configuration:
```yaml
# adapter-values.yaml
prometheus:
  url: http://prometheus-operated:9090
rules:
  custom:
  # Turn the http_requests_total counter into an http_requests_per_second rate
  - seriesQuery: 'http_requests_total'
    resources:
      template: "<<.Resource>>"
    name:
      matches: "^(.*)_total$"
      as: "${1}_per_second"
    metricsQuery: 'rate(<<.Series>>{<<.LabelMatchers>>}[1m])'
  # Expose a gauge as-is under a friendlier name
  - seriesQuery: 'custom_app_metric'
    resources:
      template: "<<.Resource>>"
    name:
      matches: "^custom_(.+)_metric$"
      as: "custom_${1}"
    metricsQuery: '<<.Series>>{<<.LabelMatchers>>}'
```
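Once the adapter is serving these rules, the metric should be visible through the custom metrics API. A quick check (namespace and metric name follow the examples in this guide):

```shell
# List all custom metrics the adapter exposes
kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1 | jq '.resources[].name'

# Read the per-pod values the HPA will consume
kubectl get --raw \
  "/apis/custom.metrics.k8s.io/v1beta1/namespaces/production/pods/*/http_requests_per_second" | jq .
```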
HPA with Custom Metrics
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: custom-metric-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "1k"
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80
```
Application Metrics Example
Custom metric from application:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: queue-depth-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: message-processor
  minReplicas: 1
  maxReplicas: 50
  metrics:
  - type: Pods
    pods:
      metric:
        name: queue_depth
        selector:
          matchLabels:
            metric_type: queue
      target:
        type: AverageValue
        averageValue: "30"
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 30
      policies:
      - type: Percent
        value: 200
        periodSeconds: 15
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 50
        periodSeconds: 60
```
Scaling Behavior
Scale-Up Behavior
Control how quickly HPA scales up:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: aggressive-scale-up
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 2
  maxReplicas: 100
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 30
      policies:
      - type: Percent
        value: 100
        periodSeconds: 30
      - type: Pods
        value: 5
        periodSeconds: 30
      selectPolicy: Max
```
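How the two policies above combine, with hypothetical numbers: at 10 current replicas, the Percent policy allows adding 100% (10 pods) per period and the Pods policy allows adding 5; `selectPolicy: Max` picks the more permissive limit, so HPA may jump to 20 replicas in a single step:

```shell
replicas=10
percent_value=100   # from the Percent policy
pods_value=5        # from the Pods policy

percent_limit=$(( replicas + replicas * percent_value / 100 ))  # 20
pods_limit=$(( replicas + pods_value ))                         # 15

# selectPolicy: Max -> take the higher ceiling
if [ "$percent_limit" -gt "$pods_limit" ]; then limit=$percent_limit; else limit=$pods_limit; fi
echo "$limit"   # prints 20
```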
Scale-Down Behavior
Prevent flapping with scale-down stabilization:
```yaml
behavior:
  scaleDown:
    stabilizationWindowSeconds: 300
    policies:
    - type: Percent
      value: 50
      periodSeconds: 60
    - type: Pods
      value: 2
      periodSeconds: 120
    selectPolicy: Min
```
Policy Descriptions
```yaml
behavior:
  scaleUp:
    stabilizationWindowSeconds: 60
    policies:
    # Scale up by 100% (double the replicas) every 30 seconds
    - type: Percent
      value: 100
      periodSeconds: 30
    # Or add 4 pods every 60 seconds
    - type: Pods
      value: 4
      periodSeconds: 60
    # Choose the policy that scales up the fastest
    selectPolicy: Max
  scaleDown:
    stabilizationWindowSeconds: 300
    policies:
    # Scale down by 25% every 60 seconds
    - type: Percent
      value: 25
      periodSeconds: 60
    # Or remove 2 pods every 120 seconds
    - type: Pods
      value: 2
      periodSeconds: 120
    # Choose the policy that scales down the slowest
    selectPolicy: Min
```
VPA Comparison
Vertical Pod Autoscaler
VPA adjusts CPU/memory requests and limits rather than replica count.
Install VPA:
```bash
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh
```
Or using Helm:
```bash
helm repo add fairwinds-stable https://charts.fairwinds.com/stable
helm install vpa fairwinds-stable/vpa \
  -n kube-system \
  --create-namespace
```
VPA vs HPA
| Aspect | HPA | VPA |
|---|---|---|
| Scaling Type | Horizontal (replicas) | Vertical (requests/limits) |
| Use Case | Absorbing load changes | Right-sizing workloads |
| Latency Impact | New pods added; existing pods untouched | Pods restarted to apply new resources |
| Best For | Stateless services | Stateful or hard-to-parallelize services |
| Combined Use | Safe together if scaling on different metrics | Safe together if scaling on different metrics |
Using VPA for Right-Sizing
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: app-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: web-app
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: "app"
      minAllowed:
        cpu: 100m
        memory: 128Mi
      maxAllowed:
        cpu: 2
        memory: 4Gi
```
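After VPA has observed the workload for a while, its recommendations can be inspected directly (exact status fields depend on the VPA version installed):

```shell
# Target, lower-bound, and upper-bound recommendations per container
kubectl describe vpa app-vpa -n production

# Or just the recommendation block as JSON
kubectl get vpa app-vpa -n production -o json | jq '.status.recommendation'
```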
Combining HPA and VPA
Use both, but carefully: HPA and VPA should not act on the same resource metric, or they will fight each other. A safe pattern is a CPU-based HPA paired with VPA in recommendation-only mode (or an actuating VPA paired with an HPA driven purely by custom metrics):
```yaml
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 75
---
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  updatePolicy:
    # Recommendation-only, so VPA never resizes pods the CPU-based HPA is scaling
    updateMode: "Off"
  resourcePolicy:
    containerPolicies:
    - containerName: "app"
      minAllowed:
        cpu: 250m
        memory: 512Mi
      maxAllowed:
        cpu: 2
        memory: 2Gi
```
Practical Examples
Example: Web Server HPA
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  minReplicas: 3
  maxReplicas: 50
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 85
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 30
      policies:
      - type: Percent
        value: 100
        periodSeconds: 30
      - type: Pods
        value: 5
        periodSeconds: 30
      selectPolicy: Max
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 50
        periodSeconds: 60
      selectPolicy: Min
```
Example: Batch Job Scaler
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: batch-worker-hpa
  namespace: jobs
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: batch-worker
  minReplicas: 1
  maxReplicas: 100
  metrics:
  - type: Pods
    pods:
      metric:
        name: pending_jobs
      target:
        type: AverageValue
        averageValue: "5"
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 200
        periodSeconds: 15
    scaleDown:
      stabilizationWindowSeconds: 600
      policies:
      - type: Pods
        value: 1
        periodSeconds: 120
```
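A worked pass through the queue metric, with hypothetical numbers: 2 workers each reporting 20 pending jobs average 20 per pod against a target of 5, so HPA wants ceil(2 × 20/5) = 8 replicas; the 200%-per-15s policy then caps the first step at 6 (2 plus 200% of 2), and the next evaluation period reaches 8:

```shell
replicas=2
avg_queue_depth=20
target_depth=5

# ceil(replicas * avg / target)
desired=$(( (replicas * avg_queue_depth + target_depth - 1) / target_depth ))  # 8
# Percent policy: at most +200% of current replicas per period
step_cap=$(( replicas + replicas * 200 / 100 ))                                # 6

echo "$desired $step_cap"   # prints "8 6"
```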
Troubleshooting
Check HPA Status
```bash
# View HPA status
kubectl get hpa -n production
kubectl describe hpa web-hpa -n production

# Check events
kubectl get events -n production --sort-by='.lastTimestamp'
```
View Current Metrics
```bash
kubectl get hpa -n production -o custom-columns=\
NAME:.metadata.name,\
REFERENCE:.spec.scaleTargetRef.kind,\
TARGETS:.status.currentMetrics,\
MINPODS:.spec.minReplicas,\
MAXPODS:.spec.maxReplicas,\
REPLICAS:.status.currentReplicas
```
Debugging
```bash
# Check if custom metrics are available
kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1 | jq .

# View HPA decision history (Events section)
kubectl describe hpa web-hpa -n production

# Check metrics-server logs
kubectl logs -n kube-system deployment/metrics-server

# The HPA controller runs inside kube-controller-manager, not a separate deployment
kubectl logs -n kube-system -l component=kube-controller-manager
```
Conclusion
The Horizontal Pod Autoscaler is essential for running efficient, cost-effective Kubernetes deployments on VPS and bare-metal infrastructure. Define accurate resource requests, choose metrics and scaling policies that match each workload, and combine HPA with VPA where it helps, and your workloads will adapt to demand automatically. Start with simple CPU-based scaling and add custom metrics and behavior policies as your needs evolve; regular monitoring and tuning of HPA parameters keeps performance and cost in check.


