Kubernetes Horizontal Pod Autoscaler
The Horizontal Pod Autoscaler (HPA) automatically scales the number of pod replicas based on observed metrics such as CPU and memory utilization. This guide covers installing Metrics Server, configuring CPU- and memory-based scaling, using custom metrics, implementing sophisticated scaling behaviors, and comparing HPA with the Vertical Pod Autoscaler (VPA).
Table of Contents
- HPA Fundamentals
- Metrics Server Installation
- CPU and Memory Scaling
- Custom Metrics
- Scaling Behavior
- VPA Comparison
- Practical Examples
- Troubleshooting
- Conclusion
HPA Fundamentals
How HPA Works
HPA continuously monitors metrics and adjusts replica count:
- Metrics Server collects resource metrics from Kubelet
- HPA controller queries metrics
- Calculates desired replica count based on metric targets
- Adjusts the Deployment/StatefulSet replica count
- Waits before scaling down to prevent oscillation
Scaling Formula
desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)]
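The formula can be sketched in a few lines of Python. This is a simplified model: the real controller also applies a tolerance (10% by default, set via --horizontal-pod-autoscaler-tolerance) inside which no scaling happens.

```python
import math

def desired_replicas(current_replicas, current_metric, target_metric, tolerance=0.1):
    """Replica count the HPA controller would request (simplified model).

    tolerance mirrors the controller's default
    --horizontal-pod-autoscaler-tolerance of 0.1: ratios within
    10% of the target leave the replica count unchanged.
    """
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target; no scaling
    return math.ceil(current_replicas * ratio)

# 4 replicas averaging 90% CPU against a 70% target -> scale to 6
print(desired_replicas(4, 90, 70))   # 6
# 4 replicas at 72% against a 70% target is within tolerance -> stay at 4
print(desired_replicas(4, 72, 70))   # 4
```

Note that ceil rounds up, so the controller prefers slightly over-provisioning to missing the target.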
Prerequisites
- Kubernetes cluster v1.23+ (the autoscaling/v2 API used throughout went GA in v1.23)
- Resource requests defined for workloads (required for Utilization targets)
- Metrics Server installed
- Sufficient cluster resources for additional replicas
Metrics Server Installation
Installing Metrics Server
Check whether metrics-server is already installed:
kubectl get deployment -n kube-system metrics-server
Install Metrics Server:
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
For kubeadm or self-hosted clusters with self-signed kubelet certificates:
kubectl patch deployment metrics-server -n kube-system --type="json" -p='[{"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--kubelet-insecure-tls"}]'
Wait for metrics-server to be ready:
kubectl wait --for=condition=ready pod -l k8s-app=metrics-server -n kube-system --timeout=300s
Verify that metrics are available:
kubectl top nodes
kubectl top pods -A
Metrics Server Configuration
Custom values for different environments:
apiVersion: apps/v1
kind: Deployment
metadata:
name: metrics-server
namespace: kube-system
spec:
template:
spec:
containers:
- name: metrics-server
image: registry.k8s.io/metrics-server/metrics-server:v0.6.4
args:
- --cert-dir=/tmp
- --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
- --kubelet-insecure-tls=true
- --metric-resolution=15s
- --v=2
resources:
requests:
cpu: 100m
memory: 200Mi
limits:
cpu: 200m
memory: 500Mi
CPU and Memory Scaling
Basic CPU-Based HPA
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: simple-hpa
namespace: production
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: web-app
minReplicas: 2
maxReplicas: 10
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70
Memory-Based Scaling
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: memory-hpa
namespace: production
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: cache-server
minReplicas: 2
maxReplicas: 20
metrics:
- type: Resource
resource:
name: memory
target:
type: Utilization
averageUtilization: 80
Combined CPU and Memory Scaling
With multiple metrics, the HPA computes a desired replica count for each metric and applies the highest, so the workload scales out when either target is exceeded:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: combined-hpa
namespace: production
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: app-server
minReplicas: 3
maxReplicas: 30
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 75
- type: Resource
resource:
name: memory
target:
type: Utilization
averageUtilization: 85
behavior:
scaleDown:
stabilizationWindowSeconds: 300
policies:
- type: Percent
value: 50
periodSeconds: 60
scaleUp:
stabilizationWindowSeconds: 60
policies:
- type: Percent
value: 100
periodSeconds: 30
- type: Pods
value: 2
periodSeconds: 60
selectPolicy: Max
Absolute Value Scaling
Scale based on absolute resource amounts:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: absolute-hpa
namespace: production
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: worker
minReplicas: 1
maxReplicas: 50
metrics:
- type: Resource
resource:
name: cpu
target:
type: AverageValue
averageValue: "500m"
- type: Resource
resource:
name: memory
target:
type: AverageValue
averageValue: "512Mi"
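For AverageValue targets, the scaling formula simplifies to the metric total divided by the per-pod target. A rough worked example with illustrative numbers:

```python
import math

# 4 workers currently using 650m CPU each (2600m total),
# against an averageValue target of 500m per pod:
total_usage_m = 4 * 650
target_per_pod_m = 500
replicas = math.ceil(total_usage_m / target_per_pod_m)
print(replicas)  # 6
```

Absolute targets are useful when per-pod cost is fixed and known, such as workers that each handle a bounded amount of work regardless of node size.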
Custom Metrics
Prometheus Custom Metrics
Install Prometheus and the kube-prometheus stack:
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install kube-prometheus prometheus-community/kube-prometheus-stack \
-n monitoring \
--create-namespace
Install the Prometheus Adapter, which serves the custom metrics API:
helm install prometheus-adapter prometheus-community/prometheus-adapter \
-n monitoring \
-f adapter-values.yaml
Adapter configuration:
# adapter-values.yaml
prometheus:
url: http://prometheus-operated:9090
rules:
custom:
- seriesQuery: 'http_requests_per_second'
resources:
template: <<.Resource>>
name:
matches: "^(.*)_per_second"
as: "${1}_rate"
metricsQuery: 'rate(<<.Series>>{<<.LabelMatchers>>}[1m])'
- seriesQuery: 'custom_app_metric'
resources:
template: <<.Resource>>
name:
matches: "^custom_(.+)_metric"
as: "custom_${1}"
metricsQuery: '<<.Series>>{<<.LabelMatchers>>}'
HPA with Custom Metrics
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: custom-metric-hpa
namespace: production
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: api-server
minReplicas: 2
maxReplicas: 20
metrics:
- type: Pods
pods:
metric:
name: http_requests_rate
target:
type: AverageValue
averageValue: "1k"
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 80
Application Metrics Example
Scaling on a custom metric reported by the application:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: queue-depth-hpa
namespace: production
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: message-processor
minReplicas: 1
maxReplicas: 50
metrics:
- type: Pods
pods:
metric:
name: queue_depth
selector:
matchLabels:
metric_type: queue
target:
type: AverageValue
averageValue: "30"
behavior:
scaleUp:
stabilizationWindowSeconds: 30
policies:
- type: Percent
value: 200
periodSeconds: 15
scaleDown:
stabilizationWindowSeconds: 300
policies:
- type: Percent
value: 50
periodSeconds: 60
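For a Pods metric such as queue_depth to reach the HPA, the application must first expose it in the Prometheus text exposition format so Prometheus can scrape it and the adapter can relay it. A minimal stdlib-only Python sketch; the metric value, port, and path are illustrative placeholders:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

def render_metrics(queue_depth: int) -> bytes:
    # Prometheus text exposition format: HELP/TYPE lines plus samples
    lines = [
        "# HELP queue_depth Number of messages waiting to be processed.",
        "# TYPE queue_depth gauge",
        f"queue_depth {queue_depth}",
    ]
    return ("\n".join(lines) + "\n").encode()

class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(queue_depth=42)  # a real app would read its queue here
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

# To serve: HTTPServer(("", 9102), MetricsHandler).serve_forever()
```

In practice most applications use a Prometheus client library rather than hand-rolling the format, but the wire format above is what Prometheus actually scrapes.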
Scaling Behavior
Scale-Up Behavior
Control how quickly HPA scales up:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: aggressive-scale-up
namespace: production
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: api
minReplicas: 2
maxReplicas: 100
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70
behavior:
scaleUp:
stabilizationWindowSeconds: 30
policies:
- type: Percent
value: 100
periodSeconds: 30
- type: Pods
value: 5
periodSeconds: 30
selectPolicy: Max
Scale-Down Behavior
Prevent flapping with scale-down stabilization:
behavior:
scaleDown:
stabilizationWindowSeconds: 300
policies:
- type: Percent
value: 50
periodSeconds: 60
- type: Pods
value: 2
periodSeconds: 120
selectPolicy: Min
Policy Descriptions
behavior:
scaleUp:
stabilizationWindowSeconds: 60
policies:
# Scale up by 100% (double replicas) every 30 seconds
- type: Percent
value: 100
periodSeconds: 30
# Or add 4 pods every 60 seconds
- type: Pods
value: 4
periodSeconds: 60
# Choose the policy that scales up the fastest
selectPolicy: Max
scaleDown:
stabilizationWindowSeconds: 300
policies:
# Scale down by 25% every 60 seconds
- type: Percent
value: 25
periodSeconds: 60
# Or remove 2 pods every 120 seconds
- type: Pods
value: 2
periodSeconds: 120
# Choose the policy that scales down the slowest
selectPolicy: Min
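The policy arithmetic above can be sketched as follows. This is a simplified model of how the controller chooses among policies, ignoring stabilization windows and per-metric calculations:

```python
import math

def allowed_change(current_replicas, policies, select_policy):
    """How many replicas one scaling step may add or remove (simplified).

    Each policy is a (type, value) pair; Percent is relative to the
    current count, Pods is absolute. selectPolicy picks Max (fastest
    change) or Min (slowest change) among the policy limits.
    """
    limits = []
    for ptype, value in policies:
        if ptype == "Percent":
            limits.append(math.ceil(current_replicas * value / 100))
        else:  # "Pods"
            limits.append(value)
    return max(limits) if select_policy == "Max" else min(limits)

# Scale-up from 10 replicas: 100% allows +10, Pods allows +4; Max -> +10
print(allowed_change(10, [("Percent", 100), ("Pods", 4)], "Max"))  # 10
# Scale-down from 10: 25% allows -3, Pods allows -2; Min -> -2
print(allowed_change(10, [("Percent", 25), ("Pods", 2)], "Min"))   # 2
```

This is why Max is the usual choice for scale-up (react fast) and Min for scale-down (release capacity cautiously).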
VPA Comparison
Vertical Pod Autoscaler
VPA adjusts CPU/memory requests and limits rather than replica count.
Install VPA:
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh
Or using Helm:
helm repo add fairwinds-stable https://charts.fairwinds.com/stable
helm install vpa fairwinds-stable/vpa \
-n kube-system \
--create-namespace
VPA vs HPA
| Aspect | HPA | VPA |
|---|---|---|
| Scaling Type | Horizontal (replicas) | Vertical (resources) |
| Use Case | Handling load changes | Right-sizing requests |
| Latency Impact | Adds new pods; existing pods untouched | Restarts pods to apply new requests |
| Best For | Stateless services | Stateful or singleton services |
| Combined Use | Works with VPA if they target different metrics | Works with HPA if they target different metrics |
Using VPA for Right-Sizing
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
name: app-vpa
namespace: production
spec:
targetRef:
apiVersion: "apps/v1"
kind: Deployment
name: web-app
updatePolicy:
updateMode: "Auto"
resourcePolicy:
containerPolicies:
- containerName: "app"
minAllowed:
cpu: 100m
memory: 128Mi
maxAllowed:
cpu: 2
memory: 4Gi
Combining HPA and VPA
HPA and VPA can be combined, but they should not act on the same resource metric: if HPA scales on CPU utilization, a VPA in Auto mode that rewrites CPU requests will destabilize the HPA's calculations. A common pattern is to let HPA handle replica count while VPA manages the other resource, or to run VPA in "Initial" or "Off" mode for recommendations only:
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: api-hpa
namespace: production
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: api
minReplicas: 2
maxReplicas: 20
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 75
---
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
name: api-vpa
namespace: production
spec:
targetRef:
apiVersion: apps/v1
kind: Deployment
name: api
updatePolicy:
updateMode: "Auto"
resourcePolicy:
containerPolicies:
- containerName: "app"
minAllowed:
cpu: 250m
memory: 512Mi
maxAllowed:
cpu: 2
memory: 2Gi
Practical Examples
Example: Web Server HPA
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: web-hpa
namespace: production
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: nginx
minReplicas: 3
maxReplicas: 50
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70
- type: Resource
resource:
name: memory
target:
type: Utilization
averageUtilization: 85
behavior:
scaleUp:
stabilizationWindowSeconds: 30
policies:
- type: Percent
value: 100
periodSeconds: 30
- type: Pods
value: 5
periodSeconds: 30
selectPolicy: Max
scaleDown:
stabilizationWindowSeconds: 300
policies:
- type: Percent
value: 50
periodSeconds: 60
selectPolicy: Min
Example: Batch Job Scaler
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: batch-worker-hpa
namespace: jobs
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: batch-worker
minReplicas: 1
maxReplicas: 100
metrics:
- type: Pods
pods:
metric:
name: pending_jobs
target:
type: AverageValue
averageValue: "5"
behavior:
scaleUp:
stabilizationWindowSeconds: 0
policies:
- type: Percent
value: 200
periodSeconds: 15
scaleDown:
stabilizationWindowSeconds: 600
policies:
- type: Pods
value: 1
periodSeconds: 120
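A quick back-of-the-envelope check on the scale-down policy above: removing one pod per 120-second period means a full drain from maxReplicas takes hours, which is usually the intent for batch workloads.

```python
# Removing 1 pod per 120 s period, from maxReplicas down to minReplicas:
max_replicas, min_replicas = 100, 1
period_seconds = 120
steps = max_replicas - min_replicas        # 99 scale-down steps
drain_seconds = steps * period_seconds     # 11880 seconds
print(round(drain_seconds / 3600, 1))      # 3.3 hours
```

If that is too slow for your cost targets, raise the Pods value or shorten periodSeconds rather than removing the stabilization window.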
Troubleshooting
Check HPA Status
# View HPA status
kubectl get hpa -n production
kubectl describe hpa web-hpa -n production
# Check events
kubectl get events -n production --sort-by='.lastTimestamp'
View Current Metrics
kubectl get hpa -n production -o custom-columns=NAME:.metadata.name,REFERENCE:.spec.scaleTargetRef.kind,TARGETS:.status.currentMetrics,MINPODS:.spec.minReplicas,MAXPODS:.spec.maxReplicas,REPLICAS:.status.currentReplicas
Debugging
# Check if metrics are available
kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1 | jq .
# View HPA decision history
kubectl describe hpa web-hpa -n production
# Check controller logs
kubectl logs -n kube-system deployment/metrics-server
# The HPA controller runs inside kube-controller-manager, not as its own deployment
kubectl logs -n kube-system -l component=kube-controller-manager
Conclusion
The Horizontal Pod Autoscaler is essential for running efficient, cost-effective Kubernetes deployments on VPS and bare-metal infrastructure. By defining accurate resource requests, choosing appropriate metrics and scaling policies, and combining HPA with VPA where their metrics do not overlap, you create workloads that adapt automatically to demand. Start with simple CPU-based scaling and add custom metrics and more sophisticated behavior policies as your needs evolve. Regular monitoring and tuning of HPA parameters keeps performance and cost in balance.


