Monitoring Kubernetes Clusters with Prometheus

Kubernetes introduces operational complexity that calls for dedicated monitoring. The kube-prometheus-stack Helm chart provides pre-configured Prometheus, Grafana, and Alertmanager alongside Kubernetes-specific exporters. This guide covers deploying the stack via Helm, configuring ServiceMonitors, creating dashboards, and configuring alerting for Kubernetes clusters.

Introduction

Monitoring Kubernetes requires observability into cluster infrastructure, API servers, the container runtime, and application workloads. The kube-prometheus-stack bundles pre-configured components, eliminating manual setup while providing proven monitoring configurations.

Architecture

Kubernetes Monitoring Stack

┌────────────────────────────────────────┐
│     Kubernetes Cluster                 │
│  ┌──────────────────────────────────┐  │
│  │   kubelet (every node)           │  │
│  │   ├─ cAdvisor metrics            │  │
│  │   ├─ Node metrics                │  │
│  │   └─ Pod metrics                 │  │
│  └──────────────────────────────────┘  │
│         ↓                              │
│  ┌──────────────────────────────────┐  │
│  │  kube-prometheus-stack           │  │
│  │  ├─ Prometheus Operator          │  │
│  │  ├─ Prometheus Server            │  │
│  │  ├─ Alertmanager                 │  │
│  │  ├─ Grafana                      │  │
│  │  ├─ Node Exporter                │  │
│  │  └─ kube-state-metrics           │  │
│  └──────────────────────────────────┘  │
└────────────────────────────────────────┘
         ↓
   External Systems
   (Slack, PagerDuty, etc.)

System Requirements

  • Kubernetes 1.19+ cluster
  • Helm 3.x installed
  • kubectl configured and authenticated
  • At least 4GB free memory in cluster
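  • At least 20GB persistent storage for Prometheus (the example values in this guide request 50Gi for Prometheus, 10Gi for Grafana, and 2Gi for Alertmanager)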
  • Internet access for image downloads
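
How much Prometheus storage is actually needed depends on retention and ingestion rate. The Prometheus documentation gives the rule of thumb: disk space ≈ retention in seconds × ingested samples per second × bytes per sample, at roughly 1 to 2 bytes per sample. The sketch below applies that formula; the function name and the sample rate in the example are illustrative assumptions, not measurements from a real cluster.

```shell
# Rough Prometheus disk estimate in GiB.
# Arguments: retention in days, ingested samples per second, bytes per sample.
estimate_prometheus_disk_gib() {
  awk -v days="$1" -v sps="$2" -v bps="$3" \
    'BEGIN { printf "%.1f\n", days * 86400 * sps * bps / (1024 * 1024 * 1024) }'
}

# Hypothetical example: 15 days of retention, 10,000 samples/s, 2 bytes/sample.
estimate_prometheus_disk_gib 15 10000 2
```

With these assumed inputs the estimate is about 24 GiB. On a running Prometheus, the ingestion rate can be read from rate(prometheus_tsdb_head_samples_appended_total[5m]).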

Kubernetes Setup

Install kubectl

# Ubuntu/Debian
curl -LO https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl
sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl

# Verify
kubectl version --client

Install Helm

curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
helm version

Access the Kubernetes Cluster

# Configure kubectl context
kubectl config use-context your-cluster

# Verify cluster access
kubectl get nodes
kubectl get namespaces

Helm Installation

Add the Prometheus Helm Repository

# Add Prometheus community repository
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo add kube-state-metrics https://kubernetes.github.io/kube-state-metrics
helm repo update

# List available charts
helm search repo prometheus-community | grep kube-prometheus-stack

Create the Monitoring Namespace

kubectl create namespace monitoring
kubectl label namespace monitoring release=monitoring

Create the Values File

cat > prometheus-values.yaml << 'EOF'
prometheus:
  prometheusSpec:
    retention: 15d
    storageSpec:
      volumeClaimTemplate:
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 50Gi
    resources:
      requests:
        cpu: 500m
        memory: 2Gi
      limits:
        cpu: 2000m
        memory: 4Gi

grafana:
  enabled: true
  adminPassword: admin123  # example only; change before deploying
  persistence:
    enabled: true
    size: 10Gi

alertmanager:
  enabled: true
  alertmanagerSpec:
    storage:
      volumeClaimTemplate:
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 2Gi

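# The chart toggles bundled components with camelCase keys;
# hyphenated keys configure the sub-charts themselves.
prometheusOperator:
  enabled: true

nodeExporter:
  enabled: true

prometheus-node-exporter:
  hostNetwork: true

kubeStateMetrics:
  enabled: true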
EOF
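
Instead of keeping a fixed Grafana password in the values file, one option is to generate it at install time. This is a minimal sketch: the variable name is an assumption, and the password is drawn from the kernel random source.

```shell
# Generate a random 24-character alphanumeric password.
# 512 random bytes yield far more than 24 alphanumeric characters in practice.
GRAFANA_ADMIN_PASSWORD="$(head -c 512 /dev/urandom | LC_ALL=C tr -dc 'A-Za-z0-9' | cut -c1-24)"

# Report the length only, so the secret does not end up in terminal scrollback.
echo "Generated a ${#GRAFANA_ADMIN_PASSWORD}-character Grafana admin password"
```

The value can then override the one in prometheus-values.yaml by adding --set grafana.adminPassword="$GRAFANA_ADMIN_PASSWORD" to the helm install command.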

Kube-Prometheus-Stack Deployment

Deploy the Stack

# Install kube-prometheus-stack
helm install kube-prometheus-stack \
  prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --values prometheus-values.yaml

# Verify deployment
kubectl get all -n monitoring
kubectl get pods -n monitoring -w
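
Watching pods with -w runs until it is interrupted. For scripted installs, a bounded wait may be more convenient. The helper below is a sketch built on kubectl wait; the function name and the default timeout are assumptions.

```shell
# Block until every pod in the namespace is Ready, or fail after the timeout.
# Arguments: namespace (default: monitoring), timeout (default: 300s).
wait_for_monitoring_stack() {
  namespace="${1:-monitoring}"
  timeout="${2:-300s}"
  kubectl wait --for=condition=Ready pods --all \
    --namespace "$namespace" --timeout="$timeout"
}
```

Calling wait_for_monitoring_stack monitoring 600s returns non-zero if any pod is still not Ready after ten minutes. Pods of completed Jobs never become Ready, so this may need a label selector if such pods remain in the namespace.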

Verify Components

# Check Prometheus
kubectl get svc -n monitoring kube-prometheus-stack-prometheus

# Check Grafana
kubectl get svc -n monitoring kube-prometheus-stack-grafana

# Check Alertmanager
kubectl get svc -n monitoring kube-prometheus-stack-alertmanager

Access Services

# Port forward Prometheus (each port-forward blocks, so run it in its own terminal)
kubectl port-forward -n monitoring svc/kube-prometheus-stack-prometheus 9090:9090

# Port forward Grafana
kubectl port-forward -n monitoring svc/kube-prometheus-stack-grafana 3000:80

# Port forward Alertmanager
kubectl port-forward -n monitoring svc/kube-prometheus-stack-alertmanager 9093:9093

# Access:
# Prometheus: http://localhost:9090
# Grafana: http://localhost:3000
# Alertmanager: http://localhost:9093
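
With the port-forwards running, each component can be probed through its health endpoint: /-/healthy for Prometheus and Alertmanager, /api/health for Grafana. The helper names below are illustrative, and the URLs assume the local ports used above.

```shell
# Probe one HTTP health endpoint and print "ok" or "FAILED" next to its name.
check_endpoint() {
  name="$1"
  url="$2"
  if curl --silent --fail --max-time 5 --output /dev/null "$url"; then
    echo "$name: ok"
  else
    echo "$name: FAILED"
  fi
}

# Check all three services behind the local port-forwards.
check_stack_health() {
  check_endpoint prometheus   "http://localhost:9090/-/healthy"
  check_endpoint alertmanager "http://localhost:9093/-/healthy"
  check_endpoint grafana      "http://localhost:3000/api/health"
}
```

Running check_stack_health prints one line per service, which makes a missing port-forward easy to spot.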

ServiceMonitors

Monitor Application Services

Create a ServiceMonitor for an application that exposes metrics:

cat > servicemonitor-example.yaml << 'EOF'
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
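metadata:
  name: app-metrics
  namespace: default
  labels:
    release: kube-prometheus-stack  # by default the chart's Prometheus only selects objects labeled with the release name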
spec:
  selector:
    matchLabels:
      app: my-application
  endpoints:
  - port: metrics
    interval: 30s
    path: /metrics
    scheme: http
EOF

kubectl apply -f servicemonitor-example.yaml
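
A ServiceMonitor selects Services, not pods: its matchLabels are compared against the Service's labels, and endpoints[].port refers to a port name on that Service. The manifest below is a sketch of a matching Service; the Service name, the pod selector, and port 8080 are assumptions for illustration.

```shell
# Write a Service whose labels and port name line up with the ServiceMonitor.
cat > service-example.yaml << 'EOF'
apiVersion: v1
kind: Service
metadata:
  name: my-application
  namespace: default
  labels:
    app: my-application      # matched by the ServiceMonitor's spec.selector.matchLabels
spec:
  selector:
    app: my-application      # pods backing the Service (assumed pod label)
  ports:
  - name: metrics            # referenced by the ServiceMonitor's endpoints[].port
    port: 8080               # hypothetical metrics port
    targetPort: 8080
EOF
```

Apply it with kubectl apply -f service-example.yaml. If the port is unnamed or named differently, the ServiceMonitor produces no targets.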

Monitor Prometheus Operator

cat > servicemonitor-prometheus.yaml << 'EOF'
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: prometheus-operator
  namespace: monitoring
  labels:
    release: kube-prometheus-stack  # needed for the chart's default ServiceMonitor selector
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: kube-prometheus-operator
  endpoints:
  - port: metrics
    interval: 30s
EOF

kubectl apply -f servicemonitor-prometheus.yaml

PrometheusRule

Create alerting rules for Kubernetes:

cat > prometheusrule-kubernetes.yaml << 'EOF'
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: kubernetes-rules
  namespace: monitoring
  labels:
    release: kube-prometheus-stack  # needed for the chart's default rule selector
spec:
  groups:
  - name: kubernetes
    interval: 30s
    rules:
    - alert: KubernetesNodeNotReady
      expr: kube_node_status_condition{condition="Ready",status="true"} == 0
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "Kubernetes Node not ready (node {{ $labels.node }})"

    - alert: KubernetesPodCrashLooping
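      # increase() counts restarts over the window; a per-second rate() above 0.1 would require about 360 restarts per hour
      expr: increase(kube_pod_container_status_restarts_total[1h]) > 3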
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "Kubernetes Pod crash looping (pod {{ $labels.pod }})"

    - alert: KubernetesMemoryPressure
      expr: kube_node_status_condition{condition="MemoryPressure",status="true"} == 1
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "Kubernetes Memory Pressure (node {{ $labels.node }})"

    - alert: KubernetesDiskPressure
      expr: kube_node_status_condition{condition="DiskPressure",status="true"} == 1
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "Kubernetes Disk Pressure (node {{ $labels.node }})"
EOF

kubectl apply -f prometheusrule-kubernetes.yaml
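
Restart thresholds are easy to misjudge because PromQL's rate() returns a per-second average, while increase() returns the total over the window. The helpers below are a small sketch for converting between the two; the function names are made up for illustration.

```shell
# Convert a per-second rate threshold into events per window.
# Arguments: rate per second, window length in seconds.
rate_to_count() {
  awk -v r="$1" -v w="$2" 'BEGIN { printf "%.0f\n", r * w }'
}

# Convert events per window into a per-second rate.
# Arguments: events per window, window length in seconds.
count_to_rate() {
  awk -v c="$1" -v w="$2" 'BEGIN { printf "%.6f\n", c / w }'
}

rate_to_count 0.1 3600   # a rate of 0.1/s sustained for one hour
count_to_rate 3 3600     # three restarts per hour expressed as a rate
```

A per-second threshold of 0.1 over a one-hour window corresponds to 360 restarts per hour, whereas three restarts per hour is a rate of about 0.000833 per second.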

Dashboards

Pre-installed Dashboards

The kube-prometheus-stack includes dashboards for:

  • Kubernetes cluster
  • Kubernetes nodes
  • Kubernetes pods
  • Prometheus overview

Create a Custom Dashboard

cat > custom-dashboard.json << 'EOF'
{
  "dashboard": {
    "title": "Custom Kubernetes Application",
    "panels": [
      {
        "title": "Pod CPU Usage",
        "targets": [
          {
            "expr": "sum(rate(container_cpu_usage_seconds_total[5m])) by (pod)"
          }
        ]
      },
      {
        "title": "Pod Memory Usage",
        "targets": [
          {
            "expr": "sum(container_memory_usage_bytes) by (pod)"
          }
        ]
      },
      {
        "title": "Pod Network",
        "targets": [
          {
            "expr": "rate(container_network_receive_bytes_total[5m])"
          }
        ]
      }
    ]
  }
}
EOF
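
The file wraps the dashboard in a top-level "dashboard" key, which is the shape expected by Grafana's HTTP API endpoint POST /api/dashboards/db. One way to load it, sketched below, is to post the file through the Grafana port-forward. The function name is an assumption, and Grafana may still reject a dashboard whose panels are missing fields it requires.

```shell
# POST a dashboard JSON file to Grafana's dashboard API.
# Arguments: JSON file, "user:password" credentials, Grafana base URL (optional).
import_dashboard() {
  file="$1"
  credentials="$2"
  grafana_url="${3:-http://localhost:3000}"
  curl --silent --fail --user "$credentials" \
    --header "Content-Type: application/json" \
    --request POST --data @"$file" \
    "$grafana_url/api/dashboards/db"
}
```

For example, import_dashboard custom-dashboard.json "admin:$GRAFANA_PASSWORD" uploads the file to the local port-forward, where GRAFANA_PASSWORD is a placeholder for the admin password set in the values file.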

Alerting Rules

Configure Alertmanager

cat > alertmanager-values.yaml << 'EOF'
# The operator-managed Alertmanager does not read a standalone ConfigMap;
# with this chart the configuration is supplied through the alertmanager.config value.
alertmanager:
  config:
    global:
      resolve_timeout: 5m

    route:
      receiver: 'default'
      group_by: ['alertname', 'cluster']
      group_wait: 10s
      group_interval: 10s
      repeat_interval: 12h

      routes:
        - match:
            severity: critical
          receiver: 'critical-team'
          group_wait: 0s

        - match:
            severity: warning
          receiver: 'slack'

    receivers:
      - name: 'default'

      - name: 'critical-team'
        email_configs:
          # Placeholder addresses and credentials; replace with real values
          - to: 'oncall@example.com'
            from: 'alertmanager@example.com'
            smarthost: 'smtp.example.com:587'
            auth_username: 'alertmanager@example.com'
            auth_password: 'password'

      - name: 'slack'
        slack_configs:
          - api_url: 'https://hooks.slack.com/services/YOUR/WEBHOOK'
            channel: '#alerts'
EOF

# --reuse-values keeps the settings applied by earlier installs and upgrades
helm upgrade kube-prometheus-stack \
  prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --reuse-values \
  --values alertmanager-values.yaml
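
To check that the routing tree sends critical and warning alerts to the intended receivers, a synthetic alert can be posted to Alertmanager's v2 API. This is a sketch: the alert name, the function names, and the ALERTMANAGER_URL variable are assumptions, and it expects the Alertmanager port-forward on 9093 to be active.

```shell
# Build a minimal Alertmanager v2 alert payload with the given severity label.
build_test_alert() {
  severity="${1:-warning}"
  printf '[{"labels":{"alertname":"RoutingTest","severity":"%s"},"annotations":{"summary":"Manual routing test"}}]\n' "$severity"
}

# POST the payload to Alertmanager, reading the JSON body from stdin.
send_test_alert() {
  build_test_alert "$1" | curl --silent --fail \
    --header "Content-Type: application/json" \
    --request POST --data @- \
    "${ALERTMANAGER_URL:-http://localhost:9093}/api/v2/alerts"
}
```

Running send_test_alert critical should produce a notification on the critical receiver, and send_test_alert warning one on the Slack receiver. The alert then appears in the Alertmanager UI until it resolves.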

Scaling and Performance

High Availability Configuration

cat > kube-prometheus-ha-values.yaml << 'EOF'
prometheus:
  prometheusSpec:
    replicas: 2
    retention: 30d
    storageSpec:
      volumeClaimTemplate:
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 100Gi
    externalLabels:
      cluster: "production"
      region: "us-east-1"

alertmanager:
  alertmanagerSpec:
    replicas: 2
    retention: 120h

grafana:
  replicas: 2
  persistence:
    size: 20Gi
EOF

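# --reuse-values keeps earlier settings; without it, values not listed in the HA file revert to chart defaults
helm upgrade kube-prometheus-stack \
  prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --reuse-values \
  --values kube-prometheus-ha-values.yaml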

Resource Management

# Check resource usage
kubectl top nodes
kubectl top pods -n monitoring

# Update resource requests/limits
kubectl set resources deployment \
  -n monitoring \
  kube-prometheus-stack-operator \
  --requests=cpu=500m,memory=512Mi \
  --limits=cpu=2000m,memory=2Gi

Troubleshooting

Verify Component Status

# Check all pods running
kubectl get pods -n monitoring

# View pod logs
kubectl logs -n monitoring -l app.kubernetes.io/name=prometheus -c prometheus

# Check ServiceMonitor discovery
kubectl get servicemonitor --all-namespaces
kubectl describe servicemonitor -n default app-metrics

# Verify metrics scraping
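# promtool needs the server URL; the pod name follows prometheus-<Prometheus resource name>-0
kubectl exec -n monitoring prometheus-kube-prometheus-stack-prometheus-0 -c prometheus -- \
  promtool query instant http://localhost:9090 'up'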

Debug Metrics Collection

# Access Prometheus console
kubectl port-forward -n monitoring svc/kube-prometheus-stack-prometheus 9090:9090

# Query metrics
curl 'http://localhost:9090/api/v1/query?query=kubernetes_build_info'

# Check targets
curl http://localhost:9090/api/v1/targets
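
The targets response is large, so a quick count of unhealthy targets can be more useful than reading the raw JSON. The helper below is a rough sketch with an assumed function name. It matches the compact "health":"down" field as Prometheus emits it, and a JSON-aware tool such as jq would be more robust.

```shell
# Count targets reported as down in a /api/v1/targets response read from stdin.
count_down_targets() {
  grep -o '"health":"down"' | wc -l | tr -d ' '
}
```

Typical use is curl -s http://localhost:9090/api/v1/targets | count_down_targets, which prints 0 when every active target is healthy.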

Common Issues

# ServiceMonitor not picked up
# Check label selectors match
kubectl get servicemonitor -n monitoring -o yaml

# Prometheus not scraping targets
# Verify the ServiceMonitor is in the same namespace as the Service it selects (or sets spec.namespaceSelector)
# Check selector labels on Service

# Storage issues
# Check PVC status
kubectl get pvc -n monitoring
kubectl describe pvc prometheus-kube-prometheus-stack-prometheus-db-prometheus-kube-prometheus-stack-prometheus-0 -n monitoring

Conclusion

The kube-prometheus-stack provides enterprise-grade Kubernetes monitoring out of the box. By following this guide, you've deployed a comprehensive monitoring platform for your Kubernetes infrastructure. Focus on creating meaningful ServiceMonitors for your applications, setting appropriate alert thresholds based on SLOs, and continuously refining dashboards. Kubernetes observability is critical for reliable, scalable deployments.