Cilium CNI Installation and Configuration

Cilium is a high-performance Kubernetes CNI plugin powered by eBPF that provides advanced networking, security policies, and observability while avoiding the overhead of traditional iptables-based datapaths. With Hubble for deep visibility and built-in service mesh capabilities, Cilium is a strong choice for production Kubernetes clusters that need fine-grained network policies and observability.

Prerequisites

  • Kubernetes 1.26+ cluster (without a CNI installed)
  • Linux kernel 5.4+ (5.10+ recommended for full eBPF feature set)
  • cilium CLI and Helm 3.x installed
  • Root or sudo access on nodes
  • (Optional but recommended) cluster bootstrapped without kube-proxy, so Cilium can replace it
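
The kernel requirement above can be checked with a small pre-flight script before installing anything (a sketch; the version-comparison helper `kver_ge` is not part of Cilium):

```shell
#!/usr/bin/env sh
# Pre-flight check: does a kernel version string meet the 5.4 minimum?
# kver_ge VERSION MIN_MAJOR MIN_MINOR -> exit 0 if VERSION >= MIN_MAJOR.MIN_MINOR
kver_ge() {
  v=${1%%-*}          # strip distro suffixes like "-generic"
  major=${v%%.*}
  rest=${v#*.}
  minor=${rest%%.*}
  [ "$major" -gt "$2" ] || { [ "$major" -eq "$2" ] && [ "$minor" -ge "$3" ]; }
}

if kver_ge "$(uname -r)" 5 4; then
  echo "kernel $(uname -r): OK for Cilium"
else
  echo "kernel $(uname -r): too old, need >= 5.4" >&2
fi
```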

Installing Cilium

# Install Cilium CLI
CILIUM_CLI_VERSION=$(curl -s https://raw.githubusercontent.com/cilium/cilium-cli/main/stable.txt)
CLI_ARCH=amd64
curl -L --fail --remote-name-all \
  https://github.com/cilium/cilium-cli/releases/download/${CILIUM_CLI_VERSION}/cilium-linux-${CLI_ARCH}.tar.gz
sudo tar xzvfC cilium-linux-${CLI_ARCH}.tar.gz /usr/local/bin
rm cilium-linux-${CLI_ARCH}.tar.gz

# Install via Helm (recommended for production)
helm repo add cilium https://helm.cilium.io/
helm repo update

# Basic installation replacing kube-proxy
helm install cilium cilium/cilium \
  --namespace kube-system \
  --set kubeProxyReplacement=true \
  --set k8sServiceHost=192.168.1.10 \
  --set k8sServicePort=6443 \
  --set hubble.relay.enabled=true \
  --set hubble.ui.enabled=true \
  --set prometheus.enabled=true \
  --set operator.prometheus.enabled=true

# Verify installation
cilium status --wait

# Run connectivity test
cilium connectivity test
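
For repeatable installs, the same settings can be kept in a values file instead of repeated --set flags (a sketch; the file name is arbitrary and the values mirror the flags above):

```shell
# Write a values file equivalent to the --set flags used above
cat > cilium-values.yaml <<EOF
kubeProxyReplacement: true
k8sServiceHost: 192.168.1.10
k8sServicePort: 6443
hubble:
  relay:
    enabled: true
  ui:
    enabled: true
prometheus:
  enabled: true
operator:
  prometheus:
    enabled: true
EOF

helm install cilium cilium/cilium --namespace kube-system -f cilium-values.yaml
```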

For existing clusters migrating from another CNI:

# Install Cilium in chaining mode alongside the existing CNI
# (routingMode=native replaces the deprecated tunnel=disabled flag)
helm install cilium cilium/cilium \
  --namespace kube-system \
  --set cni.chainingMode=generic-veth \
  --set cni.customConf=true \
  --set enableIPv4Masquerade=false \
  --set routingMode=native

# After migration, remove old CNI and reinstall Cilium in native mode

Network Policy Configuration

Cilium supports standard Kubernetes NetworkPolicy and extends it with CiliumNetworkPolicy:

# Standard Kubernetes NetworkPolicy - deny all ingress to a namespace
cat > deny-all-ingress.yaml <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-all-ingress
  namespace: production
spec:
  podSelector: {}
  policyTypes:
    - Ingress
EOF

kubectl apply -f deny-all-ingress.yaml
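
With the deny-all policy applied, a quick probe should confirm that ingress into the namespace is blocked (the pod and service names here are placeholders):

```shell
# Expect the request to time out: deny-all-ingress applies to every pod in production
kubectl -n production run probe --rm -i --image=busybox --restart=Never -- \
  wget -T 3 -qO- http://api-server:8080 || echo "blocked as expected"
```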

# Allow specific traffic with CiliumNetworkPolicy (L7 aware)
cat > api-policy.yaml <<EOF
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: api-access-policy
  namespace: production
spec:
  endpointSelector:
    matchLabels:
      app: api-server
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: frontend
      toPorts:
        - ports:
            - port: "8080"
              protocol: TCP
          rules:
            http:
              - method: GET
                path: /api/v1/.*
              - method: POST
                path: /api/v1/users
  egress:
    - toEndpoints:
        - matchLabels:
            app: postgres
      toPorts:
        - ports:
            - port: "5432"
              protocol: TCP
    - toFQDNs:
        - matchPattern: "*.amazonaws.com"
      toPorts:
        - ports:
            - port: "443"
              protocol: TCP
EOF

kubectl apply -f api-policy.yaml
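
A rough smoke test of the L7 rules, assuming a frontend Deployment exists (names and paths are illustrative; Cilium's Envoy proxy returns 403 Access Denied for requests rejected at L7):

```shell
# Allowed by the policy (matches GET /api/v1/.*)
kubectl -n production exec deploy/frontend -- \
  curl -s -o /dev/null -w "%{http_code}\n" http://api-server:8080/api/v1/status

# Not in the allowlist; expect 403 from the proxy
kubectl -n production exec deploy/frontend -- \
  curl -s -o /dev/null -w "%{http_code}\n" -X DELETE http://api-server:8080/api/v1/users
```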

DNS-based egress policies:

# Allow DNS-based external access
cat > dns-policy.yaml <<EOF
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-external-dns
  namespace: production
spec:
  endpointSelector:
    matchLabels:
      app: backend
  egress:
    - toEndpoints:
        - matchLabels:
            k8s:io.kubernetes.pod.namespace: kube-system
            k8s:k8s-app: kube-dns
      toPorts:
        - ports:
            - port: "53"
              protocol: UDP
          rules:
            dns:
              - matchPattern: "*"
    - toFQDNs:
        - matchName: api.github.com
        - matchName: registry.npmjs.org
      toPorts:
        - ports:
            - port: "443"
              protocol: TCP
EOF

kubectl apply -f dns-policy.yaml
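
To confirm the toFQDNs rules are resolving, the agent's FQDN-to-IP cache can be inspected from inside the Cilium pod:

```shell
# Show learned FQDN-to-IP mappings used by toFQDNs policies
kubectl -n kube-system exec ds/cilium -- cilium fqdn cache list

# Watch DNS traffic and policy verdicts live
hubble observe --namespace production --protocol dns
```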

Hubble Observability

Hubble provides deep network visibility powered by eBPF:

# Enable Hubble (if not enabled at install time)
helm upgrade cilium cilium/cilium \
  --namespace kube-system \
  --reuse-values \
  --set hubble.relay.enabled=true \
  --set hubble.ui.enabled=true \
  --set hubble.metrics.enabled="{dns,drop,tcp,flow,icmp,http}"

# Install Hubble CLI
HUBBLE_VERSION=$(curl -s https://raw.githubusercontent.com/cilium/hubble/master/stable.txt)
curl -L --fail --remote-name-all \
  https://github.com/cilium/hubble/releases/download/$HUBBLE_VERSION/hubble-linux-amd64.tar.gz
sudo tar xzvfC hubble-linux-amd64.tar.gz /usr/local/bin

# Port-forward Hubble relay for local access
cilium hubble port-forward &

# Observe live traffic flows
hubble observe --namespace production

# Filter by pod
hubble observe --namespace production --pod api-server-xxx

# Show dropped packets (policy violations)
hubble observe --verdict DROPPED

# Show HTTP flows
hubble observe --protocol http --namespace production

# Access Hubble UI
kubectl port-forward -n kube-system svc/hubble-ui 12000:80
# Open http://localhost:12000
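
Flows can also be captured for offline analysis; a sketch using Hubble's JSON output:

```shell
# Save the last 1000 flows from the production namespace as JSON
hubble observe --namespace production --last 1000 -o json > flows.json
```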

Service Mesh Configuration

Cilium's service mesh provides transparent encryption and traffic management without sidecars. Note that WireGuard encrypts traffic between nodes; it is not mutual TLS — identity-based mutual authentication is a separate Cilium feature:

# Enable transparent node-to-node encryption with WireGuard
helm upgrade cilium cilium/cilium \
  --namespace kube-system \
  --reuse-values \
  --set encryption.enabled=true \
  --set encryption.type=wireguard
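
Whether encryption is actually active can be checked from the agent (the expected output line is paraphrased):

```shell
# Confirm WireGuard is enabled on each node
kubectl -n kube-system exec ds/cilium -- cilium status | grep Encryption
# Expect a line like: Encryption: Wireguard [...]
```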

# Configure traffic load balancing with CiliumEnvoyConfig
cat > load-balancing.yaml <<EOF
apiVersion: cilium.io/v2
kind: CiliumEnvoyConfig
metadata:
  name: api-load-balancer
  namespace: production
spec:
  services:
    - name: api-service
      namespace: production
  backendServices:
    - name: api-v1
      namespace: production
    - name: api-v2
      namespace: production
  resources:
    - "@type": type.googleapis.com/envoy.config.listener.v3.Listener
      name: api-listener
      filter_chains:
        - filters:
            - name: envoy.filters.network.http_connection_manager
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
                route_config:
                  virtual_hosts:
                    - name: api-vhost
                      domains: ["*"]
                      routes:
                        - match:
                            prefix: /
                          route:
                            weighted_clusters:
                              clusters:
                                - name: production/api-v1
                                  weight: 80
                                - name: production/api-v2
                                  weight: 20
EOF

kubectl apply -f load-balancing.yaml
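
A crude way to observe the 80/20 split, assuming api-v1 and api-v2 return distinguishable responses and a client pod exists (all names are placeholders):

```shell
# Tally responses over 100 requests; counts should approximate 80/20
for i in $(seq 1 100); do
  kubectl -n production exec deploy/frontend -- curl -s http://api-service/version
done | sort | uniq -c
```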

Bandwidth Management

Use eBPF-based bandwidth manager for rate limiting:

# Enable bandwidth manager
helm upgrade cilium cilium/cilium \
  --namespace kube-system \
  --reuse-values \
  --set bandwidthManager.enabled=true \
  --set bandwidthManager.bbr=true

# Apply bandwidth limits via pod annotations
cat > bandwidth-pod.yaml <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: bandwidth-limited-app
  annotations:
    kubernetes.io/egress-bandwidth: "50M"   # 50 megabits per second
    kubernetes.io/ingress-bandwidth: "50M"
spec:
  containers:
    - name: app
      image: nginx
EOF

kubectl apply -f bandwidth-pod.yaml

# Verify bandwidth limits programmed into the eBPF datapath
kubectl -n kube-system exec ds/cilium -- cilium bpf bandwidth list
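
To see the limit in effect, measure throughput from the pod; this sketch assumes an iperf3 server is reachable and the container image ships iperf3 (placeholders throughout):

```shell
# Egress throughput should cap near 50 Mbit/s
kubectl exec bandwidth-limited-app -- iperf3 -c <iperf3-server-ip> -t 10
```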

Cluster Mesh

Connect multiple Kubernetes clusters with Cilium Cluster Mesh:

# Enable Cluster Mesh on cluster 1
helm upgrade cilium cilium/cilium \
  --namespace kube-system \
  --reuse-values \
  --set clustermesh.useAPIServer=true \
  --set clustermesh.apiserver.service.type=LoadBalancer \
  --set cluster.id=1 \
  --set cluster.name=cluster-1

# Enable on cluster 2
# (run against second cluster's kubeconfig)
helm upgrade cilium cilium/cilium \
  --namespace kube-system \
  --reuse-values \
  --set clustermesh.useAPIServer=true \
  --set clustermesh.apiserver.service.type=LoadBalancer \
  --set cluster.id=2 \
  --set cluster.name=cluster-2

# Connect clusters
cilium clustermesh connect \
  --context cluster-1 \
  --destination-context cluster-2

# Verify mesh status
cilium clustermesh status

# Enable global service (spans clusters)
cat > global-service.yaml <<EOF
apiVersion: v1
kind: Service
metadata:
  name: shared-database
  namespace: production
  annotations:
    service.cilium.io/global: "true"
    service.cilium.io/affinity: "local"  # prefer local cluster
spec:
  type: ClusterIP
  ports:
    - port: 5432
  selector:
    app: database
EOF

kubectl apply -f global-service.yaml
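
Whether the global service actually picked up backends from both clusters can be checked from either cluster's agent (contexts are the ones used above):

```shell
# Backends from both clusters should appear for the shared-database service
kubectl --context cluster-1 -n kube-system exec ds/cilium -- \
  cilium service list | grep 5432
```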

Troubleshooting

Pods cannot communicate:

# Check Cilium agent status on the node
kubectl -n kube-system exec ds/cilium -- cilium status

# View policy verdicts for a specific pod
POD_ID=$(kubectl -n kube-system exec ds/cilium -- cilium endpoint list | grep your-pod-name | awk '{print $1}')
kubectl -n kube-system exec ds/cilium -- cilium endpoint get $POD_ID

# Check for policy drops in Hubble
hubble observe --verdict DROPPED --namespace production

Cilium agent crashlooping:

# Check kernel version compatibility
uname -r  # Should be 5.4+

# View agent logs
kubectl -n kube-system logs ds/cilium --previous

# Validate BPF filesystem mount
kubectl -n kube-system exec ds/cilium -- mount | grep bpf

DNS resolution failing:

# Check Cilium is not blocking DNS
kubectl -n kube-system exec ds/cilium -- cilium policy get

# Test DNS policy
hubble observe --protocol dns

# Ensure kube-dns pods are running
kubectl -n kube-system get pods -l k8s-app=kube-dns

Hubble relay not working:

# Restart Hubble relay
kubectl -n kube-system rollout restart deploy/hubble-relay

# Check relay logs for peer connection errors
kubectl -n kube-system logs deploy/hubble-relay

# With the relay port-forwarded, verify connectivity and flow availability
cilium hubble port-forward &
hubble status

Conclusion

Cilium with eBPF delivers superior Kubernetes networking performance, rich L7 network policies, and deep observability through Hubble without the overhead of sidecar-based service meshes. Its cluster mesh feature enables seamless multi-cluster connectivity, while the bandwidth manager provides fine-grained traffic control. Start with the basic installation and progressively enable Hubble, service mesh, and cluster mesh features as your needs evolve.