Talos Linux for Kubernetes Installation

Talos Linux is an immutable, API-driven operating system designed exclusively for running Kubernetes, providing a minimal and secure OS with no SSH, no shell, and no package manager. By managing everything through a declarative API, Talos keeps cluster state consistent, simplifies upgrades, and greatly reduces the attack surface compared to running Kubernetes on a general-purpose Linux distribution.

Prerequisites

  • Bare metal servers or VMs (minimum 2 CPU, 2 GB RAM per node)
  • Ability to boot from ISO/PXE (for bare metal) or from a platform disk image with user data (for VMs and cloud instances)
  • A load balancer or VIP for the Kubernetes API server (for clusters with multiple control plane nodes)
  • talosctl CLI installed on your workstation
  • Network connectivity between all nodes

Installing talosctl

# Install talosctl on your workstation (Linux)
curl -sL https://talos.dev/install | sh

# macOS
brew install siderolabs/tap/talosctl

# Verify
talosctl version --client

# Download the Talos ISO for your platform
# Use the release that matches the installed talosctl client
# (the "Tag:" line appears only in the full output, not with --short)
TALOS_VERSION=$(talosctl version --client | awk '/Tag:/ {print $2}')

# Download metal ISO (for bare metal)
wget "https://github.com/siderolabs/talos/releases/download/${TALOS_VERSION}/metal-amd64.iso"

# For cloud providers, disk images are available:
# AWS: AMI available in EC2 marketplace
# GCP: Import the raw disk image
# VMware: Use the OVA image
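
The download URL above follows a fixed pattern, so it can be wrapped in a small helper. This is a minimal sketch, assuming the release asset naming shown above (metal-<arch>.iso); the function name talos_iso_url is hypothetical and not part of talosctl:

```shell
# Hypothetical helper (not part of talosctl): print the metal ISO URL
# for a given Talos release tag and CPU architecture.
talos_iso_url() {
  version="$1"
  arch="${2:-amd64}"   # assumption: default to amd64 when no arch is given
  case "$version" in
    v[0-9]*) ;;        # release tags look like v1.7.0
    *) echo "version must look like v1.7.0, got: ${version}" >&2; return 1 ;;
  esac
  printf 'https://github.com/siderolabs/talos/releases/download/%s/metal-%s.iso\n' \
    "$version" "$arch"
}

# Usage (not executed here):
#   wget "$(talos_iso_url "${TALOS_VERSION}")"
```

Rejecting an empty or malformed tag up front avoids downloading from a broken URL when the version lookup returns nothing.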

Generating Machine Configurations

Talos uses declarative YAML configurations. Generate them with talosctl:

# Set variables
CLUSTER_NAME="production-cluster"
CONTROL_PLANE_ENDPOINT="https://192.168.1.10:6443"  # Load balancer/VIP, or the first control plane node's IP

# Generate cluster secrets and machine configs
talosctl gen config "${CLUSTER_NAME}" "${CONTROL_PLANE_ENDPOINT}" \
  --output-dir ./talos-config

# This creates:
# controlplane.yaml  - Config for control plane nodes
# worker.yaml        - Config for worker nodes
# talosconfig        - Client configuration for talosctl

ls -la ./talos-config/

Customize the control plane configuration:

# Keep node-specific settings in a patch file instead of editing controlplane.yaml by hand
# Key sections to customize:

cat > ./talos-config/controlplane-patch.yaml <<EOF
machine:
  network:
    hostname: cp-01
    interfaces:
      - interface: eth0
        addresses:
          - 192.168.1.11/24
        routes:
          - network: 0.0.0.0/0
            gateway: 192.168.1.1
        dhcp: false
  install:
    disk: /dev/sda
    image: ghcr.io/siderolabs/installer:v1.7.0  # should match the Talos version you are installing
    bootloader: true
    wipe: false
  kubelet:
    extraArgs:
      rotate-server-certificates: true
  sysctls:
    net.ipv4.ip_forward: "1"
    net.bridge.bridge-nf-call-iptables: "1"

cluster:
  network:
    cni:
      name: flannel  # or none to use Cilium/Calico
    podSubnets:
      - 10.244.0.0/16
    serviceSubnets:
      - 10.96.0.0/12
  apiServer:
    admissionControl:
      - name: PodSecurity
        configuration:
          apiVersion: pod-security.admission.config.k8s.io/v1alpha1
          kind: PodSecurityConfiguration
          defaults:
            enforce: baseline
            enforce-version: latest
EOF

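# Merge patch into config (repeat with a separate patch per control plane node,
# since hostname and address differ; this produces the file for node 1 only)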
talosctl machineconfig patch ./talos-config/controlplane.yaml \
  --patch @./talos-config/controlplane-patch.yaml \
  --output ./talos-config/controlplane-node1.yaml
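
The bootstrap steps later refer to controlplane-node2.yaml and controlplane-node3.yaml, which need their own hostname and address. One way to produce them is sketched below. The helper name write_node_patch, the cp-0N hostnames, and the cp-0N-patch.yaml file names are assumptions for illustration; the interface name, /24 prefix, and gateway are copied from the patch above:

```shell
# Hypothetical helper: write a per-node network patch (hostname + static IP).
write_node_patch() {
  name="$1"; ip="$2"; out="$3"
  cat > "$out" <<EOF
machine:
  network:
    hostname: ${name}
    interfaces:
      - interface: eth0
        addresses:
          - ${ip}/24
        routes:
          - network: 0.0.0.0/0
            gateway: 192.168.1.1
        dhcp: false
EOF
}

mkdir -p ./talos-config
i=1
for ip in 192.168.1.11 192.168.1.12 192.168.1.13; do
  write_node_patch "cp-0${i}" "$ip" "./talos-config/cp-0${i}-patch.yaml"
  # Render the full config only when talosctl and the base config are present.
  if command -v talosctl >/dev/null 2>&1 && [ -f ./talos-config/controlplane.yaml ]; then
    talosctl machineconfig patch ./talos-config/controlplane.yaml \
      --patch "@./talos-config/cp-0${i}-patch.yaml" \
      --output "./talos-config/controlplane-node${i}.yaml"
  fi
  i=$((i + 1))
done
```

Settings shared by all control plane nodes (install disk, kubelet arguments, cluster network) can stay in the common patch and be applied in a second machineconfig patch pass.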

Bootstrapping the Cluster

Apply configuration to nodes after booting from Talos ISO:

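# Set up talosconfig and tell talosctl which control plane node to talk to
# (add the other control plane IPs as endpoints once they are up)
export TALOSCONFIG=./talos-config/talosconfig
talosctl config endpoint 192.168.1.11

# Nodes boot into "maintenance mode" waiting for configuration.
# Until the config is applied, a node answers on the address it got at boot
# (usually DHCP), so use that address here if it differs from the static one.
# Apply config to first control plane node
talosctl apply-config \
  --nodes 192.168.1.11 \
  --file ./talos-config/controlplane-node1.yaml \
  --insecure  # Only needed before certs are set up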

# Apply config to additional control plane nodes (each needs its own patched config file)
talosctl apply-config \
  --nodes 192.168.1.12 \
  --file ./talos-config/controlplane-node2.yaml \
  --insecure

talosctl apply-config \
  --nodes 192.168.1.13 \
  --file ./talos-config/controlplane-node3.yaml \
  --insecure

# Bootstrap etcd on the first control plane node (run ONCE)
talosctl bootstrap --nodes 192.168.1.11

# Wait for the API server to come up (takes 1-2 minutes)
talosctl health --nodes 192.168.1.11

# Retrieve kubeconfig
talosctl kubeconfig --nodes 192.168.1.11 ./kubeconfig
export KUBECONFIG=./kubeconfig

# Verify cluster is running
kubectl get nodes
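
Because the API server takes a minute or two to come up, the verification step can fail on the first try. A small retry wrapper makes the wait explicit. This is a generic sketch; the function name retry and the attempt and delay values are assumptions, not part of talosctl or kubectl:

```shell
# Hypothetical helper: run a command until it succeeds, up to a fixed
# number of attempts, sleeping between attempts.
retry() {
  attempts="$1"; delay="$2"; shift 2
  n=1
  while ! "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "gave up after ${attempts} attempts: $*" >&2
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
}

# Usage (not executed here): poll for up to ~5 minutes.
#   retry 30 10 kubectl get nodes
```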

Apply configuration to worker nodes:

# Workers also boot from the same Talos ISO
talosctl apply-config \
  --nodes 192.168.1.21 \
  --file ./talos-config/worker.yaml \
  --insecure

talosctl apply-config \
  --nodes 192.168.1.22 \
  --file ./talos-config/worker.yaml \
  --insecure

# Verify all nodes joined
kubectl get nodes -o wide
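
With more than a couple of workers, the repeated apply-config calls can be folded into a loop. A minimal sketch, assuming all workers share the same worker.yaml; the function name apply_workers and the TALOSCTL override variable (handy for printing the commands instead of running them) are hypothetical:

```shell
# Hypothetical helper: apply one worker config to a list of node IPs.
# Set TALOSCTL=echo to print the commands without running them.
apply_workers() {
  cfg="$1"; shift
  for ip in "$@"; do
    "${TALOSCTL:-talosctl}" apply-config --insecure \
      --nodes "$ip" --file "$cfg" || return 1
  done
}

# Usage (not executed here):
#   apply_workers ./talos-config/worker.yaml 192.168.1.21 192.168.1.22
```

The loop stops at the first failure so that a bad config is not pushed to every remaining node.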

API-Driven Management

All Talos management happens through talosctl since there is no SSH:

# View node information
talosctl get members --nodes 192.168.1.11

# Check service status on a node
talosctl services --nodes 192.168.1.11

# View system logs
talosctl logs --nodes 192.168.1.11 machined
talosctl logs --nodes 192.168.1.11 kubelet

# Kernel messages
talosctl dmesg --nodes 192.168.1.11

# Read files from the node
talosctl read --nodes 192.168.1.11 /etc/os-release

# List files on the node (Talos has no shell or exec; for interactive debugging use a `kubectl debug node/<name>` pod)
talosctl list --nodes 192.168.1.11 /

# Get filesystem usage per mount
talosctl mounts --nodes 192.168.1.11

# Network interfaces
talosctl get addresses --nodes 192.168.1.11
talosctl get routes --nodes 192.168.1.11

# Apply configuration changes (non-disruptive where possible)
talosctl apply-config \
  --nodes 192.168.1.11 \
  --file ./talos-config/controlplane-updated.yaml
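
To keep configuration changes non-disruptive, apply-config can be run first with --dry-run and then with --mode=no-reboot, which refuses changes that would need a reboot. A sketch of that two-step flow follows; the function name safe_apply and the TALOSCTL override variable are hypothetical:

```shell
# Hypothetical helper: preview a config change, then apply it only if it
# can be done without rebooting the node.
# Set TALOSCTL=echo to print the commands without running them.
safe_apply() {
  node="$1"; file="$2"
  # Step 1: show what would change without touching the node.
  "${TALOSCTL:-talosctl}" apply-config --nodes "$node" --file "$file" --dry-run || return 1
  # Step 2: apply, failing instead of rebooting if a reboot would be required.
  "${TALOSCTL:-talosctl}" apply-config --nodes "$node" --file "$file" --mode=no-reboot
}

# Usage (not executed here):
#   safe_apply 192.168.1.11 ./talos-config/controlplane-updated.yaml
```

If the second step fails because a reboot is required, the change can be scheduled deliberately with the default mode during a maintenance window.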

Storage Integration

Talos supports multiple storage solutions. Using Longhorn as an example:

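# Longhorn requires iSCSI support, provided on Talos by the iscsi-tools system extension.
# Note: machine.install.extensions is deprecated in recent Talos releases in favor of
# Image Factory installer images; check the docs for the version you run.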

cat > worker-storage-patch.yaml <<EOF
machine:
  install:
    extensions:
      - image: ghcr.io/siderolabs/iscsi-tools:v0.1.4
  kubelet:
    extraMounts:
      - destination: /var/lib/longhorn
        type: bind
        source: /var/lib/longhorn
        options:
          - bind
          - rshared
          - rw
EOF

talosctl machineconfig patch ./talos-config/worker.yaml \
  --patch @worker-storage-patch.yaml \
  --output ./talos-config/worker-storage.yaml

# Apply updated config; the extension is installed on the next install or upgrade, not by apply-config alone
talosctl apply-config \
  --nodes 192.168.1.21 \
  --file ./talos-config/worker-storage.yaml

# For local storage, partition and mount an additional disk (size: 0 uses the whole disk)
cat > local-storage-patch.yaml <<EOF
machine:
  disks:
    - device: /dev/sdb
      partitions:
        - mountpoint: /var/mnt/data
          size: 0
EOF

Upgrading Talos and Kubernetes

Talos upgrades are API-driven and applied one node at a time:

# Check current version
talosctl version --nodes 192.168.1.11

# Upgrade Talos on a specific node
talosctl upgrade \
  --nodes 192.168.1.11 \
  --image ghcr.io/siderolabs/installer:v1.8.0

# The node will reboot with the new version
# Upgrade workers one by one
for node in 192.168.1.21 192.168.1.22; do
  echo "Upgrading ${node}..."
  talosctl upgrade --nodes "${node}" --image ghcr.io/siderolabs/installer:v1.8.0
  # Give the node time to go down before checking readiness
  sleep 60
  # Look up the Kubernetes node name by IP (assumes the first listed address is the node IP)
  node_name=$(kubectl get nodes -o jsonpath="{.items[?(@.status.addresses[0].address==\"${node}\")].metadata.name}")
  # Wait for the node to report Ready again before moving on
  kubectl wait "node/${node_name}" --for=condition=Ready --timeout=300s
done

# Upgrade Kubernetes version
talosctl upgrade-k8s \
  --nodes 192.168.1.11 \
  --to 1.31.0
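
Talos upgrades are generally meant to move between adjacent minor releases, so it helps to check the gap before starting. A minimal sketch, assuming both tags share the same major version and use the vMAJOR.MINOR.PATCH form; the function name minor_gap is hypothetical:

```shell
# Hypothetical helper: print how many minor releases separate two Talos tags.
# Assumes both tags have the same major version (e.g. v1.x.y).
minor_gap() {
  from="${1#v}"; to="${2#v}"          # strip the leading "v"
  from_minor=$(printf '%s' "$from" | cut -d. -f2)
  to_minor=$(printf '%s' "$to" | cut -d. -f2)
  echo $((to_minor - from_minor))
}

# Usage (not executed here):
#   if [ "$(minor_gap v1.6.7 v1.8.0)" -gt 1 ]; then
#     echo "upgrade through the intermediate minor release first" >&2
#   fi
```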

Security Model

Talos enforces a strict security model by design:

# Talos runs with:
# - No SSH daemon
# - No interactive shell
# - No package manager
# - All processes run in containers
# - Read-only root filesystem
# - Signed OS components

# Verify node security settings
talosctl get securitystate --nodes 192.168.1.11

# Machine configuration is stored on the STATE partition; it is encrypted at rest
# only when disk encryption is enabled
# API access requires the talosconfig client certificate

# Enable disk encryption (add to machine config)
cat > machine-encryption-patch.yaml <<EOF
machine:
  systemDiskEncryption:
    ephemeral:
      provider: luks2
      keys:
        - nodeID: {}
          slot: 0
    state:
      provider: luks2
      keys:
        - nodeID: {}
          slot: 0
EOF

# Audit node configuration
talosctl get mc --nodes 192.168.1.11 -o yaml

Troubleshooting

Node stuck in maintenance mode:

# Check the node can reach the network
talosctl get addresses --nodes 192.168.1.11 --insecure

# Re-apply configuration
talosctl apply-config --nodes 192.168.1.11 --file controlplane.yaml --insecure

Bootstrap fails:

# Check etcd status
talosctl service etcd --nodes 192.168.1.11

# View etcd logs
talosctl logs --nodes 192.168.1.11 etcd

# Ensure bootstrap is only called once on the first control plane node

Nodes not joining:

# Verify the cluster endpoint is reachable
curl -k https://192.168.1.10:6443/healthz

# Check worker logs (a configured node requires the authenticated API, so no --insecure)
talosctl logs --nodes 192.168.1.21 kubelet

# Confirm worker config has correct controlplane endpoint
talosctl get mc --nodes 192.168.1.21 -o yaml | grep endpoint

Configuration apply fails:

# Validate config syntax before applying
talosctl validate --config controlplane.yaml --mode metal

# Check for version compatibility
talosctl version --nodes 192.168.1.11

Conclusion

Talos Linux provides a purpose-built, immutable OS for Kubernetes that eliminates entire categories of security risks by removing SSH, shells, and package managers. Its API-driven model enables consistent, auditable management at scale, and upgrades can be rolled out node by node with minimal disruption. For production Kubernetes on bare metal or VMs, Talos is a strong choice when security and operational consistency are priorities.