Deploying a Highly Available Kubernetes Cluster on Proxmox with Terraform and Talos OS
A highly available Kubernetes cluster in a homelab setup creates opportunities to test distributed systems, automation, and failure recovery under real-world conditions. This guide walks through one approach to building such a cluster using Proxmox for virtualization, Terraform for provisioning, and Talos OS for running the Kubernetes nodes.
This setup provides declarative infrastructure and immutable operating systems, eliminating the need for traditional Linux administration—no SSH, no shell, and no drifting configuration. The result is a consistent, secure, and maintainable cluster architecture suitable for long-term experimentation or light production use.
All the code used in this guide, including Terraform modules, Talos patches, and Ansible playbooks, is available in the following repository: GitHub - blog-resources. This makes it easier to follow along or adapt the configuration to your own environment.
Why Run Kubernetes on Proxmox
Proxmox provides a powerful abstraction over hardware. It allows centralized management of virtual machines with native support for snapshots, backups, and live migration. These capabilities simplify maintenance workflows—especially for multi-node Kubernetes clusters where availability and recovery matter.
Virtualization also decouples hardware from workload placement. A Kubernetes node can be relocated across physical hosts or replaced entirely without touching internal configuration. In heterogeneous environments or when gradually expanding a homelab, this flexibility reduces friction.
Networking in Proxmox supports complex topologies without needing to rewire hardware or manually configure Linux networking. Virtual machines can attach to multiple NICs, bridges, or tagged VLAN interfaces, making it easier to separate control-plane, storage, and ingress traffic at the infrastructure level.
This setup combines infrastructure abstraction from Proxmox, provisioning logic from Terraform, and cluster integrity from Talos—offering a clean and automated deployment path for a resilient Kubernetes environment.
What Talos OS Brings to Kubernetes
Talos is a Kubernetes-native operating system that emphasizes simplicity, security, and immutability. Unlike traditional Linux distributions, Talos has no SSH server, no interactive shell, and no package manager. All system interaction happens through a dedicated API using talosctl, which connects securely to a node’s management endpoint.
This architecture brings several advantages. Talos is minimal by design and loads into memory as an ephemeral OS. Because it treats the OS and Kubernetes as a single unit, there is no need to manage these components independently. The entire lifecycle of a node—from configuration to upgrades—is declarative, API-driven, and reproducible.
Immutability eliminates configuration drift. Nodes do not accept runtime changes outside of the declared configuration files. This aligns well with infrastructure-as-code workflows and prevents untracked state changes. Talos supports a wide range of platforms—including bare metal, Proxmox, public clouds, and nested VM environments—making it versatile for consistent deployments.
Talos also integrates security into its core. With no exposed SSH ports or unnecessary services, its attack surface is smaller than that of a general-purpose operating system. API communication is encrypted and mutually authenticated with TLS certificates, and extra functionality can only be added through explicitly declared system extensions.
Operating and upgrading a Talos-based cluster becomes predictable and standardized. Configuration is stored in YAML and applied via the Talos API. Bootstrapping Kubernetes is built into the OS, eliminating the need for kubeadm or external provisioning tools.
There are trade-offs. Without SSH access, node-level debugging requires a different approach. Most troubleshooting must be done via talosctl, and more advanced inspection may involve kubectl debug to launch temporary containers. While Talos provides strong guarantees around automation and consistency, its documentation is still evolving and may not always cover edge cases.
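As a rough sketch of what this looks like in practice, the functions below wrap a few common inspection commands. The node address, the node name, and the busybox image are placeholders chosen for illustration, not values from this guide.

```shell
# Placeholder values (assumptions): adjust to your own cluster.
NODE_IP="${NODE_IP:-10.0.0.11}"
NODE_NAME="${NODE_NAME:-talos-cp-1}"

# Inspect a node through the Talos API instead of an SSH session.
inspect_node() {
  # State of the services managed by Talos.
  talosctl --nodes "$NODE_IP" services
  # Kernel log of the node.
  talosctl --nodes "$NODE_IP" dmesg
  # Log output of a single service, here the kubelet.
  talosctl --nodes "$NODE_IP" logs kubelet
}

# Start a temporary debugging pod on the node through the Kubernetes API.
debug_node() {
  kubectl debug "node/$NODE_NAME" -it --image=busybox
}
```

Both functions assume that talosctl and kubectl are already configured to reach the cluster.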
Talos fits well for those who value repeatability, minimal drift, and secure automation in Kubernetes operations. It replaces traditional systems management with modern, declarative tooling suited for fully automated infrastructure.
Creating a Proxmox Cluster
Proxmox clusters consist of multiple nodes joined under a shared management layer. Each node runs the Proxmox VE hypervisor and communicates over a trusted control network. The cluster configuration allows scheduling virtual machines across nodes, performing HA failover, and managing resources from a unified UI or API.
Before creating a cluster, each node must be installed with Proxmox VE and connected to a common management network. If the initial setup of Proxmox is still pending, including basic installation and configuration of a single node, see the previous guide: Building a Home Virtualization Server with Proxmox.
Once the base system is installed, one node initializes the cluster through the web interface. From the primary node, navigate to the Datacenter view, open the Cluster tab, and select Create Cluster. After creation, Proxmox provides a join command with embedded tokens and cluster secrets.
Each additional node joins the cluster by opening the same Cluster menu, selecting Join Cluster, and pasting the previously generated join string. All nodes must be in a clean state before joining: no existing VMs, templates, or custom storage pools configured. After the join process completes, the nodes appear under a single Datacenter view in the Proxmox UI, ready for workload distribution and shared management.
Provisioning Kubernetes Nodes with Terraform
With the Proxmox cluster in place, the next step provisions the virtual machines that will serve as Kubernetes control plane and worker nodes. Terraform handles this workflow using the Telmate Proxmox provider, allowing infrastructure to be described and versioned as code.
Configuring Terraform Access to Proxmox
Proxmox requires an API token for Terraform to authenticate. This token is created under Datacenter → Permissions → API Tokens for the root@pam user. Privilege Separation remains disabled for full access.
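The same token can also be created from a shell on a Proxmox node. This is a minimal sketch rather than the method used in this guide, and the token name terraform is an assumption.

```shell
# Create an API token for root@pam with privilege separation disabled.
# The default token name "terraform" is hypothetical; any name works.
create_api_token() {
  token_name="${1:-terraform}"
  pveum user token add root@pam "$token_name" --privsep 0
}
```

Proxmox shows the token secret only once, at creation time, so it should be copied somewhere safe right away.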
Store the credentials in a file named secrets.auto.tfvars:
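One way to create such a file without leaving it world-readable is sketched below. The three variable names are hypothetical and have to match whatever variable blocks your own Terraform module declares.

```shell
# Write Proxmox credentials to a tfvars file.
# Usage: write_tfvars <api-url> <token-id> <token-secret> [target-file]
# The variable names written to the file are assumptions, not names from this guide.
write_tfvars() {
  api_url="$1"; token_id="$2"; token_secret="$3"
  target="${4:-secrets.auto.tfvars}"
  (
    # With umask 077 a newly created file is readable only by the current user.
    umask 077
    cat > "$target" <<EOF
proxmox_api_url          = "$api_url"
proxmox_api_token_id     = "$token_id"
proxmox_api_token_secret = "$token_secret"
EOF
  )
}
```

A call might look like `write_tfvars "https://pve.example.lan:8006/api2/json" 'root@pam!terraform' "<token-secret>"`, where the host name is a placeholder. The file should stay out of version control.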
Configure the provider in your main Terraform module:
Declaring Control Plane Node Configuration
Each Talos control plane node is described using structured variables. The configuration includes Proxmox host assignment, VM IDs, resource allocation, and network interface definitions.
An example list defines three nodes across different Proxmox hosts:
Creating Talos VMs
The virtual machines are provisioned with a single proxmox_vm_qemu resource that loops through the list of control node definitions:
After defining the resources, the following commands apply the configuration:
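A typical sequence, assuming Terraform is run from the module directory, might look like the sketch below. The plan file name tfplan is arbitrary.

```shell
# Initialise the working directory, record a plan, then apply exactly that plan.
# Each step runs only if the previous one succeeded.
provision() {
  terraform init &&
  terraform plan -out=tfplan &&
  terraform apply tfplan
}
```

Saving the plan to a file means the apply step executes the changes that were reviewed, not a freshly computed set.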
Installing Talos and Bootstrapping Kubernetes with Ansible
Talos replaces the traditional Linux node installation and configuration workflow with an API-driven model. The operating system is deployed by applying declarative configuration files through talosctl, and the entire lifecycle, including Kubernetes bootstrap, OS upgrades, and optional extensions, is managed without direct access to the host.
Note: the playbook is not idempotent and should be executed only once, at cluster creation.
A set of variables defines the layout of the cluster, including the IPs of control and worker nodes, the cluster name, and where configuration files are rendered.
Cluster Secrets Generation
Talos clusters rely on a secrets bundle to secure internal communication and encrypt sensitive resources. These secrets are generated once using talosctl gen secrets. The playbook checks for the presence of secrets.yaml, and if it doesn’t exist, it creates it:
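Outside of Ansible, the same check-then-create logic can be sketched in plain shell. This is an equivalent written for illustration, not the playbook task itself.

```shell
# Generate the secrets bundle only if it does not exist yet.
# Usage: ensure_secrets [secrets-file]
ensure_secrets() {
  secrets_file="${1:-secrets.yaml}"
  if [ -f "$secrets_file" ]; then
    echo "keeping existing $secrets_file"
  else
    talosctl gen secrets --output-file "$secrets_file"
  fi
}
```

Guarding the command this way keeps a rerun from replacing the bundle of a cluster that is already running.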
The generated file must remain consistent for the lifetime of the cluster.
Generating Talos Configuration with Patches
The Talos cluster is configured using base configuration files created with talosctl gen config, which produces control plane and worker node definitions.
Two patch files are applied to customize the generated YAML:
common.yaml contains shared configuration for all nodes. It disables predictable interface naming, enables server certificate rotation, and installs the Kubelet Serving Certificate Approver:
proxmox-machine.yaml ensures compatibility with Proxmox by explicitly specifying the disk device:
The playbook applies these patches in order. First, the base files are created:
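A minimal sketch of the generation step, assuming a recent talosctl and a secrets.yaml in the current directory. The cluster name, the control plane address, and the output directory are placeholders passed in as arguments.

```shell
# Render controlplane.yaml, worker.yaml and talosconfig into an output directory.
# Usage: generate_base_config <cluster-name> <control-plane-ip> [output-dir]
generate_base_config() {
  cluster_name="$1"; endpoint_ip="$2"; out_dir="${3:-./rendered}"
  talosctl gen config "$cluster_name" "https://${endpoint_ip}:6443" \
    --with-secrets secrets.yaml \
    --output "$out_dir"
}
```

Passing the existing secrets bundle keeps the generated certificates and tokens stable across regenerations.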
Then they are patched individually for control and worker nodes:
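The patch step can be sketched with talosctl machineconfig patch, which accepts several patch files in one call. The input and output file names are left to the caller here because the playbook's actual paths are not assumed.

```shell
# Apply both patch files to one base configuration and write the result to a new file.
# Usage: patch_config <base-file> <patched-file>
patch_config() {
  base_file="$1"; patched_file="$2"
  talosctl machineconfig patch "$base_file" \
    --patch @common.yaml \
    --patch @proxmox-machine.yaml \
    --output "$patched_file"
}
```

It would be called once for the control plane file and once for the worker file, for example `patch_config rendered/controlplane.yaml rendered/controlplane-patched.yaml` (hypothetical paths).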
Applying Configuration to Nodes
Before configuration can be applied, the local talosctl context must point to the control plane endpoint:
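As a sketch, this amounts to setting the endpoint and the default node in the generated talosconfig. The path to that file and the control plane address are placeholders.

```shell
# Point a talosconfig file at the control plane.
# Usage: set_talos_context <talosconfig-path> <control-plane-ip>
set_talos_context() {
  talosconfig="$1"; control_plane_ip="$2"
  talosctl --talosconfig "$talosconfig" config endpoint "$control_plane_ip" &&
  talosctl --talosconfig "$talosconfig" config node "$control_plane_ip"
}
```

The endpoint is the address talosctl connects to, while the node is the default target of each command.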
Configuration is applied to each node using talosctl apply-config. Talos installs the OS image on the disk and reboots automatically.
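A minimal sketch of this step as a loop over node addresses follows. It assumes the nodes are still in maintenance mode, which is why the --insecure flag is used: at that point they have no certificates to authenticate against.

```shell
# Push a machine configuration to one or more nodes in maintenance mode.
# Usage: apply_config <config-file> <node-ip>...
apply_config() {
  config_file="$1"; shift
  for node_ip in "$@"; do
    # Stop at the first failure so a broken node is not silently skipped.
    talosctl apply-config --insecure --nodes "$node_ip" --file "$config_file" || return 1
  done
}
```

Control plane and worker nodes would each get their own call with the matching patched file.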
Bootstrapping the Kubernetes Control Plane
The first control plane node must initialize the etcd datastore and bring up the Kubernetes control plane. This operation needs to be performed exactly once in the cluster’s lifecycle.
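In shell form, the bootstrap call might look like the sketch below. The talosconfig path and the node address are placeholders.

```shell
# Bootstrap etcd on the first control plane node. Run this once per cluster.
# Usage: bootstrap_cluster <talosconfig-path> <first-control-plane-ip>
bootstrap_cluster() {
  talosconfig="$1"; first_cp_ip="$2"
  talosctl --talosconfig "$talosconfig" bootstrap \
    --nodes "$first_cp_ip" --endpoints "$first_cp_ip"
}
```

The remaining control plane nodes join the etcd cluster on their own once the first member is up, so they need no bootstrap call.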
After bootstrapping, the control plane becomes active and accessible over the configured endpoint.
Installing System Extensions via Talos Factory
To enable QEMU guest integration, a Talos image must be built with the required system extension. This is defined in the customization.yaml file:
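A schematic of the following shape requests the QEMU guest agent. The extension name siderolabs/qemu-guest-agent is the one listed by the Talos Image Factory; that the guide's own file contains exactly this entry is an assumption.

```shell
# Write an Image Factory schematic that adds the QEMU guest agent extension.
# Usage: write_schematic [target-file]
write_schematic() {
  cat > "${1:-customization.yaml}" <<EOF
customization:
  systemExtensions:
    officialExtensions:
      - siderolabs/qemu-guest-agent
EOF
}
```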
The playbook sends this definition to the Talos Factory API and retrieves a custom image ID:
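Done by hand, the request is a single HTTP POST. The sketch below assumes curl is available and that the response is a small JSON document with an id field, which is extracted with sed to avoid an extra dependency.

```shell
# Extract the "id" field from the JSON document read on standard input.
parse_schematic_id() {
  sed -n 's/.*"id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Upload the schematic to the Image Factory and print the resulting ID.
# Usage: request_schematic_id [schematic-file]
request_schematic_id() {
  curl -fsS -X POST --data-binary "@${1:-customization.yaml}" \
    https://factory.talos.dev/schematics | parse_schematic_id
}
```

The same schematic always yields the same ID, so repeating the request is harmless.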
An image URL is constructed:
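The installer reference combines the factory host, the schematic ID, and a Talos version tag. A small helper, with both values supplied by the caller, could look like this:

```shell
# Build the installer image reference for a schematic ID and a Talos version.
# Usage: installer_image <schematic-id> <talos-version>
installer_image() {
  printf 'factory.talos.dev/installer/%s:%s\n' "$1" "$2"
}
```

The version tag should match the Talos release the cluster is meant to run.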
Then the image is applied to each control and worker node using talosctl upgrade. The command blocks until the upgrade finishes, so no artificial wait steps are needed.
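A sketch of the upgrade loop, processing one node at a time so the cluster keeps quorum. The image reference and node addresses are supplied by the caller.

```shell
# Upgrade nodes sequentially to the given installer image.
# Usage: upgrade_nodes <image> <node-ip>...
upgrade_nodes() {
  image="$1"; shift
  for node_ip in "$@"; do
    # Abort on the first failed upgrade instead of moving on to the next node.
    talosctl upgrade --nodes "$node_ip" --image "$image" || return 1
  done
}
```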
Retrieving Kubernetes Kubeconfig
Once Talos has bootstrapped the cluster and all nodes are healthy, the kubeconfig can be pulled from the control plane:
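With a working talosctl context, fetching the file is one command. The target path below is a placeholder.

```shell
# Download the admin kubeconfig from a control plane node.
# Usage: fetch_kubeconfig <control-plane-ip> [target-path]
fetch_kubeconfig() {
  control_plane_ip="$1"; target="${2:-./kubeconfig}"
  talosctl kubeconfig "$target" --nodes "$control_plane_ip"
}
```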
Execute the playbook.
This configuration file grants administrative access to the Kubernetes cluster via kubectl. Verifying the cluster is straightforward:
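A quick check, assuming the kubeconfig was saved to a local file (the default path here is a placeholder), is to list the nodes:

```shell
# List all nodes with their status, roles, versions and addresses.
# Usage: verify_cluster [kubeconfig-path]
verify_cluster() {
  kubectl --kubeconfig "${1:-./kubeconfig}" get nodes -o wide
}
```

Every control plane and worker node should eventually report a Ready status.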
Final Thoughts
This architecture combines the flexibility of virtualized infrastructure with the safety and consistency of an immutable operating system. Each layer—Proxmox, Terraform, and Talos—plays a distinct role in building a repeatable, declarative, and resilient Kubernetes cluster.
Talos enforces a disciplined operational model. It removes many debugging conveniences and replaces them with structured APIs and tools. In return, it offers reproducibility, minimal drift, and deep integration with Kubernetes as a system.
This workflow creates a stable foundation for testing high-availability clusters, experimenting with upgrades, or hosting lightweight production workloads in a homelab.
All code used in this guide is available in the blog-resources repository on GitHub.
Happy engineering!