Environment Setup & Prerequisites
Network Planning
Management VM Deployment
Tool Requirements
Creating VM Templates in Proxmox
Ubuntu Template for Management VM
Talos Template
Building the Terraform Infrastructure Modules
Talos Node Module
Ubuntu Cloud-Init Module
Deploying Your Infrastructure
Setting Up the Terraform Project Structure
Complete Tutorial Environment Configuration
Deploy Your Complete Infrastructure
Manual Cluster Bootstrap
Conclusion
Welcome to the next part of our Talos Linux on Proxmox series! In the previous article, we explored the theoretical foundations. Now we’ll build the automation infrastructure that makes our Talos cluster deployment reliable and repeatable.
This part focuses on the foundation: creating robust automation that provisions infrastructure and bootstraps a fully functional Kubernetes cluster. By the end, you’ll have a working 5-node Talos cluster with high-availability control plane and modern networking.
In the following article, we’ll add the platform components (storage, ingress, TLS) that make it production-ready.
Before we start building our automation, let’s set up the foundation. Our approach uses a dedicated management VM that serves as the command center for all cluster operations.
Network planning is one of the most environment-specific aspects of this deployment and will vary significantly based on your infrastructure. In our case, we control a Proxmox server with access to network configuration, DHCP reservations, and the ability to allocate specific IP ranges for our Kubernetes cluster. Your setup may differ: you might be working with existing DHCP servers, managed switches, or cloud environments with different networking constraints.
Taking that into consideration, plan your network architecture carefully before anything else. You'll need to allocate IP addresses for several components within your existing network range. As a reference, these are the components we considered for our cluster:
# Network Architecture (Example: 192.168.1.0/24)
VIP (kube-vip): <your-vip-ip> # Example: 192.168.1.10
Control Planes: <range-for-cp-nodes> # Example: 192.168.1.24-26
Workers: <range-for-workers> # Example: 192.168.1.72-73
Management VM: <mgmt-vm-ip> # Example: 192.168.1.200
MetalLB Pool: <loadbalancer-range> # Example: 192.168.1.120-130
Gateway: <your-gateway-ip> # Example: 192.168.1.254
Key Planning Considerations:
- The network interface name matters for the Talos machine configuration later on (typically ens18 in Proxmox VMs, but it may vary). You can check this by running ip addr inside a VM.

This allocation strategy prevents IP conflicts and provides clear service boundaries for different cluster components.
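To confirm the interface name before writing any configuration, you can inspect a running VM from the Proxmox console or over SSH:

```shell
# Show interfaces with their addresses and state
ip addr show
# Interface names alone also appear under /sys/class/net (e.g. "ens18", "lo")
ls /sys/class/net
```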
The management VM is crucial to our automation strategy. It runs Ubuntu and hosts all the tools needed for cluster operations. While we recommend a dedicated management VM for operational consistency and network proximity to the cluster, it is not strictly necessary: you could instead install all the tools (kubectl, talosctl, Helm) on your local machine or on any server with network access to the cluster. The dedicated VM provides consistent tooling, reduced network latency, and an isolated operational environment. For the rest of this article, we will assume a management VM was created and perform all steps through it. Here's how we automate its creation using cloud-init:
Important Configuration Notes:
Upload the cloud-init file to /var/lib/vz/snippets/ on your Proxmox server so it can be referenced during VM deployment.

# configs/cloud-init/management-vm-cloud-init.yml
#cloud-config
users:
- name: <your-username> # Replace with your desired username
sudo: ALL=(ALL) NOPASSWD:ALL
shell: /bin/bash
ssh_authorized_keys:
- <your-ssh-public-key> # Replace with your SSH public key
packages:
- curl
- wget
- git
- python3-pip
- software-properties-common
- apt-transport-https
- ca-certificates
- gnupg
- lsb-release
runcmd:
# Install Docker
- curl -fsSL https://download.docker.com/linux/ubuntu/gpg | gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
- echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | tee /etc/apt/sources.list.d/docker.list > /dev/null
- apt-get update
- apt-get install -y docker-ce docker-ce-cli containerd.io docker-compose-plugin
- usermod -aG docker <your-username>
# Install kubectl
- curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
- install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl
# Install talosctl
- curl -sL https://talos.dev/install | sh
# Install Helm
- curl https://baltocdn.com/helm/signing.asc | gpg --dearmor | tee /usr/share/keyrings/helm.gpg > /dev/null
- echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/helm.gpg] https://baltocdn.com/helm/stable/debian/ all main" | tee /etc/apt/sources.list.d/helm-stable-debian.list
- apt-get update
- apt-get install -y helm
package_update: true
package_upgrade: true
This cloud-init configuration creates a fully equipped management environment with all necessary tools pre-installed.
Our automation stack requires several tools working together. These are split between your local machine (where you’ll run the deployment) and the management VM (which will be deployed in your Proxmox environment):
Required on Your Local Machine:
- Terraform (v1.0+) to provision the VMs
- An SSH client and key pair for accessing the management VM

Automatically Installed on Management VM (via cloud-init):
- Docker, kubectl, talosctl, and Helm
The management VM comes pre-configured with all Kubernetes-related tools through our cloud-init script, eliminating manual setup complexity. You’ll SSH into this VM to perform cluster operations after the initial infrastructure deployment.
Before we can use Terraform to deploy VMs, we need to create reusable templates in Proxmox. Templates allow us to rapidly deploy consistent VM configurations and are essential for our automation approach.
Download Ubuntu Cloud Image:
# SSH to your Proxmox host and download the Ubuntu cloud image
wget https://cloud-images.ubuntu.com/releases/22.04/release/ubuntu-22.04-server-cloudimg-amd64.img
Create VM and Convert to Template:
# Create a new VM (ID 9000 in this example)
qm create 9000 --name ubuntu-22.04-template --memory 2048 --cores 2 --net0 virtio,bridge=vmbr0
# Import the disk image
qm importdisk 9000 ubuntu-22.04-server-cloudimg-amd64.img local-lvm
# Attach the disk to the VM
qm set 9000 --scsihw virtio-scsi-pci --scsi0 local-lvm:vm-9000-disk-0
# Add cloud-init drive
qm set 9000 --ide2 local-lvm:cloudinit
# Configure boot order
qm set 9000 --boot c --bootdisk scsi0
# Add serial console
qm set 9000 --serial0 socket --vga serial0
# Convert to template
qm template 9000
For the Talos template, we’ll use the generic Talos ISO rather than our custom factory image. While we could create a template with the custom image (which includes Longhorn extensions), keeping the template generic provides more flexibility. We’ll apply the custom image during the actual cluster configuration phase, allowing us to easily switch between different custom images or Talos versions without recreating templates.
Download Talos ISO:
# Download the latest Talos ISO to Proxmox local storage
cd /var/lib/vz/template/iso
wget -O talos-amd64.iso https://github.com/siderolabs/talos/releases/download/v1.10.4/metal-amd64.iso
Create Talos Template:
# Create VM for Talos template (ID 9001 in this example)
qm create 9001 --name talos-template --memory 4096 --cores 2 --net0 virtio,bridge=vmbr0
# Add disk for Talos installation
qm set 9001 --scsi0 local-lvm:20
# Set SCSI controller
qm set 9001 --scsihw virtio-scsi-pci
# Configure boot and other settings
qm set 9001 --boot c --bootdisk scsi0
qm set 9001 --ostype l26
qm set 9001 --agent 0
# Convert to template
qm template 9001
Key Template Considerations:
- The Ubuntu template includes a cloud-init drive, enabling automated user and tool setup on first boot
- The Talos template deliberately stays on the generic ISO so the custom factory image can be applied later, at configuration time
- Both templates use the virtio-scsi controller and virtio networking for the best performance under Proxmox
These templates serve as the foundation for our Terraform automation, allowing rapid deployment of consistent VM configurations.
Our infrastructure automation is built around modular Terraform code that can provision both Talos nodes and management VMs. This modular approach allows us to reuse components across different environments (dev, staging, production) while maintaining consistency.
The Talos node module is the core building block for our Kubernetes cluster. It creates VMs optimized for running Talos Linux with specific configurations for both control plane and worker nodes.
# terraform/modules/talos-node/main.tf
resource "proxmox_vm_qemu" "talos_node" {
name = var.node_prefix
target_node = var.target_node
clone = var.template_name
full_clone = true
boot = "c"
bootdisk = "scsi0"
# Storage Configuration
disk {
slot = "scsi0"
type = "disk"
storage = "local-lvm"
size = "20G"
discard = true
}
# Talos Installation ISO
disk {
slot = "ide2"
type = "cdrom"
iso = "local:iso/talos-amd64.iso"
}
# VM Configuration
os_type = "cloud-init"
ciuser = "talos"
memory = var.memory
scsihw = "virtio-scsi-pci"
agent = 0
onboot = true
# CPU Configuration
cpu {
type = "host"
cores = var.cores
sockets = var.sockets
}
# Network Configuration
network {
id = 0
model = "virtio"
bridge = var.network_bridge
}
}
Storage Strategy: We allocate 20GB for each node, which provides sufficient space for the Talos OS, container images, and temporary storage. The discard option enables TRIM support for better SSD performance.
CPU Configuration: Using type = "host" passes through the host CPU features to the VM, providing optimal performance. This is particularly important for Talos as it leverages hardware-specific optimizations.
Network Setup: The virtio network model offers the best performance for virtualized environments. The network bridge (vmbr0) connects VMs to the physical network.
Talos ISO Attachment: The ISO is attached to facilitate the initial Talos installation process. This gets replaced by the actual Talos configuration during deployment.
terraform/modules/talos-node/variables.tf:
variable "node_prefix" {
type = string
description = "Prefix for the VM name"
}
variable "target_node" {
type = string
description = "Proxmox node where the VM will be created"
}
variable "template_name" {
type = string
description = "Name of the Talos template to clone from"
}
variable "memory" {
type = number
description = "Amount of memory in MB"
default = 4096
}
variable "cores" {
type = number
description = "Number of CPU cores"
default = 2
}
variable "sockets" {
type = number
description = "Number of CPU sockets"
default = 1
}
variable "network_bridge" {
type = string
description = "Network bridge to connect to"
default = "vmbr0"
}
variable "disk_size" {
type = string
description = "Size of the boot disk"
default = "20G"
}
variable "disk_storage" {
type = string
description = "Storage location for disks"
default = "local-lvm"
}
terraform/modules/talos-node/outputs.tf:
output "vm_id" {
description = "ID of the created VM"
value = proxmox_vm_qemu.talos_node.vmid
}
output "vm_name" {
description = "Name of the created VM"
value = proxmox_vm_qemu.talos_node.name
}
output "vm_ipv4_address" {
description = "IPv4 address of the VM (DHCP assigned)"
value = proxmox_vm_qemu.talos_node.default_ipv4_address
}
output "vm_network_interfaces" {
description = "Network interface configuration"
value = proxmox_vm_qemu.talos_node.network
}
The Ubuntu cloud-init module creates our management VM, which serves as the operational command center for the entire cluster.
# terraform/modules/ubuntu-cloud-init-vm/main.tf
resource "proxmox_vm_qemu" "management_vm" {
name = var.vm_name
target_node = var.target_node
clone = var.template_name
full_clone = true
# VM Resources
memory = var.memory
scsihw = "virtio-scsi-pci"
agent = 1
onboot = true
boot = "c"
bootdisk = "scsi0"
cpu {
type = "host"
cores = var.cores
sockets = var.sockets
}
# Storage Configuration
disk {
slot = "scsi0"
type = "disk"
storage = "local-lvm"
size = var.disk_size
discard = true
}
# Cloud-init drive
disk {
slot = "ide2"
type = "cloudinit"
storage = "local-lvm"
}
# Network configuration with static IP
network {
id = 0
model = "virtio"
bridge = var.network_bridge
}
# Cloud-init configuration
os_type = "cloud-init"
ipconfig0 = "ip=${var.ip_address}/24,gw=${var.gateway_ip}"
# Reference to our cloud-init snippet
cicustom = "user=local:snippets/management-vm-cloud-init.yml"
# Serial console for debugging
serial {
id = 0
type = "socket"
}
vga {
type = "serial0"
}
}
Static IP Assignment: Unlike Talos nodes that start with DHCP, the management VM gets a static IP immediately through cloud-init configuration.
Cloud-Init Integration: The cicustom parameter references our cloud-init snippet stored in Proxmox’s local storage at /var/lib/vz/snippets/. This enables automated tool installation and user setup.
QEMU Guest Agent: Enabled (agent = 1) for better VM management and monitoring capabilities from Proxmox.
Serial Console: Configured for troubleshooting scenarios where network access might not be available.
terraform/modules/ubuntu-cloud-init-vm/variables.tf:
variable "vm_name" {
type = string
description = "Name of the VM"
}
variable "target_node" {
type = string
description = "Proxmox node where the VM will be created"
}
variable "template_name" {
type = string
description = "Name of the Ubuntu template to clone from"
}
variable "memory" {
type = number
description = "Amount of memory in MB"
default = 4096
}
variable "cores" {
type = number
description = "Number of CPU cores"
default = 2
}
variable "sockets" {
type = number
description = "Number of CPU sockets"
default = 1
}
variable "disk_size" {
type = string
description = "Size of the boot disk"
default = "20G"
}
variable "ip_address" {
type = string
description = "Static IP address for the VM"
}
variable "gateway_ip" {
type = string
description = "Gateway IP address"
}
variable "network_bridge" {
type = string
description = "Network bridge to connect to"
default = "vmbr0"
}
variable "cloud_init_snippet" {
type = string
description = "Name of the cloud-init snippet file"
default = "management-vm-cloud-init.yml"
}
variable "disk_storage" {
type = string
description = "Storage location for disks"
default = "local-lvm"
}
terraform/modules/ubuntu-cloud-init-vm/outputs.tf:
output "vm_id" {
description = "ID of the created VM"
value = proxmox_vm_qemu.management_vm.vmid
}
output "vm_name" {
description = "Name of the created VM"
value = proxmox_vm_qemu.management_vm.name
}
output "vm_ipv4_address" {
description = "IPv4 address of the VM"
value = var.ip_address
}
output "ssh_connection" {
description = "SSH connection command"
value = "ssh ${var.vm_name}@${var.ip_address}"
}
First, let’s outline the directory structure for your Talos cluster project:
talos-cluster-tutorial/
├── terraform/
│ ├── modules/
│ │ ├── talos-node/ # Reusable Talos VM module
│ │ │ ├── main.tf # VM resource definition previously shown
│ │ │ ├── variables.tf # Module inputs
│ │ │ └── outputs.tf # Module outputs
│ │ └── ubuntu-cloud-init-vm/ # Reusable management previously shown
│ │ ├── main.tf # Ubuntu VM with cloud-init
│ │ ├── variables.tf # Module inputs
│ │ └── outputs.tf # Module outputs
│ └── environments/
│ └── tutorial/ # Complete tutorial deployment
│ ├── main.tf # Management VM + 5 Talos VMs
│ ├── providers.tf # Proxmox provider configuration
│ ├── variables.tf # All configuration variables
│ ├── outputs.tf # All deployment outputs
│ └── terraform.tfvars # Your settings (you'll create)
Now let’s create all the Terraform files for the tutorial environment that will deploy both the management VM and all 5 Talos cluster nodes.
terraform/environments/tutorial/providers.tf:
terraform {
required_providers {
proxmox = {
source = "telmate/proxmox"
version = "3.0.2-rc01"
}
}
}
variable "pm_api_url" {
type = string
description = "Proxmox API URL"
}
variable "pm_user" {
type = string
description = "Proxmox username"
}
variable "pm_password" {
type = string
description = "Proxmox password"
sensitive = true
}
provider "proxmox" {
pm_api_url = var.pm_api_url
pm_user = var.pm_user
pm_password = var.pm_password
pm_tls_insecure = true
}
terraform/environments/tutorial/variables.tf:
# Proxmox Configuration
variable "target_node" {
type = string
description = "Proxmox node name where VMs will be created"
}
variable "ubuntu_template" {
type = string
description = "Name of the Ubuntu cloud-init template"
default = "ubuntu-22.04-template"
}
variable "talos_template" {
type = string
description = "Name of the Talos template"
default = "talos-template"
}
# Network Configuration
variable "management_ip" {
type = string
description = "Static IP for management VM"
default = "192.168.1.200"
}
variable "gateway_ip" {
type = string
description = "Network gateway IP"
default = "192.168.1.254"
}
variable "network_bridge" {
type = string
description = "Proxmox network bridge"
default = "vmbr0"
}
# Cluster Sizing
variable "controlplane_count" {
type = number
description = "Number of control plane nodes"
default = 3
validation {
condition = var.controlplane_count >= 1 && var.controlplane_count % 2 == 1
error_message = "Control plane count must be an odd number (1, 3, 5, etc.) for proper etcd quorum."
}
}
variable "worker_count" {
type = number
description = "Number of worker nodes"
default = 2
}
# Resource Configuration
variable "management_vm_memory" {
type = number
description = "Memory for management VM in MB"
default = 4096
}
variable "management_vm_cores" {
type = number
description = "CPU cores for management VM"
default = 2
}
variable "controlplane_memory" {
type = number
description = "Memory for control plane nodes in MB"
default = 4096
}
variable "controlplane_cores" {
type = number
description = "CPU cores for control plane nodes"
default = 2
}
variable "worker_memory" {
type = number
description = "Memory for worker nodes in MB"
default = 8192
}
variable "worker_cores" {
type = number
description = "CPU cores for worker nodes"
default = 4
}
# Storage Configuration
variable "disk_storage" {
type = string
description = "Proxmox storage location for VM disks"
default = "local-lvm"
}
terraform/environments/tutorial/main.tf:
# Management VM
module "management_vm" {
source = "../../modules/ubuntu-cloud-init-vm"
vm_name = "cluster-mgmt"
target_node = var.target_node
template_name = var.ubuntu_template
memory = var.management_vm_memory
cores = var.management_vm_cores
ip_address = var.management_ip
gateway_ip = var.gateway_ip
network_bridge = var.network_bridge
cloud_init_snippet = "management-vm-cloud-init.yml"
disk_storage = var.disk_storage
}
# Control Plane Nodes
module "control_plane_nodes" {
count = var.controlplane_count
source = "../../modules/talos-node"
node_prefix = "tutorial-cp-${count.index + 1}"
target_node = var.target_node
template_name = var.talos_template
memory = var.controlplane_memory
cores = var.controlplane_cores
network_bridge = var.network_bridge
disk_storage = var.disk_storage
}
# Worker Nodes
module "worker_nodes" {
count = var.worker_count
source = "../../modules/talos-node"
node_prefix = "tutorial-worker-${count.index + 1}"
target_node = var.target_node
template_name = var.talos_template
memory = var.worker_memory
cores = var.worker_cores
network_bridge = var.network_bridge
disk_storage = var.disk_storage
}
terraform/environments/tutorial/outputs.tf:
# Management VM Outputs
output "management_vm" {
description = "Management VM information"
value = {
name = module.management_vm.vm_name
ip_address = module.management_vm.vm_ipv4_address
ssh_command = module.management_vm.ssh_connection
}
}
# Control Plane Outputs
output "control_plane_nodes" {
description = "Control plane node information"
value = {
for i, node in module.control_plane_nodes : node.vm_name => {
vm_id = node.vm_id
dhcp_ip = node.vm_ipv4_address
node_type = "controlplane"
}
}
}
# Worker Node Outputs
output "worker_nodes" {
description = "Worker node information"
value = {
for i, node in module.worker_nodes : node.vm_name => {
vm_id = node.vm_id
dhcp_ip = node.vm_ipv4_address
node_type = "worker"
}
}
}
# Combined cluster output for easy reference
output "talos_cluster_nodes" {
description = "All Talos cluster nodes"
value = merge(
{
for i, node in module.control_plane_nodes : node.vm_name => {
vm_id = node.vm_id
dhcp_ip = node.vm_ipv4_address
node_type = "controlplane"
}
},
{
for i, node in module.worker_nodes : node.vm_name => {
vm_id = node.vm_id
dhcp_ip = node.vm_ipv4_address
node_type = "worker"
}
}
)
}
Before deploying, ensure you have the following prepared in your Proxmox environment:
VM Templates Required:
- ubuntu-22.04-template: Ubuntu cloud-init template for the management VM
- talos-template: Talos Linux template for cluster nodes

Cloud-Init Snippet Upload: Upload the management VM cloud-init configuration to Proxmox:
# Copy the cloud-init file to your Proxmox host
scp configs/cloud-init/management-vm-cloud-init.yml root@<proxmox-ip>:/var/lib/vz/snippets/
# Verify the file is accessible
ssh root@<proxmox-ip> "ls -la /var/lib/vz/snippets/management-vm-cloud-init.yml"
Proxmox User Setup: Ensure you have a Terraform user created (as covered in the previous article):
# Create user with API access
pveum user add terraform@pve --password 'your-secure-password'
pveum aclmod / -user terraform@pve -role PVEAdmin
Now we’ll deploy everything in one go: the management VM plus all 5 Talos cluster nodes.
Step 1: Navigate to Tutorial Environment
cd terraform/environments/tutorial
Step 2: Configure Your Deployment
Create a terraform.tfvars file with your specific settings:
cat > terraform.tfvars << 'EOF'
# Proxmox connection settings
pm_api_url = "https://your-proxmox-ip:8006/api2/json"
pm_user = "terraform@pve"
pm_password = "your-secure-password"
# Infrastructure settings
target_node = "your-proxmox-node-name" # e.g., "pve", "proxmox-01"
ubuntu_template = "ubuntu-22.04-template" # Your Ubuntu template name
talos_template = "talos-template" # Your Talos template name
# Network configuration
management_ip = "192.168.1.200"
gateway_ip = "192.168.1.254"
# Cluster sizing
controlplane_count = 3
worker_count = 2
EOF
Step 3: Initialize and Deploy Everything
# Initialize Terraform
terraform init
# Review what will be created (1 management VM + 5 cluster VMs)
terraform plan
# Deploy complete infrastructure
terraform apply
Expected Output:
Apply complete! Resources: 6 added, 0 changed, 0 destroyed.
Outputs:
management_vm = {
"ip_address" = "192.168.1.200"
"name" = "cluster-mgmt"
}
talos_cluster_nodes = {
"tutorial-cp-1" = {
"dhcp_ip" = "192.168.1.xxx"
"node_type" = "controlplane"
}
"tutorial-cp-2" = {
"dhcp_ip" = "192.168.1.xxx"
"node_type" = "controlplane"
}
"tutorial-cp-3" = {
"dhcp_ip" = "192.168.1.xxx"
"node_type" = "controlplane"
}
"tutorial-worker-1" = {
"dhcp_ip" = "192.168.1.xxx"
"node_type" = "worker"
}
"tutorial-worker-2" = {
"dhcp_ip" = "192.168.1.xxx"
"node_type" = "worker"
}
}
Step 4: Verify Your Infrastructure
Check that all VMs are created and running:
# Test management VM access (wait 2-3 minutes for cloud-init)
ssh <your-username>@192.168.1.200
# Verify tools are installed
kubectl version --client
talosctl version --client
helm version
# From Proxmox web interface, verify all 6 VMs are running:
# - cluster-mgmt (management VM)
# - tutorial-cp-1, tutorial-cp-2, tutorial-cp-3 (control planes)
# - tutorial-worker-1, tutorial-worker-2 (workers)
Record DHCP IP Addresses: Note the DHCP IP addresses assigned to each Talos VM from the Terraform output - you’ll need these for the next part.
With your infrastructure deployed, you're ready to move on to bootstrapping: the Talos VMs are currently running but not yet configured as a Kubernetes cluster.
Now we’ll manually bootstrap your running VMs into a functional Kubernetes cluster. This approach gives you complete control over each step and helps you understand the Talos configuration process in detail.
SSH to your management VM to perform the bootstrap operations:
# SSH to your management VM
ssh <your-username>@192.168.1.200
# Create a working directory
mkdir -p ~/talos-cluster && cd ~/talos-cluster
First, collect the DHCP IP addresses assigned to your Talos VMs. You can find these from your Terraform output:
# Review your terraform output (run this on your local machine)
cd terraform/environments/tutorial
terraform output talos_cluster_nodes
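Rather than copying IPs by hand, you can parse the JSON form of the output with jq. This is a sketch: the heredoc below stands in for the real Terraform output so the expected shape is visible; in practice replace it with `terraform output -json talos_cluster_nodes > nodes.json`.

```shell
# Sample standing in for: terraform output -json talos_cluster_nodes > nodes.json
cat > nodes.json << 'EOF'
{
  "tutorial-cp-1":     { "vm_id": 100, "dhcp_ip": "192.168.1.101", "node_type": "controlplane" },
  "tutorial-worker-1": { "vm_id": 103, "dhcp_ip": "192.168.1.102", "node_type": "worker" }
}
EOF
# Print "name: dhcp_ip (node_type)" per node
jq -r 'to_entries[] | "\(.key): \(.value.dhcp_ip) (\(.value.node_type))"' nodes.json
```

The `vm_id` and IP values here are illustrative placeholders; your real output will contain the DHCP addresses assigned in your network.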
Create a reference file with your node mappings:
# Create node mapping file
cat > nodes.txt << 'EOF'
# Control Plane Nodes (replace xxx with actual DHCP IPs)
tutorial-cp-1: 192.168.1.xxx -> 192.168.1.24
tutorial-cp-2: 192.168.1.xxx -> 192.168.1.25
tutorial-cp-3: 192.168.1.xxx -> 192.168.1.26
# Worker Nodes
tutorial-worker-1: 192.168.1.xxx -> 192.168.1.72
tutorial-worker-2: 192.168.1.xxx -> 192.168.1.73
# VIP: 192.168.1.10
# Gateway: 192.168.1.254
EOF
Generate the initial cluster configuration using the custom Talos image with Longhorn extensions. This command creates the foundational configurations that we’ll customize for each node:
# Generate base configuration
talosctl gen config tutorial-cluster https://192.168.1.10:6443 \
--output-dir . \
--install-image factory.talos.dev/installer/ce4c980550dd2ab1b17bbf2b08801c7eb59418eafe8f279833297925d67c7515:v1.10.4
What this does:
- tutorial-cluster - Sets the cluster name
- https://192.168.1.10:6443 - Configures the cluster endpoint to use our VIP
- --install-image - Uses a custom Talos image pre-built with Longhorn storage extensions
- --output-dir . - Outputs configuration files to the current directory

This creates several files:
- controlplane.yaml - Base control plane node configuration
- worker.yaml - Base worker node configuration
- talosconfig - Client configuration for talosctl commands

About the Custom Image:
The custom image URL (factory.talos.dev/installer/ce4c980...) was created using Talos Image Factory, which builds custom Talos OS images with additional extensions. This specific image includes the extensions Longhorn depends on, such as siderolabs/iscsi-tools and siderolabs/util-linux-tools.
To create your own custom image, visit factory.talos.dev, select your Talos version, choose extensions (like siderolabs/util-linux-tools), and generate the image URLs.
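The same flow can be driven from the command line with a schematic file. The extension list below is an assumption based on Longhorn's requirements; adjust it to your needs:

```yaml
# schematic.yaml - example Image Factory schematic (extension list is illustrative)
customization:
  systemExtensions:
    officialExtensions:
      - siderolabs/iscsi-tools
      - siderolabs/util-linux-tools
```

POSTing this file to the factory (for example `curl -X POST --data-binary @schematic.yaml https://factory.talos.dev/schematics`) returns a schematic ID, which you embed in the installer URL as factory.talos.dev/installer/&lt;id&gt;:&lt;talos-version&gt;.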
Create network patches for each node to configure static IPs and the VIP. These patches modify the base configurations to set node-specific network settings, transforming nodes from DHCP to static IP configurations:
# Control plane 1 patch (example)
cat > tutorial-cp-1-patch.yaml << 'EOF'
machine:
network:
hostname: tutorial-cp-1
interfaces:
- interface: ens18
addresses:
- 192.168.1.24/24
routes:
- network: 0.0.0.0/0
gateway: 192.168.1.254
dhcp: false
vip:
ip: 192.168.1.10
EOF
What this patch does:
- hostname - Sets the node's hostname
- interface: ens18 - Configures the primary network interface (typical for Proxmox VMs)
- addresses - Assigns the static IP address with /24 subnet
- routes - Sets the default gateway for internet access
- dhcp: false - Disables DHCP and enables static IP configuration
- vip - Configures the Virtual IP for high availability (control planes only)

Repeat this process for all nodes, creating patches with the appropriate hostnames and IP addresses:
- tutorial-cp-2-patch.yaml (192.168.1.25)
- tutorial-cp-3-patch.yaml (192.168.1.26)
- tutorial-worker-1-patch.yaml (192.168.1.72) - Note: Workers don't include the vip section but need kubelet.extraMounts for Longhorn
- tutorial-worker-2-patch.yaml (192.168.1.73)

Worker patch example (without VIP, with Longhorn mounts):
# Worker patch template
cat > tutorial-worker-1-patch.yaml << 'EOF'
machine:
network:
hostname: tutorial-worker-1
interfaces:
- interface: ens18
addresses:
- 192.168.1.72/24
routes:
- network: 0.0.0.0/0
gateway: 192.168.1.254
dhcp: false
kubelet:
extraMounts:
- destination: /var/lib/longhorn
type: bind
source: /var/lib/longhorn
options:
- bind
- rshared
- rw
EOF
Key differences for worker nodes:
- No vip section, since the Virtual IP is shared only among control plane nodes
- A kubelet.extraMounts entry that bind-mounts the /var/lib/longhorn directory for distributed storage

Generate the final configuration files for each node by applying the patches. This process merges the base configurations with node-specific network settings to create complete, ready-to-deploy configurations:
# Generate patched configuration for control plane 1 (example)
talosctl machineconfig patch controlplane.yaml \
--patch @tutorial-cp-1-patch.yaml \
> tutorial-cp-1.yaml
What this does:
- Takes the base controlplane.yaml configuration
- Applies the node-specific patch from @tutorial-cp-1-patch.yaml
- Produces a complete configuration file tutorial-cp-1.yaml ready for deployment
- The @ symbol tells talosctl to read the patch from a file

Repeat this process for all nodes:
- Control planes: patch controlplane.yaml with their respective patches
- Workers: patch worker.yaml with their respective patches

Example commands for all nodes:
# Control planes (use controlplane.yaml as base)
talosctl machineconfig patch controlplane.yaml --patch @tutorial-cp-2-patch.yaml > tutorial-cp-2.yaml
talosctl machineconfig patch controlplane.yaml --patch @tutorial-cp-3-patch.yaml > tutorial-cp-3.yaml
# Workers (use worker.yaml as base)
talosctl machineconfig patch worker.yaml --patch @tutorial-worker-1-patch.yaml > tutorial-worker-1.yaml
talosctl machineconfig patch worker.yaml --patch @tutorial-worker-2-patch.yaml > tutorial-worker-2.yaml
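With five nodes the patching becomes repetitive, so a small loop helps. The sketch below only prints each talosctl command so you can review it first; remove the echo and quotes to execute for real:

```shell
# Dry run: print the machineconfig patch command for every node
for node in tutorial-cp-1 tutorial-cp-2 tutorial-cp-3; do
  echo "talosctl machineconfig patch controlplane.yaml --patch @${node}-patch.yaml > ${node}.yaml"
done
for node in tutorial-worker-1 tutorial-worker-2; do
  echo "talosctl machineconfig patch worker.yaml --patch @${node}-patch.yaml > ${node}.yaml"
done
```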
Apply the configurations to each node using their current DHCP IPs. This pushes the complete configurations to the nodes, triggering them to reboot and adopt their new static IP settings:
# Apply configuration to control plane 1 (example - replace xxx with actual DHCP IP)
talosctl apply-config --endpoints 192.168.1.xxx \
--nodes 192.168.1.xxx \
--file tutorial-cp-1.yaml --insecure
What this does:
- --endpoints - Specifies the current DHCP IP to connect to the node
- --nodes - Confirms which node to apply the config to (same as endpoint)
- --file - Points to the complete configuration file for this node
- --insecure - Bypasses certificate validation (needed for initial configuration)

Repeat this process for all nodes, replacing 192.168.1.xxx with the actual DHCP IP from your Terraform output:
# Apply to remaining control planes
talosctl apply-config --endpoints 192.168.1.xxx --nodes 192.168.1.xxx --file tutorial-cp-2.yaml --insecure
talosctl apply-config --endpoints 192.168.1.xxx --nodes 192.168.1.xxx --file tutorial-cp-3.yaml --insecure
# Apply to workers
talosctl apply-config --endpoints 192.168.1.xxx --nodes 192.168.1.xxx --file tutorial-worker-1.yaml --insecure
talosctl apply-config --endpoints 192.168.1.xxx --nodes 192.168.1.xxx --file tutorial-worker-2.yaml --insecure
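The apply step can also be scripted once you've recorded the DHCP addresses. This bash sketch keeps a name-to-IP map and prints each command for review (the IPs are placeholders for your real DHCP addresses; remove the echo and quotes to execute):

```shell
# Map node name -> current DHCP IP (fill in from your terraform output)
declare -A dhcp=(
  [tutorial-cp-1]=192.168.1.xxx
  [tutorial-cp-2]=192.168.1.xxx
  [tutorial-cp-3]=192.168.1.xxx
  [tutorial-worker-1]=192.168.1.xxx
  [tutorial-worker-2]=192.168.1.xxx
)
# Dry run: print the apply-config command for each node
for node in "${!dhcp[@]}"; do
  echo "talosctl apply-config --endpoints ${dhcp[$node]} --nodes ${dhcp[$node]} --file ${node}.yaml --insecure"
done
```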
Wait for all nodes to reboot with their new static IP configurations:
# Wait for nodes to reboot with static IPs (about 2-3 minutes)
echo "Waiting for nodes to reboot and configure static IPs..."
sleep 120
What happens during this wait: each node installs Talos to disk with its new configuration, reboots, and comes back online at its static IP; the control plane nodes also begin negotiating the shared VIP.
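A fixed sleep works, but you can also poll until a node actually answers. A minimal retry helper (an illustrative sketch, not part of the original workflow):

```shell
# retry <attempts> <delay-seconds> <command...>: rerun a command until it succeeds
retry() {
  local attempts=$1 delay=$2
  shift 2
  local i
  for ((i = 1; i <= attempts; i++)); do
    "$@" && return 0
    sleep "$delay"
  done
  return 1
}
```

For example, `retry 30 5 ping -c1 192.168.1.24` waits up to about 2.5 minutes for the first control plane's static IP to respond before giving up.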
Bootstrap the first control plane node to initialize Kubernetes. This creates the initial etcd cluster and starts the Kubernetes control plane:
# Set the talosconfig
export TALOSCONFIG=./talosconfig
# Bootstrap the first control plane node (using static IP)
talosctl bootstrap --endpoints 192.168.1.24 --nodes 192.168.1.24
# Wait for the cluster to initialize and VIP to become active
echo "Waiting for cluster initialization and VIP activation..."
sleep 90
What this does:
- export TALOSCONFIG - Tells talosctl which config file to use for authentication
- bootstrap - Initializes the first control plane node and creates the etcd cluster
- --endpoints 192.168.1.24 - Connects to the first control plane using its static IP

Generate the kubeconfig file and verify cluster connectivity. This creates the credentials needed to manage your Kubernetes cluster:
# Generate kubeconfig using the VIP endpoint
talosctl kubeconfig --endpoints 192.168.1.10 --nodes 192.168.1.10
# Set the kubeconfig for current session
export KUBECONFIG=./kubeconfig
# Verify cluster is running (this may take a few minutes)
kubectl get nodes
What this does:
- talosctl kubeconfig - Generates a kubeconfig file with cluster access credentials
- --endpoints 192.168.1.10 - Uses the VIP for load-balanced access to the control plane
- export KUBECONFIG - Sets kubectl to use the generated config file
- kubectl get nodes - Verifies all nodes are visible to the Kubernetes API

At this point, you should see all your nodes, but they will be in "NotReady" status because we haven't installed a CNI yet.
Install Cilium for pod networking and observability. This provides the network fabric that allows pods to communicate across the cluster:
# Add Cilium Helm repository
helm repo add cilium https://helm.cilium.io/
helm repo update
# Install Cilium with eBPF networking
helm install cilium cilium/cilium --version 1.16.5 \
--namespace kube-system \
--set kubeProxyReplacement=true \
--set k8sServiceHost=192.168.1.10 \
--set k8sServicePort=6443 \
--set hubble.relay.enabled=true \
--set hubble.ui.enabled=true \
--set operator.replicas=1 \
--set securityContext.privileged=true \
--set cgroup.autoMount.enabled=false \
--set cgroup.hostRoot=/sys/fs/cgroup
# Wait for Cilium to be ready
echo "Waiting for Cilium to initialize..."
kubectl wait --for=condition=ready pod -l k8s-app=cilium -n kube-system --timeout=300s
What this does:
- helm repo add - Adds the official Cilium Helm chart repository
- kubeProxyReplacement=true - Uses eBPF to replace kube-proxy for better performance
- k8sServiceHost=192.168.1.10 - Configures Cilium to use our VIP for API access
- hubble.relay.enabled=true - Enables network observability and flow monitoring
- securityContext.privileged=true - Required for eBPF programs to function
- cgroup settings - Talos-specific configurations for the immutable filesystem
- kubectl wait - Ensures all Cilium pods are ready before continuing

Check that everything is working correctly. These commands validate that your cluster is fully operational:
# Check node status
kubectl get nodes
# Check system pods
kubectl get pods -n kube-system
# Check Cilium status (if cilium CLI is installed)
cilium status --wait
What these commands verify:
- kubectl get nodes - Confirms all nodes are in "Ready" status with proper roles
- kubectl get pods -n kube-system - Ensures all system components are running
- cilium status --wait - Validates CNI networking and eBPF functionality

Expected Result:
NAME STATUS ROLES AGE VERSION
tutorial-cp-1 Ready control-plane 8m v1.10.4
tutorial-cp-2 Ready control-plane 8m v1.10.4
tutorial-cp-3 Ready control-plane 8m v1.10.4
tutorial-worker-1 Ready <none> 7m v1.10.4
tutorial-worker-2 Ready <none> 7m v1.10.4
Key indicators of success:
- All five nodes report "Ready" status
- Control plane nodes carry the control-plane role
- Worker nodes have joined the cluster and can accept workloads
Although it was a long process, we now have a fully functional, basic Kubernetes cluster running on Talos OS. We covered everything from writing Terraform modules that deploy the nodes to running all the configuration steps needed to bootstrap the cluster.
There's still more work to do: we need to automate the steps we handled manually, and continue enhancing the cluster with a load balancer, certificate management, and other recommended components. We'll explore these next steps in future posts; there is still plenty more to learn and even more fun to have.