Add Worker Pools
Purpose: Shows operators how to add a worker node pool by provisioning VMs, updating the Kubespray inventory, and re-running Kubespray.
Prerequisites
- Infrastructure capacity to provision new VMs (VMware, OpenStack, or other provider)
- Kubespray inventory in `infrastructure/clusters/<cluster>/inventory/`
- SSH access from the Kubespray runner to new nodes
Step 1: Provision New VMs
Add the new worker nodes to your Terraform configuration:
```hcl
# infrastructure/clusters/<cluster>/main.tf
# Add new worker pool VMs

# Look up the image so the boot volume can reference it by UUID
data "openstack_images_image_v2" "ubuntu" {
  name        = "ubuntu-22.04"
  most_recent = true
}

resource "openstack_compute_instance_v2" "gpu_workers" {
  count       = 3
  name        = "gpu-worker-${count.index + 1}"
  flavor_name = "gpu.xlarge"

  network {
    name = var.network_name
  }

  # Boot from a volume created from the image; when a block_device with
  # source_type "image" is used, the image UUID goes here rather than in
  # a top-level image_name argument
  block_device {
    uuid                  = data.openstack_images_image_v2.ubuntu.id
    source_type           = "image"
    destination_type      = "volume"
    volume_size           = 200
    boot_index            = 0
    delete_on_termination = true
  }
}
```
Apply the Terraform changes:
```shell
cd infrastructure/clusters/<cluster>/
terraform plan   # Review the changes
terraform apply  # Provision the VMs
```
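To avoid copying addresses out of the provider dashboard by hand, a Terraform output can surface the new nodes' IPs for the inventory. A minimal sketch, assuming the `gpu_workers` resource above (the output name is illustrative):

```hcl
# infrastructure/clusters/<cluster>/main.tf
# Illustrative: map each new worker's name to its IP address
output "gpu_worker_ips" {
  value = {
    for vm in openstack_compute_instance_v2.gpu_workers :
    vm.name => vm.access_ip_v4
  }
}
```

After `terraform apply`, run `terraform output gpu_worker_ips` and copy the addresses into the inventory in the next step.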
Step 2: Update the Kubespray Inventory
Add the new nodes to the inventory and assign them to a node group:
```yaml
# infrastructure/clusters/<cluster>/inventory/inventory.yaml
all:
  hosts:
    # Existing nodes...
    worker-01:
      ansible_host: 192.168.12.23
    worker-02:
      ansible_host: 192.168.12.24
    # New GPU worker pool
    gpu-worker-01:
      ansible_host: 192.168.12.30
    gpu-worker-02:
      ansible_host: 192.168.12.31
    gpu-worker-03:
      ansible_host: 192.168.12.32
  children:
    kube_node:
      hosts:
        worker-01: {}
        worker-02: {}
        gpu-worker-01: {}
        gpu-worker-02: {}
        gpu-worker-03: {}
    # Group the new nodes so group_vars/gpu_workers.yml applies to them
    gpu_workers:
      hosts:
        gpu-worker-01: {}
        gpu-worker-02: {}
        gpu-worker-03: {}
```
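If the pool grows beyond a few nodes, the repetitive host entries can be generated instead of hand-typed. A small sketch, assuming the sequential names and addresses used above:

```shell
# Generate inventory host entries for gpu-worker-01..03
# (addresses start at 192.168.12.30, matching the example above)
entries=$(for i in 1 2 3; do
  printf '    gpu-worker-%02d:\n      ansible_host: 192.168.12.%d\n' "$i" "$((29 + i))"
done)
printf '%s\n' "$entries"
```

Paste the output under `all.hosts` in `inventory.yaml`, and mirror the names under the `kube_node` and pool groups.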
To apply node labels or taints for the new pool, add group variables:
```yaml
# infrastructure/clusters/<cluster>/inventory/group_vars/gpu_workers.yml
node_labels:
  node.kubernetes.io/pool: "gpu"
  gpu: "true"
node_taints:
  - "gpu=true:NoSchedule"
```
Commit the inventory changes via PR.
Step 3: Run Kubespray Scale Playbook
Use the scale.yml playbook to add nodes without disrupting existing ones. The playbook lives in the Kubespray repository, so run it from the root of your Kubespray checkout and point `-i` at the cluster inventory:

```shell
# From the root of the Kubespray checkout
ansible-playbook -b \
  -i /path/to/infrastructure/clusters/<cluster>/inventory/inventory.yaml \
  scale.yml \
  --limit=gpu-worker-01,gpu-worker-02,gpu-worker-03
```

The --limit flag restricts the playbook to only the new nodes. To preview which hosts the pattern matches before making any changes, add `--list-hosts`.
Step 4: Verify
```shell
# Confirm new nodes are Ready
kubectl get nodes -o wide

# Check labels and taints
kubectl describe node gpu-worker-01 | grep -A5 "Labels\|Taints"

# Verify pods can schedule on the new pool
# (if taints are set, only tolerated pods will land here)
kubectl run test-gpu --image=busybox --restart=Never \
  --overrides='{"spec":{"tolerations":[{"key":"gpu","operator":"Equal","value":"true","effect":"NoSchedule"}],"nodeSelector":{"gpu":"true"}}}' \
  -- sleep 10
kubectl get pod test-gpu -o wide
kubectl delete pod test-gpu
```
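The inline `--overrides` JSON above is hard to read; it is equivalent to the following pod manifest, which can be kept under version control and applied with `kubectl apply -f`:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-gpu
spec:
  restartPolicy: Never
  nodeSelector:
    gpu: "true"          # same selector as the --overrides JSON
  tolerations:
    - key: gpu
      operator: Equal
      value: "true"
      effect: NoSchedule # same toleration as the --overrides JSON
  containers:
    - name: test-gpu
      image: busybox
      command: ["sleep", "10"]
```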
Troubleshooting
- Kubespray fails on new nodes — Verify SSH connectivity: `ansible -i inventory.yaml gpu-worker-01 -m ping`. Check that the base OS image meets Kubespray requirements.
- Nodes join but show wrong labels — Re-run Kubespray with `--tags=node` to reapply node configuration.
- Capacity not reflected — Check `kubectl describe node <name>` for allocatable resources. The VM flavor determines available CPU and memory.