
Add Worker Pools

Purpose: Shows operators how to add worker node pools by provisioning new VMs, updating the Kubespray inventory, and running the scale playbook.

Prerequisites

  • Infrastructure capacity to provision new VMs (VMware, OpenStack, or other provider)
  • Kubespray inventory in infrastructure/clusters/<cluster>/inventory/
  • SSH access from the Kubespray runner to new nodes
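The prerequisites above can be spot-checked from the Kubespray runner before starting. The cluster name, SSH user, and node IP below are illustrative placeholders; substitute your own:

```shell
# Hypothetical values; substitute your cluster name and a new node's IP
CLUSTER="mycluster"
NODE_IP="192.168.12.30"
INVENTORY="infrastructure/clusters/$CLUSTER/inventory/inventory.yaml"

# The Kubespray inventory should already exist for this cluster
[ -f "$INVENTORY" ] && echo "inventory found" || echo "inventory missing: $INVENTORY"

# SSH reachability from the runner to the new node (non-interactive)
if command -v ssh >/dev/null; then
  ssh -o BatchMode=yes -o ConnectTimeout=5 "ubuntu@$NODE_IP" true \
    && echo "ssh ok" || echo "ssh unreachable: $NODE_IP"
fi
```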

Step 1: Provision New VMs

Add the new worker nodes to your Terraform configuration:

# infrastructure/clusters/<cluster>/main.tf
# Add new worker pool VMs
resource "openstack_compute_instance_v2" "gpu_workers" {
  count       = 3
  name        = "gpu-worker-${count.index + 1}"
  flavor_name = "gpu.xlarge"
  image_name  = "ubuntu-22.04"

  network {
    name = var.network_name
  }

  block_device {
    source_type           = "image"
    destination_type      = "volume"
    volume_size           = 200
    boot_index            = 0
    delete_on_termination = true
  }
}

Apply the Terraform changes:

cd infrastructure/clusters/<cluster>/
terraform plan # Review the changes
terraform apply # Provision the VMs
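After the apply, you can confirm the VMs exist before touching the inventory. This sketch assumes the OpenStack CLI is configured for the same project:

```shell
# List the new instances (names match the Terraform count pattern above)
if command -v openstack >/dev/null; then
  openstack server list --name gpu-worker -c Name -c Status -c Networks
  STATUS="listed"
else
  STATUS="cli-missing"
  echo "openstack CLI not found; verify the VMs in your provider's console"
fi
```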

Step 2: Update the Kubespray Inventory

Add the new nodes to the inventory and assign them to a node group:

# infrastructure/clusters/<cluster>/inventory/inventory.yaml
all:
  hosts:
    # Existing nodes...
    worker-01:
      ansible_host: 192.168.12.23
    worker-02:
      ansible_host: 192.168.12.24
    # New GPU worker pool
    gpu-worker-01:
      ansible_host: 192.168.12.30
    gpu-worker-02:
      ansible_host: 192.168.12.31
    gpu-worker-03:
      ansible_host: 192.168.12.32
  children:
    kube_node:
      hosts:
        worker-01: {}
        worker-02: {}
        gpu-worker-01: {}
        gpu-worker-02: {}
        gpu-worker-03: {}
    # Group for pool-specific variables (see group_vars below)
    gpu_workers:
      hosts:
        gpu-worker-01: {}
        gpu-worker-02: {}
        gpu-worker-03: {}

To apply node labels or taints to the new pool, add group variables (the file name must match an inventory group named gpu_workers):

# infrastructure/clusters/<cluster>/inventory/group_vars/gpu_workers.yml
node_labels:
  node.kubernetes.io/pool: "gpu"
  gpu: "true"
node_taints:
  - "gpu=true:NoSchedule"
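Before committing, it is worth validating that the inventory parses and the new nodes landed in the expected groups. This assumes ansible is installed on the runner; the cluster name in the path is a placeholder:

```shell
# Placeholder path; substitute your cluster name
INV="infrastructure/clusters/mycluster/inventory/inventory.yaml"

if command -v ansible-inventory >/dev/null; then
  # --graph prints group membership; gpu-worker-* should appear under kube_node
  ansible-inventory -i "$INV" --graph \
    || echo "inventory failed to parse: $INV"
else
  echo "ansible-inventory not found; skipping validation"
fi
```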

Commit the inventory changes via PR.

Step 3: Run Kubespray Scale Playbook

Use the scale.yml playbook to add nodes without disrupting existing ones:

# scale.yml ships with Kubespray; run it from your Kubespray checkout,
# pointing -i at the cluster inventory
cd <kubespray-checkout>/

ansible-playbook \
  -i <path-to>/infrastructure/clusters/<cluster>/inventory/inventory.yaml \
  -b \
  scale.yml \
  --limit=gpu-worker-01,gpu-worker-02,gpu-worker-03

The --limit flag restricts the playbook to only the new nodes.
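If you want a preview before changing anything, Ansible's check mode can be combined with the same limit. The paths and names below are placeholders, and note that not every Kubespray task supports check mode, so some failures in the dry run are expected:

```shell
# Placeholder paths/names; adjust to your layout
INV="infrastructure/clusters/mycluster/inventory/inventory.yaml"
NEW_NODES="gpu-worker-01,gpu-worker-02,gpu-worker-03"

if command -v ansible-playbook >/dev/null; then
  # --check reports what would change without applying it
  ansible-playbook -i "$INV" -b scale.yml --limit="$NEW_NODES" --check \
    || echo "dry run failed; inspect the output above"
else
  echo "ansible-playbook not found"
fi
```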

Step 4: Verify

# Confirm new nodes are Ready
kubectl get nodes -o wide

# Check labels and taints
kubectl describe node gpu-worker-01 | grep -A5 "Labels\|Taints"

# Verify pods can schedule on the new pool (if taints are set, only tolerated pods will land here)
kubectl run test-gpu --image=busybox --restart=Never \
  --overrides='{"apiVersion":"v1","spec":{"tolerations":[{"key":"gpu","operator":"Equal","value":"true","effect":"NoSchedule"}],"nodeSelector":{"gpu":"true"}}}' \
  -- sleep 10
kubectl get pod test-gpu -o wide
kubectl delete pod test-gpu
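The readiness check can also be scripted to block until the whole pool is up. Node names follow the inventory example, and kubectl is assumed to point at this cluster:

```shell
NODES="gpu-worker-01 gpu-worker-02 gpu-worker-03"

if command -v kubectl >/dev/null; then
  for n in $NODES; do
    # Block until each new node reports Ready (or time out after 5 minutes)
    kubectl wait --for=condition=Ready "node/$n" --timeout=300s \
      || echo "$n not Ready yet"
  done
else
  echo "kubectl not found; skipping readiness wait"
fi
```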

Troubleshooting

  • Kubespray fails on new nodes — Verify SSH connectivity: ansible -i inventory.yaml gpu-worker-01 -m ping. Check that the base OS image meets Kubespray requirements.
  • Nodes join but show wrong labels — Re-run Kubespray with --tags=node to reapply node configuration.
  • Capacity not reflected — Check kubectl describe node <name> for allocatable resources. The VM flavor determines available CPU and memory.