
Add Worker Pools

Purpose: Shows operators how to add worker node pools by provisioning new VMs, updating the Kubespray inventory, and running the scale playbook.

Prerequisites

  • Infrastructure capacity to provision new VMs (VMware, OpenStack, or other provider)
  • Kubespray inventory in infrastructure/clusters/<cluster>/inventory/
  • SSH access from the Kubespray runner to new nodes
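The prerequisites above can be spot-checked from the Kubespray runner before starting. The cluster name, SSH user, and node IP below are illustrative placeholders; substitute your own:

```shell
# Hypothetical values; substitute your cluster name and a new node's IP
CLUSTER="mycluster"
NODE_IP="192.168.12.30"
INVENTORY="infrastructure/clusters/$CLUSTER/inventory/inventory.yaml"

# The Kubespray inventory should already exist for this cluster
[ -f "$INVENTORY" ] && echo "inventory found" || echo "inventory missing: $INVENTORY"

# SSH reachability from the runner to the new node (non-interactive)
if command -v ssh >/dev/null; then
  ssh -o BatchMode=yes -o ConnectTimeout=5 "ubuntu@$NODE_IP" true \
    && echo "ssh ok" || echo "ssh unreachable: $NODE_IP"
fi
```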

Step 1: Provision New VMs

Add the new worker nodes to your Terraform configuration:

# infrastructure/clusters/<cluster>/main.tf
# Add new worker pool VMs
resource "openstack_compute_instance_v2" "gpu_workers" {
  count       = 3
  name        = "gpu-worker-${count.index + 1}"
  flavor_name = "gpu.xlarge"
  image_name  = "ubuntu-22.04"

  network {
    name = var.network_name
  }

  block_device {
    source_type           = "image"
    destination_type      = "volume"
    volume_size           = 200
    boot_index            = 0
    delete_on_termination = true
  }
}

Apply the Terraform changes:

cd infrastructure/clusters/<cluster>/
terraform plan # Review the changes
terraform apply # Provision the VMs
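After the apply, you can confirm the VMs exist before touching the inventory. This sketch assumes the OpenStack CLI is configured for the same project:

```shell
# List the new instances (names match the Terraform count pattern above)
if command -v openstack >/dev/null; then
  openstack server list --name gpu-worker -c Name -c Status -c Networks
  STATUS="listed"
else
  STATUS="cli-missing"
  echo "openstack CLI not found; verify the VMs in your provider's console"
fi
```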

Step 2: Update the Kubespray Inventory

Add the new nodes to the inventory and assign them to a node group:

# infrastructure/clusters/<cluster>/inventory/inventory.yaml
all:
  hosts:
    # Existing nodes...
    worker-01:
      ansible_host: 192.168.12.23
    worker-02:
      ansible_host: 192.168.12.24
    # New GPU worker pool
    gpu-worker-01:
      ansible_host: 192.168.12.30
    gpu-worker-02:
      ansible_host: 192.168.12.31
    gpu-worker-03:
      ansible_host: 192.168.12.32
  children:
    kube_node:
      hosts:
        worker-01: {}
        worker-02: {}
        gpu-worker-01: {}
        gpu-worker-02: {}
        gpu-worker-03: {}
    # Group for pool-specific variables (see group_vars below)
    gpu_workers:
      hosts:
        gpu-worker-01: {}
        gpu-worker-02: {}
        gpu-worker-03: {}

To apply node labels or taints to the new pool, add group variables (the file name must match an inventory group named gpu_workers):

# infrastructure/clusters/<cluster>/inventory/group_vars/gpu_workers.yml
node_labels:
  node.kubernetes.io/pool: "gpu"
  gpu: "true"
node_taints:
  - "gpu=true:NoSchedule"
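Before committing, it is worth validating that the inventory parses and the new nodes landed in the expected groups. This assumes ansible is installed on the runner; the cluster name in the path is a placeholder:

```shell
# Placeholder path; substitute your cluster name
INV="infrastructure/clusters/mycluster/inventory/inventory.yaml"

if command -v ansible-inventory >/dev/null; then
  # --graph prints group membership; gpu-worker-* should appear under kube_node
  ansible-inventory -i "$INV" --graph \
    || echo "inventory failed to parse: $INV"
else
  echo "ansible-inventory not found; skipping validation"
fi
```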

Commit the inventory changes via PR.

Step 3: Run Kubespray Scale Playbook

Use the scale.yml playbook to add nodes without disrupting existing ones:

# scale.yml ships with Kubespray; run it from your Kubespray checkout,
# pointing -i at the cluster inventory
cd <kubespray-checkout>/

ansible-playbook \
  -i <path-to>/infrastructure/clusters/<cluster>/inventory/inventory.yaml \
  -b \
  scale.yml \
  --limit=gpu-worker-01,gpu-worker-02,gpu-worker-03

The --limit flag restricts the playbook to only the new nodes.
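If you want a preview before changing anything, Ansible's check mode can be combined with the same limit. The paths and names below are placeholders, and note that not every Kubespray task supports check mode, so some failures in the dry run are expected:

```shell
# Placeholder paths/names; adjust to your layout
INV="infrastructure/clusters/mycluster/inventory/inventory.yaml"
NEW_NODES="gpu-worker-01,gpu-worker-02,gpu-worker-03"

if command -v ansible-playbook >/dev/null; then
  # --check reports what would change without applying it
  ansible-playbook -i "$INV" -b scale.yml --limit="$NEW_NODES" --check \
    || echo "dry run failed; inspect the output above"
else
  echo "ansible-playbook not found"
fi
```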

Step 4: Verify

# Confirm new nodes are Ready
kubectl get nodes -o wide

# Check labels and taints
kubectl describe node gpu-worker-01 | grep -A5 "Labels\|Taints"

# Verify pods can schedule on the new pool (if taints are set, only tolerated pods will land here)
kubectl run test-gpu --image=busybox --restart=Never \
  --overrides='{"apiVersion":"v1","spec":{"tolerations":[{"key":"gpu","operator":"Equal","value":"true","effect":"NoSchedule"}],"nodeSelector":{"gpu":"true"}}}' \
  -- sleep 10
kubectl get pod test-gpu -o wide
kubectl delete pod test-gpu
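The readiness check can also be scripted to block until the whole pool is up. Node names follow the inventory example, and kubectl is assumed to point at this cluster:

```shell
NODES="gpu-worker-01 gpu-worker-02 gpu-worker-03"

if command -v kubectl >/dev/null; then
  for n in $NODES; do
    # Block until each new node reports Ready (or time out after 5 minutes)
    kubectl wait --for=condition=Ready "node/$n" --timeout=300s \
      || echo "$n not Ready yet"
  done
else
  echo "kubectl not found; skipping readiness wait"
fi
```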

Troubleshooting

  • Kubespray fails on new nodes — Verify SSH connectivity: ansible -i inventory.yaml gpu-worker-01 -m ping. Check that the base OS image meets Kubespray requirements.
  • Nodes join but show wrong labels — Re-run Kubespray with --tags=node to reapply node configuration.
  • Capacity not reflected — Check kubectl describe node <name> for allocatable resources. The VM flavor determines available CPU and memory.