Kubernetes Upgrades
Purpose: Shows operators how to perform Kubernetes upgrades, covering pre-flight checks, rolling updates, and post-upgrade validation.
Prerequisites
- SSH access to cluster nodes (via bastion or direct)
- Current `kubeconfig` for the target cluster
- Kubespray inventory in `infrastructure/clusters/<cluster>/inventory/`
- A Velero backup taken before starting (see Backup & Restore)
Pre-Flight Checks
Before upgrading, verify the cluster is healthy and the target version is supported:
# Check current Kubernetes version
kubectl get nodes -o wide
# Verify all nodes are Ready
kubectl get nodes
# Check for pending PodDisruptionBudgets that could block drains
kubectl get pdb -A
# Confirm no ongoing FluxCD reconciliation failures
flux get kustomizations -A
Kubernetes supports upgrading one minor version at a time (e.g., 1.28 → 1.29, not 1.28 → 1.30). If you need to jump multiple versions, plan sequential upgrades.
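The single-minor rule can be enforced mechanically before anyone edits the inventory. A minimal sketch in bash (the version strings below are illustrative, not from any real cluster):

```bash
#!/usr/bin/env bash
# Minimal sketch: enforce the one-minor-version-at-a-time rule before editing
# the inventory. Version strings below are illustrative.
set -euo pipefail

skew_ok() {
  local current="${1#v}" target="${2#v}"
  local cur_major cur_minor tgt_major tgt_minor
  IFS=. read -r cur_major cur_minor _ <<< "$current"
  IFS=. read -r tgt_major tgt_minor _ <<< "$target"
  # Same major, exactly one minor ahead.
  [[ "$tgt_major" == "$cur_major" && "$tgt_minor" == "$((cur_minor + 1))" ]]
}

skew_ok v1.29.7 v1.30.2 && echo "OK: single-minor upgrade"
skew_ok v1.28.9 v1.30.2 || echo "BLOCKED: plan sequential upgrades (1.28 -> 1.29 -> 1.30)"
```

A check like this fits naturally in CI on the inventory PR, comparing the proposed `kube_version` against the version currently reported by the cluster.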
Update the Kubespray Version Variable
Edit the Kubespray inventory to set the target Kubernetes version:
# infrastructure/clusters/<cluster>/inventory/group_vars/k8s_cluster/k8s-cluster.yml
kube_version: v1.30.2
Commit this change to a branch and open a PR:
git checkout -b upgrade/k8s-v1.30.2
git add infrastructure/clusters/<cluster>/inventory/group_vars/k8s_cluster/k8s-cluster.yml
git commit -m "chore: upgrade Kubernetes to v1.30.2"
git push origin upgrade/k8s-v1.30.2
After PR approval and merge, proceed with the Kubespray run.
Run the Upgrade
Kubespray performs a rolling upgrade — control plane nodes first, then workers. Each node is cordoned, drained, upgraded, and uncordoned automatically.
cd infrastructure/clusters/<cluster>/inventory/
ansible-playbook -i inventory.yaml \
  -b --become-user=root \
  upgrade-cluster.yml \
  -e kube_version=v1.30.2
For large clusters, you can limit the upgrade to control plane first, then workers:
# Control plane only
ansible-playbook -i inventory.yaml \
  -b --become-user=root \
  upgrade-cluster.yml \
  --limit=kube_control_plane

# Workers only (after control plane is confirmed healthy)
ansible-playbook -i inventory.yaml \
  -b --become-user=root \
  upgrade-cluster.yml \
  --limit=kube_node
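"Confirmed healthy" can be more than an eyeball check. A minimal sketch that gates the worker phase by parsing `kubectl get nodes -l node-role.kubernetes.io/control-plane` output (sample rows and node names are hypothetical; column positions assume the default kubectl table layout):

```bash
#!/usr/bin/env bash
# Minimal sketch: gate the worker phase on every control-plane node being
# Ready at the target version. Column positions assume default kubectl table
# output; sample rows and node names are hypothetical.
set -euo pipefail

control_plane_ready() {
  local target="$1"
  # STATUS is column 2, VERSION is column 5; print any row failing either check.
  ! tail -n +2 | awk -v v="$target" '$2 != "Ready" || $5 != v' | grep -q .
}

# In practice: kubectl get nodes -l node-role.kubernetes.io/control-plane | control_plane_ready v1.30.2
sample='NAME   STATUS   ROLES           AGE   VERSION
cp-0   Ready    control-plane   91d   v1.30.2
cp-1   Ready    control-plane   91d   v1.30.2
cp-2   Ready    control-plane   91d   v1.30.2'

if control_plane_ready v1.30.2 <<< "$sample"; then
  echo "control plane healthy; safe to upgrade workers"
fi
```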
Post-Upgrade Validation
# Verify all nodes report the new version
kubectl get nodes -o wide
# Check system pods are running
kubectl get pods -n kube-system
# Verify etcd cluster health (run on a control-plane node)
ETCDCTL_API=3 etcdctl endpoint health \
--endpoints=https://127.0.0.1:2379 \
--cacert=/etc/ssl/etcd/ssl/ca.pem \
--cert=/etc/ssl/etcd/ssl/node-$(hostname).pem \
--key=/etc/ssl/etcd/ssl/node-$(hostname)-key.pem
# Confirm FluxCD is reconciling normally
flux get kustomizations -A
# Run drift detection to confirm no unexpected changes
opencenter cluster drift <cluster-name>
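The first validation step (all nodes on the new version) is easy to script for a wrapper or CI job. A minimal sketch that parses `kubectl get nodes` output (sample rows and node names are hypothetical; VERSION is assumed to be the fifth column of the default table layout):

```bash
#!/usr/bin/env bash
# Minimal sketch: fail if any node still reports an old kubelet version.
# Column positions assume default `kubectl get nodes` table output;
# sample rows and node names below are hypothetical.
set -euo pipefail

all_nodes_at_version() {
  local target="$1"
  # VERSION is the fifth column; skip the header row.
  ! tail -n +2 | awk '{print $5}' | grep -qv "^${target}$"
}

# In practice: kubectl get nodes | all_nodes_at_version v1.30.2
sample='NAME      STATUS   ROLES           AGE   VERSION
cp-0      Ready    control-plane   91d   v1.30.2
worker-0  Ready    <none>          91d   v1.30.2
worker-1  Ready    <none>          91d   v1.29.7'

if all_nodes_at_version v1.30.2 <<< "$sample"; then
  echo "all nodes upgraded"
else
  echo "some nodes still on the old version"   # this branch fires: worker-1 lags
fi
```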
Troubleshooting
- Node stuck in `NotReady` — SSH to the node and check kubelet logs: `journalctl -u kubelet -f`. Common cause: container runtime version incompatibility.
- Drain timeout — A PodDisruptionBudget may be blocking eviction. Check `kubectl get pdb -A` and adjust if safe.
- etcd issues after upgrade — Verify the etcd member list: `etcdctl member list`. If a member is unhealthy, check disk I/O and network connectivity.
- API deprecations — Run `kubectl get --raw /metrics | grep apiserver_requested_deprecated_apis` to find workloads using removed APIs before upgrading.
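For the drain-timeout case, the PDBs that will actually block eviction are the ones currently allowing zero disruptions, and they can be listed mechanically. A minimal sketch over `kubectl get pdb -A` output (sample rows and names are hypothetical; ALLOWED DISRUPTIONS is assumed to be the fifth column of the default table layout):

```bash
#!/usr/bin/env bash
# Minimal sketch: list PDBs that currently allow zero disruptions, i.e. the
# ones that will block a node drain. Column positions assume default
# `kubectl get pdb -A` table output; sample rows and names are hypothetical.
set -euo pipefail

blocking_pdbs() {
  # ALLOWED DISRUPTIONS is the fifth column; print NAMESPACE/NAME when it is 0.
  tail -n +2 | awk '$5 == "0" {print $1 "/" $2}'
}

# In practice: kubectl get pdb -A | blocking_pdbs
sample='NAMESPACE     NAME      MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
default       web-pdb   2               N/A               0                     30d
kube-system   dns-pdb   1               N/A               1                     90d'

blocking_pdbs <<< "$sample"   # prints: default/web-pdb
```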