Fleet Upgrades
In Development (Q4 2026)
This feature is currently in development. Fleet upgrade orchestration described here is subject to change.
Purpose: Explains how platform engineers and operators can safely coordinate Kubernetes and platform service upgrades across a fleet of clusters.
Upgrade Strategy
Fleet upgrades use a wave-based promotion model:
Wave 1: Canary cluster (dev/staging)
↓ validate (automated health checks)
Wave 2: Non-critical production clusters
↓ validate (SLO checks, 24h soak)
Wave 3: Critical production clusters
↓ validate (full regression)
Wave 4: Regulated/air-gapped clusters
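The promotion order above can be sketched as a simple loop: upgrade one wave, validate it, and stop at the first failure so that later waves are never touched. This is a minimal illustration, not the orchestrator's actual implementation. The wave names, `run_waves`, and the `upgrade`/`validate` callbacks are all hypothetical, and the sketch validates every wave, including the last one.

```python
from typing import Callable, List, Optional

# Illustrative names for the four waves listed above (not real identifiers).
WAVES: List[str] = ["canary", "non-critical-prod", "critical-prod", "regulated"]


def run_waves(
    waves: List[str],
    upgrade: Callable[[str], None],
    validate: Callable[[str], bool],
) -> Optional[str]:
    """Upgrade waves in order.

    Returns the name of the wave whose validation failed,
    or None if every wave was promoted.
    """
    for wave in waves:
        upgrade(wave)
        if not validate(wave):
            return wave  # halt here; later waves are never upgraded
    return None


if __name__ == "__main__":
    upgraded: List[str] = []
    # Pretend validation fails on the critical production wave.
    failed = run_waves(WAVES, upgraded.append, lambda w: w != "critical-prod")
    print("upgraded:", upgraded)
    print("halted at:", failed)
```

In this run the regulated wave is never upgraded, because the failure in the wave before it stops promotion.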
Fleet Upgrade Plan
apiVersion: fleet.opencenter.cloud/v1alpha1
kind: FleetUpgradePlan
metadata:
  name: k8s-1-34-upgrade
spec:
  targetVersion: "1.34"
  waves:
    - name: canary
      clusterSelector:
        matchLabels:
          upgrade-wave: canary
      validation:
        healthCheckDuration: 1h
        automated: true
    - name: production-standard
      clusterSelector:
        matchLabels:
          upgrade-wave: standard
      validation:
        healthCheckDuration: 24h
        automated: false # Manual gate
    - name: production-critical
      clusterSelector:
        matchLabels:
          upgrade-wave: critical
      validation:
        healthCheckDuration: 48h
        automated: false
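To illustrate how a wave's `clusterSelector` might resolve to a set of clusters, the sketch below assumes `matchLabels` follows the usual Kubernetes label-selector semantics: a cluster is selected when it carries every listed key with the same value. That assumption is not confirmed by this page, and the fleet inventory and helper names are hypothetical.

```python
from typing import Dict, List

Labels = Dict[str, str]


def matches(match_labels: Labels, cluster_labels: Labels) -> bool:
    """True when every key/value pair in match_labels is present on the cluster."""
    return all(cluster_labels.get(k) == v for k, v in match_labels.items())


def clusters_for_wave(match_labels: Labels, fleet: Dict[str, Labels]) -> List[str]:
    """Return the sorted names of clusters selected by a wave's matchLabels."""
    return sorted(
        name for name, labels in fleet.items() if matches(match_labels, labels)
    )


# Hypothetical fleet inventory: cluster name -> labels.
FLEET: Dict[str, Labels] = {
    "dev-1": {"upgrade-wave": "canary", "region": "eu"},
    "prod-a": {"upgrade-wave": "standard"},
    "prod-b": {"upgrade-wave": "standard"},
    "payments": {"upgrade-wave": "critical"},
    "unlabeled": {},
}

if __name__ == "__main__":
    for wave_label in ("canary", "standard", "critical"):
        print(wave_label, clusters_for_wave({"upgrade-wave": wave_label}, FLEET))
```

Under this assumption, a cluster without an `upgrade-wave` label is not selected by any of the three waves, so it is worth checking labels before starting a plan.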
Validation Gates
Between waves, the system validates:
| Check | Automated | Blocks Next Wave |
|---|---|---|
| Node health (Ready status) | Yes | Yes |
| Pod restart rate | Yes | Yes (if > threshold) |
| FluxCD reconciliation success | Yes | Yes |
| Prometheus alert firing | Yes | Yes (critical severity) |
| Custom health endpoint | Yes | Configurable |
| Manual approval | No | Yes (if configured) |
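The gate logic in the table can be sketched as follows: each check reports whether it passed and whether a failure blocks the next wave, and promotion proceeds only when no blocking check has failed. The `CheckResult` type, the check names, and the threshold value are hypothetical stand-ins, not part of the product's API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CheckResult:
    name: str
    passed: bool
    blocking: bool  # True if a failure should stop the next wave


def restart_rate_check(restarts_per_hour: float, threshold: float) -> CheckResult:
    """Fails only when the rate is strictly greater than the threshold."""
    return CheckResult("pod-restart-rate", restarts_per_hour <= threshold, True)


def blocking_failures(results: List[CheckResult]) -> List[str]:
    """Names of failed checks that are configured to block promotion."""
    return [r.name for r in results if r.blocking and not r.passed]


def may_promote(results: List[CheckResult]) -> bool:
    """The next wave may start only if no blocking check failed."""
    return not blocking_failures(results)


if __name__ == "__main__":
    results = [
        CheckResult("node-health", True, True),
        restart_rate_check(restarts_per_hour=7.0, threshold=5.0),
        # Custom endpoint configured as non-blocking in this example.
        CheckResult("custom-health-endpoint", False, False),
    ]
    print("blocking failures:", blocking_failures(results))
    print("may promote:", may_promote(results))
```

Here the failed custom endpoint does not stop promotion on its own, but the restart rate above the threshold does.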
Rollback
If validation fails:
- Wave halts automatically
- Affected clusters remain at their current version (no automatic rollback)
- Alert fires to fleet operators
- Manual rollback available via `opencenter fleet upgrade rollback`
Service Upgrades
Platform service upgrades (gitops-base tag bumps) follow the same wave model:
- Hub updates fleet GitOps repo with new tag
- FleetKustomizations propagate to clusters per wave schedule
- Each cluster's FluxCD reconciles the new service versions
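The effect of propagating a tag bump wave by wave can be sketched as a mapping from each cluster to the tag it should reconcile. This is a simplified model under stated assumptions: the function name, the wave names, the `released` counter, and the tag values are hypothetical, and the real schedule is driven by FleetKustomizations rather than by code like this.

```python
from typing import Dict, List


def desired_tags(
    cluster_wave: Dict[str, str],
    wave_order: List[str],
    released: int,
    old_tag: str,
    new_tag: str,
) -> Dict[str, str]:
    """Return the tag each cluster should reconcile.

    Clusters in the first `released` waves get new_tag;
    all other clusters stay on old_tag.
    """
    active = set(wave_order[:released])
    return {
        cluster: (new_tag if wave in active else old_tag)
        for cluster, wave in cluster_wave.items()
    }


if __name__ == "__main__":
    # Hypothetical cluster-to-wave assignment and tag values.
    cluster_wave = {"dev-1": "canary", "prod-a": "standard", "payments": "critical"}
    order = ["canary", "standard", "critical"]
    print(desired_tags(cluster_wave, order, 2, "v1.2.0", "v1.3.0"))
```

With two waves released, the canary and standard clusters move to the new tag while the critical cluster stays on the old one until its wave opens.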