Helm Release Rollback
Purpose: Shows operators how to diagnose failed HelmRelease upgrades and roll back to a previous revision.
Prerequisites
- kubectl access to the cluster
- helm CLI installed
- flux CLI installed
Diagnosing a Failed Upgrade
Step 1: Check HelmRelease Status
flux get helmreleases -A
Look for releases with Ready: False and note the error message.
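On clusters with many releases, the full listing can be noisy. The sketch below narrows it to failing releases. It assumes the installed flux CLI supports the --status-selector flag on flux get; the function name list_failed_releases is made up for illustration.

```shell
# Hypothetical helper: list only HelmReleases whose Ready condition is False.
# Assumes a flux CLI version that supports --status-selector.
list_failed_releases() {
  flux get helmreleases -A --status-selector ready=false
}
```

Calling list_failed_releases should print the same table as the command above, restricted to releases that are not ready.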
Step 2: Get Detailed Events
kubectl describe helmrelease <name> -n <namespace>
The Events section shows the failure reason. Common causes:
- Invalid Helm values (template rendering error)
- Failed pre-upgrade hooks
- Resource conflicts (another controller owns the resource)
- Timeout waiting for pods to become ready
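When the describe output is long, it can help to print only the message attached to the Ready condition. This is a minimal sketch using a kubectl JSONPath filter; the function name hr_ready_message is hypothetical, and it assumes the HelmRelease reports a condition of type Ready in its status.

```shell
# Hypothetical helper: print the message of the Ready condition for one HelmRelease.
# Usage: hr_ready_message <name> <namespace>
hr_ready_message() {
  kubectl get helmrelease "$1" -n "$2" \
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].message}'
}
```

The message usually points at one of the causes listed above, for example a template rendering error or a timeout.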
Step 3: Check Helm History
helm history <release-name> -n <namespace>
This shows all revisions with their status (deployed, failed, superseded).
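Once the history shows which revision failed, comparing its values with those of the last working revision often reveals the offending change. The sketch below does that with helm get values and diff. The function name diff_revision_values is hypothetical, and it assumes mktemp and diff are available on the workstation.

```shell
# Hypothetical helper: show how user-supplied values differ between two revisions.
# Usage: diff_revision_values <release-name> <namespace> <old-revision> <new-revision>
# The body runs in a subshell so the temp files and the trap stay local.
diff_revision_values() (
  old_file=$(mktemp) || exit 1
  new_file=$(mktemp) || exit 1
  trap 'rm -f "$old_file" "$new_file"' EXIT
  helm get values "$1" -n "$2" --revision "$3" > "$old_file" || exit 1
  helm get values "$1" -n "$2" --revision "$4" > "$new_file" || exit 1
  # diff exits with status 1 when the files differ, which is the expected case here.
  diff -u "$old_file" "$new_file"
)
```

This only compares values passed to the release. Changes that come from a new chart version will not appear in the diff.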
Rolling Back
Option A: Revert the Git Commit
This is the GitOps approach: revert the commit that introduced the bad values and push. FluxCD then reconciles to the previous state automatically.
git revert <commit-hash>
git push
This is the preferred method because it keeps Git as the source of truth.
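After the push, FluxCD picks up the revert on its next sync. To avoid waiting, reconciliation can be triggered by hand. The sketch below assumes the HelmRelease manifest is applied by a Flux Kustomization in the flux-system namespace; that layout, the Kustomization name you pass in, and the function name reconcile_after_revert are all assumptions, not something this guide specifies.

```shell
# Hypothetical helper: pull the reverted commit now instead of waiting for the next interval.
# Usage: reconcile_after_revert <kustomization> <helmrelease> <namespace>
# Assumes the Kustomization lives in the flux-system namespace.
reconcile_after_revert() {
  # Fetch the latest Git revision and re-apply the Kustomization that
  # contains the HelmRelease manifest.
  flux reconcile kustomization "$1" --with-source -n flux-system || return 1
  # Then ask FluxCD to reconcile the release itself.
  flux reconcile helmrelease "$2" -n "$3"
}
```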
Option B: Manual Helm Rollback
If you need an immediate fix before the Git revert propagates:
1. Suspend FluxCD reconciliation to prevent it from re-applying the bad state:
flux suspend helmrelease <name> -n <namespace>
2. Roll back to the last working revision:
helm rollback <release-name> <revision> -n <namespace>
3. Fix the values in Git, commit, and push.
4. Resume reconciliation:
flux resume helmrelease <name> -n <namespace>
Skipping step 1 causes FluxCD to re-apply the failed values on the next reconciliation interval.
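Because the order matters, the suspend and rollback steps can be wrapped so that the rollback never runs unless the suspend succeeded. This is a sketch, not a complete procedure: the function name manual_rollback is hypothetical, and it takes the HelmRelease name and the Helm release name separately because the two can differ. It deliberately does not resume reconciliation, since that should only happen after the fix is in Git.

```shell
# Hypothetical helper: suspend reconciliation, then roll back with Helm.
# Usage: manual_rollback <helmrelease> <release-name> <namespace> <revision>
manual_rollback() {
  # Stop FluxCD first; if this fails, leave the release untouched.
  flux suspend helmrelease "$1" -n "$3" || return 1
  # Roll back to the chosen revision from the Helm history.
  helm rollback "$2" "$4" -n "$3"
}
```

For example, manual_rollback podinfo podinfo apps 3 would suspend the podinfo HelmRelease in the apps namespace and roll the release back to revision 3 (all three names are placeholders).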
Option C: Force Upgrade with Fixed Values
If the release is stuck in a failed state and rollback does not work:
- Suspend the HelmRelease
- Uninstall the release manually:
helm uninstall <release-name> -n <namespace>
- Fix the values in Git and push
- Resume the HelmRelease — FluxCD performs a fresh install
This approach causes downtime for the affected service, because helm uninstall deletes the resources the release manages before FluxCD reinstalls them.
Preventing Failed Upgrades
- Test value changes locally with helm template before committing
- Use opencenter cluster validate to catch schema issues
- Pin chart versions in HelmRelease specs; avoid floating tags
- Set spec.upgrade.remediation.retries to allow automatic retry on transient failures
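The first item in the list above can be turned into a quick local check. The sketch below renders the chart with the candidate values and fails if rendering fails. The function name check_values and the way the chart and values file are passed in are assumptions for illustration.

```shell
# Hypothetical helper: fail fast if the chart does not render with the given values.
# Usage: check_values <release-name> <chart> <values-file>
check_values() {
  # Rendered manifests are discarded; helm prints rendering errors on stderr.
  if helm template "$1" "$2" -f "$3" > /dev/null; then
    echo "OK: $3 renders cleanly"
  else
    echo "FAIL: $3 does not render; fix before committing" >&2
    return 1
  fi
}
```

A check like this catches template rendering errors only. It does not detect failed hooks, resource conflicts, or pods that never become ready.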
Checking Post-Rollback Health
After rollback, verify the service is healthy:
kubectl get pods -n <namespace>
helm status <release-name> -n <namespace>
flux get helmrelease <name> -n <namespace>
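The commands above give a point-in-time view. To block until workloads have finished rolling out, something like the following sketch may help. It covers Deployments only, which is an assumption about the workload type; the function name wait_for_rollouts and the 120s default timeout are hypothetical choices.

```shell
# Hypothetical helper: block until every Deployment in the namespace finishes rolling out.
# Usage: wait_for_rollouts <namespace> [timeout]
wait_for_rollouts() {
  # "-o name" prints one resource per line, e.g. deployment.apps/<name>.
  for deploy in $(kubectl get deployments -n "$1" -o name); do
    kubectl rollout status "$deploy" -n "$1" --timeout="${2:-120s}" || return 1
  done
}
```

The function returns a non-zero status as soon as one Deployment fails to roll out within the timeout.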