Helm Release Rollback

Purpose: For operators. Shows how to diagnose a failed HelmRelease upgrade and roll back to the previous revision.

Prerequisites

  • kubectl access to the cluster
  • helm CLI installed
  • flux CLI installed

Diagnosing a Failed Upgrade

Step 1: Check HelmRelease Status

flux get helmreleases -A

Look for releases with Ready: False and note the error message.
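To surface only the failing releases, recent versions of the flux CLI accept a status selector; a plain grep works as a fallback on any version:

```shell
# List only HelmReleases that are not Ready (requires a flux CLI version
# that supports --status-selector):
flux get helmreleases -A --status-selector ready=false

# Fallback that works with any flux version:
flux get helmreleases -A | grep -i false
```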

Step 2: Get Detailed Events

kubectl describe helmrelease <name> -n <namespace>

The Events section shows the failure reason. Common causes:

  • Invalid Helm values (template rendering error)
  • Failed pre-upgrade hooks
  • Resource conflicts (another controller owns the resource)
  • Timeout waiting for pods to become ready

Step 3: Check Helm History

helm history <release-name> -n <namespace>

This shows all revisions with their status (deployed, failed, superseded).

Rolling Back

Option A: Revert the Git Commit

The GitOps approach — revert the commit that introduced the bad values and push. FluxCD reconciles to the previous state automatically.

git revert <commit-hash>
git push

This is the preferred method because it keeps Git as the source of truth.

Option B: Manual Helm Rollback

If you need an immediate fix before the Git revert propagates:

  1. Suspend FluxCD reconciliation to prevent it from re-applying the bad state:

flux suspend helmrelease <name> -n <namespace>

  2. Roll back to the last working revision:

helm rollback <release-name> <revision> -n <namespace>

  3. Fix the values in Git, commit, and push.

  4. Resume reconciliation:

flux resume helmrelease <name> -n <namespace>

Skipping step 1 causes FluxCD to re-apply the failed values on the next reconciliation interval.

Option C: Force Upgrade with Fixed Values

If the release is stuck in a failed state and rollback does not work:

  1. Suspend the HelmRelease
  2. Uninstall the release manually:

helm uninstall <release-name> -n <namespace>

  3. Fix the values in Git and push
  4. Resume the HelmRelease — FluxCD performs a fresh install

This approach causes downtime for the affected service.
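If you want to preserve the revision history through the uninstall, so that helm history still works afterwards, Helm 3 has a flag for that:

```shell
# Uninstall but keep the release history records:
helm uninstall <release-name> -n <namespace> --keep-history
```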

Preventing Failed Upgrades

  • Test value changes locally with helm template before committing
  • Use opencenter cluster validate to catch schema issues
  • Pin chart versions in HelmRelease specs — avoid floating tags
  • Set spec.upgrade.remediation.retries to allow automatic retry on transient failures

Checking Post-Rollback Health

After rollback, verify the service is healthy:

kubectl get pods -n <namespace>
helm status <release-name> -n <namespace>
flux get helmrelease <name> -n <namespace>
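If pods are still rolling, you can block until they are Ready instead of polling. The deployment name is a placeholder; adjust the kind for StatefulSets or DaemonSets:

```shell
# Wait up to two minutes for the rollout to complete:
kubectl rollout status deployment/<deployment-name> -n <namespace> --timeout=120s
```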