Troubleshoot FluxCD Reconciliation
Purpose: Shows platform engineers how to debug FluxCD reconciliation issues, covering status checks, log analysis, common errors, and remediation steps.
Prerequisites
- FluxCD installed in cluster
- flux CLI installed (flux version)
- kubectl access to cluster
- Basic understanding of FluxCD resources
Quick Diagnostics
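A minimal first-pass triage can be sketched as a shell function. The function name quick_diag is hypothetical; the commands inside it are standard flux and kubectl calls, and the sketch assumes Flux runs in the flux-system namespace as in the rest of this guide.

```shell
# Hypothetical helper: three checks that narrow down most reconciliation failures.
quick_diag() {
  # 1. Verify prerequisites and controller health.
  flux check || echo "flux check reported problems" >&2
  # 2. Controller pods should all be Running.
  kubectl get pods -n flux-system
  # 3. List only objects whose Ready condition is False, in every namespace.
  flux get all --all-namespaces --status-selector ready=false
}
```

Run quick_diag first, then jump to the matching issue below based on which object kind is reported as not ready.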
Common Issues and Solutions
Issue 1: GitRepository Authentication Failure
Symptom:
flux get sources git
NAME READY MESSAGE
opencenter-base False fetch failed
Diagnosis:
kubectl describe gitrepository opencenter-base -n flux-system
Look for:
Message: failed to checkout and determine revision
Solution:
Use HTTPS in the base-repo GitRepository and do not attach a secretRef.
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: opencenter-base
  namespace: flux-system
spec:
  interval: 15m
  url: https://github.com/opencenter-cloud/openCenter-gitops-base
  ref:
    branch: main
Force reconciliation:
flux reconcile source git opencenter-base
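To confirm that the applied object matches the solution above, the two relevant fields can be read back from the cluster. This is a sketch only: check_gitrepo is a hypothetical helper name, and it assumes the GitRepository lives in flux-system unless a namespace is passed as the second argument.

```shell
# Hypothetical helper: confirm a GitRepository uses HTTPS and has no secretRef.
check_gitrepo() {
  gr_name=$1
  gr_ns=${2:-flux-system}
  # kubectl prints an empty string for fields that are not set.
  gr_url=$(kubectl get gitrepository "$gr_name" -n "$gr_ns" -o jsonpath='{.spec.url}')
  gr_secret=$(kubectl get gitrepository "$gr_name" -n "$gr_ns" -o jsonpath='{.spec.secretRef.name}')
  gr_rc=0
  case $gr_url in
    https://*) ;;
    *) echo "url is not HTTPS: $gr_url"; gr_rc=1 ;;
  esac
  if [ -n "$gr_secret" ]; then
    echo "secretRef is still attached: $gr_secret"
    gr_rc=1
  fi
  if [ "$gr_rc" -eq 0 ]; then
    echo "ok: $gr_url (no secretRef)"
  fi
  return "$gr_rc"
}
```

Usage: check_gitrepo opencenter-base. A non-zero exit status means at least one of the two conditions is not met.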
Issue 2: HelmRelease Stuck in "Installing"
Symptom:
flux get helmreleases -n cert-manager
NAME READY MESSAGE
cert-manager False install retries exhausted
Diagnosis:
kubectl describe helmrelease cert-manager -n cert-manager
Check events:
kubectl get events -n cert-manager --sort-by='.lastTimestamp'
View Helm controller logs:
flux logs --kind=HelmRelease --name=cert-manager --namespace=cert-manager
Common Causes:
- Helm repository not accessible
  flux get sources helm
  kubectl describe helmrepository cert-manager -n flux-system
- Chart version not found
  Check HelmRelease chart version:
  kubectl get helmrelease cert-manager -n cert-manager -o jsonpath='{.spec.chart.spec.version}'
  Check available versions (the chart repository must already be added to your local Helm client):
  helm search repo cert-manager --versions
- Values validation failed
  Check values secrets:
  kubectl get secret cert-manager-values-base -n cert-manager
  Decode and validate:
  kubectl get secret cert-manager-values-base -n cert-manager -o jsonpath='{.data.values\.yaml}' | base64 -d | yq eval '.' -
Solution:
Suspend and resume HelmRelease:
flux suspend helmrelease cert-manager -n cert-manager
flux resume helmrelease cert-manager -n cert-manager
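Or delete the HelmRelease and let Flux recreate it. Deleting a HelmRelease uninstalls the underlying Helm release, so expect the workload to be absent until reconciliation completes: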
kubectl delete helmrelease cert-manager -n cert-manager
flux reconcile kustomization cert-manager
Issue 3: Kustomization Drift Detected
Symptom:
flux get kustomizations
NAME READY MESSAGE
cert-manager True Applied revision: main@sha1:abc123, drift detected
Diagnosis:
kubectl describe kustomization cert-manager -n flux-system
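Check drift detection mode. The spec.driftDetection field belongs to the HelmRelease API, not to Kustomization, so query the HelmRelease that manages the drifted resources:
kubectl get helmrelease cert-manager -n cert-manager -o jsonpath='{.spec.driftDetection.mode}'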
Cause:
Resources were modified outside of Git (manual kubectl apply or Helm upgrade).
Solution:
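View drifted resources by diffing the cluster against a local checkout of the manifests (the --path flag is required):
flux diff kustomization cert-manager --path <path-to-local-manifests>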
Force reconciliation to restore Git state:
flux reconcile kustomization cert-manager --with-source
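Prevent drift by enabling remediation. A Kustomization re-applies the Git state on every reconciliation; prune and force control how strictly it is enforced:
spec:
  prune: true # Delete resources that were removed from Git
  force: true # Recreate resources when an immutable field change blocks the apply
For Helm-managed resources, enable drift detection on the HelmRelease:
spec:
  driftDetection:
    mode: enabled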
Issue 4: SOPS Decryption Failed
Symptom:
flux get kustomizations
NAME READY MESSAGE
my-service False decryption failed
Diagnosis:
kubectl describe kustomization my-service -n flux-system
Look for:
Message: failed to decrypt secret: no age key found
Solution:
Check age key secret exists:
kubectl get secret sops-age -n flux-system
If missing, create:
kubectl create secret generic sops-age \
  --from-file=age.agekey=${HOME}/.config/sops/age/<cluster>_keys.txt \
  -n flux-system
Verify that the Kustomization references the secret:
kubectl get kustomization my-service -n flux-system -o jsonpath='{.spec.decryption}'
Should show:
{"provider":"sops","secretRef":{"name":"sops-age"}}
Force reconciliation:
flux reconcile kustomization my-service
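A secret that exists can still fail decryption if its data key is misnamed, because kustomize-controller only picks up age keys from entries whose name ends in .agekey. The check below is a sketch; check_age_secret is a hypothetical helper name, and the defaults (sops-age in flux-system) are taken from the commands above.

```shell
# Hypothetical helper: confirm the decryption secret has an entry ending in .agekey.
check_age_secret() {
  as_name=${1:-sops-age}
  as_ns=${2:-flux-system}
  # Print one data key name per line (values are never printed).
  as_keys=$(kubectl get secret "$as_name" -n "$as_ns" \
    -o go-template='{{range $k, $v := .data}}{{$k}}{{"\n"}}{{end}}') || return 1
  if printf '%s\n' "$as_keys" | grep -q '\.agekey$'; then
    echo "ok: age key entry found in $as_ns/$as_name"
  else
    echo "no *.agekey entry in $as_ns/$as_name (found: $as_keys)"
    return 1
  fi
}
```

Usage: check_age_secret, or check_age_secret my-secret my-namespace for a non-default secret.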
Issue 5: Dependency Wait Timeout
Symptom:
flux get kustomizations
NAME READY MESSAGE
cert-manager-certs False dependency 'cert-manager' is not ready
Diagnosis:
kubectl describe kustomization cert-manager-certs -n flux-system
Check dependency status:
flux get kustomizations | grep cert-manager
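Grepping by name works when dependencies share a prefix, but the dependency list can also be read from the Kustomization itself. A sketch follows; dep_status is a hypothetical helper name, and it assumes every dependency lives in the same namespace as the dependent (dependsOn entries may also name another namespace, which this sketch ignores).

```shell
# Hypothetical helper: print each dependsOn entry with its Ready status.
dep_status() {
  ds_name=$1
  ds_ns=${2:-flux-system}
  # Names are returned space-separated, so plain word splitting is enough.
  for ds_dep in $(kubectl get kustomization "$ds_name" -n "$ds_ns" \
      -o jsonpath='{.spec.dependsOn[*].name}'); do
    ds_ready=$(kubectl get kustomization "$ds_dep" -n "$ds_ns" \
      -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}')
    # An empty status means the object is missing or has not reported yet.
    printf '%s\t%s\n' "$ds_dep" "${ds_ready:-Unknown}"
  done
}
```

Usage: dep_status cert-manager-certs. Any line not ending in True points at the dependency to troubleshoot first.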
Solution:
Check that the dependency is healthy:
kubectl get kustomization cert-manager -n flux-system
If the dependency is stuck, troubleshoot it first.
If the dependency is ready but the dependent has not picked that up, force reconciliation:
flux reconcile kustomization cert-manager
flux reconcile kustomization cert-manager-certs
Increase timeout if needed:
spec:
  dependsOn:
    - name: cert-manager
  timeout: 10m # Defaults to the interval value when unset
Issue 6: Image Pull Errors
Symptom:
HelmRelease shows ready, but pods fail to start:
kubectl get pods -n cert-manager
NAME READY STATUS RESTARTS AGE
cert-manager-5d7f9c8b6-abc12 0/1 ImagePullBackOff 0 2m
Diagnosis:
kubectl describe pod cert-manager-5d7f9c8b6-abc12 -n cert-manager
Look for:
Failed to pull image "registry.example.com/cert-manager:v1.18.2": rpc error: code = Unknown desc = failed to pull and unpack image
Solution:
Check image exists:
# For public images
docker pull registry.example.com/cert-manager:v1.18.2
# For private registries
kubectl get secret -n cert-manager | grep regcred
Create image pull secret if needed:
kubectl create secret docker-registry regcred \
  --docker-server=registry.example.com \
  --docker-username=user \
  --docker-password=pass \
  -n cert-manager
Update HelmRelease values:
imagePullSecrets:
- name: regcred
Issue 7: Resource Quota Exceeded
Symptom:
flux logs --kind=HelmRelease --name=my-service --namespace=my-service
Error: admission webhook denied the request: exceeded quota
Diagnosis:
kubectl describe resourcequota -n my-service
Solution:
Increase quota:
apiVersion: v1
kind: ResourceQuota
metadata:
  name: my-service-quota
  namespace: my-service
spec:
  hard:
    requests.cpu: "10"
    requests.memory: 20Gi
    limits.cpu: "20"
    limits.memory: 40Gi
Or reduce resource requests in Helm values.
Issue 8: Webhook Timeout
Symptom:
flux logs --kind=Kustomization --name=my-service
Error: context deadline exceeded
Diagnosis:
Check admission webhooks:
kubectl get validatingwebhookconfigurations
kubectl get mutatingwebhookconfigurations
Solution:
Check webhook service is running:
kubectl get pods -n kyverno
kubectl get pods -n cert-manager
If the webhook is down, suspend the Kustomization temporarily:
flux suspend kustomization my-service
# Fix webhook
flux resume kustomization my-service
Increase timeout:
spec:
  timeout: 10m
Debugging Commands
View Flux controller logs
# All controllers
flux logs
# Specific controller
flux logs --kind=Kustomization --name=my-service
# Follow logs
flux logs --follow
# Last 100 lines
flux logs --tail=100
Force reconciliation
# Reconcile source
flux reconcile source git opencenter-base
# Reconcile Kustomization
flux reconcile kustomization my-service
# Reconcile with source update
flux reconcile kustomization my-service --with-source
# Reconcile HelmRelease
flux reconcile helmrelease my-service -n my-service
Suspend and resume
# Suspend (stop reconciliation)
flux suspend kustomization my-service
# Resume
flux resume kustomization my-service
Verification Checklist
After resolving issues:
# 1. All sources are ready
flux get sources git
flux get sources helm
# 2. All Kustomizations are ready
flux get kustomizations
# 3. All HelmReleases are ready
flux get helmreleases --all-namespaces
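# 4. No suspended resources (prints nothing when none are suspended)
kubectl get gitrepositories,helmrepositories,kustomizations,helmreleases --all-namespaces -o jsonpath='{range .items[?(@.spec.suspend==true)]}{.kind}{" "}{.metadata.namespace}/{.metadata.name}{"\n"}{end}'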
# 5. Check recent events
kubectl get events -n flux-system --sort-by='.lastTimestamp' | tail -20
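The readiness checks in the list above can be collapsed into one pass/fail check, which is convenient in a pipeline. This is a sketch under assumptions: not_ready and verify_flux are hypothetical helper names, and HelmRepository is left out on purpose because OCI-type repositories do not report a Ready condition and would show up as false positives.

```shell
# Hypothetical helper: print every Flux object whose Ready condition is not True.
not_ready() {
  kubectl get gitrepositories,kustomizations,helmreleases --all-namespaces \
    -o jsonpath='{range .items[*]}{.kind}{"\t"}{.metadata.namespace}/{.metadata.name}{"\t"}{.status.conditions[?(@.type=="Ready")].status}{"\n"}{end}' \
    | awk -F'\t' 'NF && $3 != "True"'
}

# Hypothetical helper: exit non-zero when anything is not Ready.
verify_flux() {
  vf_bad=$(not_ready)
  if [ -n "$vf_bad" ]; then
    printf 'Not ready:\n%s\n' "$vf_bad"
    return 1
  fi
  echo "All Flux objects are Ready"
}
```

Usage: verify_flux || exit 1 as the last step of a deployment job.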
Prevention Best Practices
- Pin versions: use specific tags/versions, not latest
- Test in non-production: validate changes before production
- Use health checks: configure readiness/liveness probes
- Set resource limits: prevent resource exhaustion
- Monitor Flux: set up alerts for reconciliation failures
- Backup age keys: store SOPS keys securely
- Document dependencies: keep dependency chains clear
- Use drift detection: catch manual changes
- Implement retries: configure remediation policies
- Regular upgrades: keep Flux up to date
Emergency Procedures
Complete Flux failure
If all Flux controllers are down:
# Check controller pods
kubectl get pods -n flux-system
# Restart controllers
kubectl rollout restart deployment -n flux-system
# If that fails, reinstall Flux
flux uninstall --silent
flux bootstrap git \
  --url=ssh://git@github.com/${GIT_REPO}.git \
  --branch=main \
  --path=<cluster-repo-bootstrap-path>
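After a restart or a re-bootstrap, it helps to block until the controllers are actually available before retrying any reconciliation. A minimal sketch, assuming the controllers run as Deployments in flux-system; wait_for_flux is a hypothetical helper name and the 120s default is an arbitrary choice.

```shell
# Hypothetical helper: wait for all Flux controller Deployments, then re-run flux check.
wait_for_flux() {
  wf_timeout=${1:-120s}
  # Fails if any Deployment is not Available within the timeout.
  kubectl wait deployment --all -n flux-system \
    --for=condition=Available --timeout="$wf_timeout" || return 1
  flux check
}
```

Usage: wait_for_flux 5m, followed by the verification checklist.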