Purpose: For all users, shows how to diagnose and fix common deployment issues, covering validation errors through service failures.
Deployment issues can occur at any stage. This guide helps you diagnose problems and find solutions quickly.
Prerequisites
-
openCenter CLI installed
-
Cluster configuration created
-
Basic understanding of Kubernetes (helpful)
Diagnostic Commands
Check Cluster Status
# Validate configuration
opencenter cluster validate --verbose
# Check node status
kubectl get nodes
# Check pod status across all namespaces
kubectl get pods -A
# Check FluxCD status
kubectl get kustomizations -n flux-system
# Check service deployments
kubectl get helmreleases -A
Validation Errors
Missing Required Fields
Error:
ERROR: Required field missing: opencenter.infrastructure.cloud.openstack.application_credential_id
Diagnosis: Configuration is incomplete.
Solution: Add required field:
opencenter:
infrastructure:
cloud:
openstack:
application_credential_id: "your-app-cred-id"
secrets:
global:
openstack:
application_credential_id: "your-app-cred-id"
Schema Version Mismatch
Error:
ERROR: Unsupported schema version: 1.0
Only v2 configurations (schema_version: "2.0") are supported
Diagnosis: Configuration uses old schema format.
Solution: Update the configuration to use the canonical schema version:
schema_version: "2.0"
Invalid CIDR Ranges
Error:
ERROR: Subnet overlap detected
Field: opencenter.cluster.kubernetes.subnet_services
Overlaps with: opencenter.cluster.kubernetes.subnet_pods
Diagnosis: Network ranges overlap.
Solution: Use non-overlapping ranges:
opencenter:
cluster:
kubernetes:
subnet_pods: "10.42.0.0/16"
subnet_services: "10.43.0.0/16" # Different range
Infrastructure Provisioning Errors
Terraform/OpenTofu Failures
Error: Infrastructure provisioning fails during bootstrap.
Diagnosis: Check Terraform logs:
cd <git_dir>/infrastructure/clusters/<cluster>
cat terraform.log
Common Causes:
-
Insufficient Quotas:
``` Error: Error creating OpenStack server: Quota exceeded for instances ```
Solution: Increase OpenStack quotas or reduce node count:
```yaml opencenter: cluster: kubernetes: master_count: 1 # Reduce from 3 worker_count: 1 # Reduce from 2 ``` . *Invalid Image ID:*``` Error: Image not found: 799dcf97-3656-4361-8187-13ab1b295e33 ```
Solution: Use valid image ID:
```bash openstack image list opencenter cluster set my-cluster opencenter.infrastructure.cloud.openstack.image_id=<valid-id> ``` . *Network Not Found:*
``` Error: Network not found: floating_network_id ```
Solution: Set correct network ID:
```bash openstack network list opencenter cluster set my-cluster opencenter.infrastructure.cloud.openstack.networking.floating_network_id=<network-id> ```
SSH Connection Failures
Error: Ansible can’t connect to nodes during Kubespray deployment.
Diagnosis: Check SSH connectivity:
ssh -i <ssh-key> ubuntu@<node-ip>
Common Causes:
-
Wrong SSH Key:
Solution: Verify SSH key in configuration:
```yaml opencenter: cluster: ssh_authorized_keys: - "ssh-ed25519 AAAAC3... your-key" ``` . *Security Group Rules:*Solution: Ensure port 22 is open:
```bash openstack security group rule list <security-group> ``` . *Floating IP Not Assigned:*
Solution: Check floating IPs:
```bash openstack server list ```
Kubernetes Deployment Errors
Kubespray Failures
Error: Kubernetes deployment fails during bootstrap.
Diagnosis: Check Ansible logs:
cd <git_dir>/infrastructure/clusters/<cluster>
cat ansible.log
Common Causes:
-
Container Runtime Failure:
``` TASK [container-engine/containerd : Install containerd] failed ```
Solution: Check node connectivity and package repositories:
```bash ssh ubuntu@<node-ip> sudo apt update ``` . *Insufficient Resources:*
``` TASK [kubernetes/control-plane : kubeadm | Initialize first master] failed ```
Solution: Increase node resources or use larger flavors:
```yaml opencenter: cluster: kubernetes: flavor_master: "gp.0.8.16" # Larger flavor ``` . *Network Plugin Failure:*``` step "openstack-install-network-plugin" failed ```
Solution: Check that exactly one CNI is enabled and that OpenStack uses a Helm-backed install method:
```yaml opencenter: cluster: kubernetes: network_plugin: calico: enabled: true version: "3.32.0" install_method: helm ```For OpenStack Calico, the CLI bundles Calico `v3.32.0` `crd.projectcalico.org/v1` CRDs and eBPF custom resources. A different Calico version fails with a bundled-version error.
Node Not Ready
Error: Nodes show NotReady status.
Diagnosis:
kubectl get nodes
kubectl describe node <node-name>
Common Causes:
-
CNI Not Running:
``` Conditions: Ready False ... network plugin not ready ```
Solution: Check CNI pods:
```bash kubectl get pods -n calico-system kubectl get pods -n kube-system -l k8s-app=cilium kubectl get pods -n kube-system -l app.kubernetes.io/part-of=kube-ovn ``` . *Disk Pressure:*
``` Conditions: DiskPressure True ... disk pressure ```
Solution: Increase disk size or clean up:
```bash ssh ubuntu@<node-ip> df -h sudo docker system prune -a ``` . *Memory Pressure:*
``` Conditions: MemoryPressure True ... memory pressure ```
Solution: Use larger node flavors or reduce workloads.
FluxCD Issues
GitRepository Not Syncing
Error: FluxCD can’t access Git repository.
Diagnosis:
kubectl get gitrepositories -n flux-system
kubectl describe gitrepository <name> -n flux-system
Common Causes:
-
SSH Key Not Found:
``` Status: Conditions: Message: authentication required ```Solution: Create SSH key secret:
```bash kubectl create secret generic opencenter-base \ --from-file=identity=<ssh-private-key> \ --from-file=known_hosts=<known-hosts> \ -n flux-system ``` . *Invalid Git URL:*``` Status: Conditions: Message: repository not found ```Solution: Verify Git URL:
```yaml opencenter: gitops: git_url: "ssh://git@github.com/org/repo.git" # Correct URL ``` . *Branch Not Found:*``` Status: Conditions: Message: reference not found ```Solution: Verify branch exists:
```bash git ls-remote <git-url> ```
Kustomization Failures
Error: Kustomization shows reconciliation errors.
Diagnosis:
kubectl get kustomizations -n flux-system
kubectl describe kustomization <name> -n flux-system
Common Causes:
-
Path Not Found:
``` Status: Conditions: Message: path not found: ./applications/overlays/my-cluster ```Solution: Verify path exists in Git repository:
```bash cd <git_dir> ls -la applications/overlays/my-cluster ``` . *SOPS Decryption Failed:*
``` Status: Conditions: Message: decryption failed: no key could decrypt ```Solution: Create SOPS Age key secret:
```bash kubectl create secret generic sops-age \ --from-file=age.agekey=$SOPS_AGE_KEY_FILE \ -n flux-system ``` . *Invalid Manifest:*``` Status: Conditions: Message: validation failed: invalid YAML ```Solution: Validate manifests:
```bash cd <git_dir> kubectl apply --dry-run=client -f applications/overlays/my-cluster/ ```
Service Deployment Issues
HelmRelease Failures
Error: Service fails to deploy via Helm.
Diagnosis:
kubectl get helmreleases -A
kubectl describe helmrelease <name> -n <namespace>
Common Causes:
-
Chart Not Found:
``` Status: Conditions: Message: chart not found ```Solution: Verify HelmRepository:
```bash kubectl get helmrepositories -A kubectl describe helmrepository <name> -n <namespace> ``` . *Values Error:*
``` Status: Conditions: Message: template error: invalid value ```Solution: Check Helm values:
```bash kubectl get helmrelease <name> -n <namespace> -o yaml ``` . *Dependency Not Ready:*
``` Status: Conditions: Message: dependency not ready: cert-manager ```Solution: Wait for dependencies or check their status:
```bash kubectl get helmrelease cert-manager -n cert-manager ```
Pod CrashLoopBackOff
Error: Service pods repeatedly crash.
Diagnosis:
kubectl get pods -n <namespace>
kubectl logs <pod-name> -n <namespace>
kubectl describe pod <pod-name> -n <namespace>
Common Causes:
-
Missing Secret:
``` Error: secret "keycloak-secret" not found ```
Solution: Create secret:
```yaml # In configuration secrets: keycloak: client_secret: "your-secret" admin_password: "your-password" ```Encrypt and regenerate:
```bash opencenter secrets encrypt --path applications/overlays/my-cluster opencenter cluster generate ``` . *Insufficient Resources:*
``` Status: Reason: OOMKilled ```
Solution: Increase resource limits:
```yaml opencenter: services: kube-prometheus-stack: prometheus_volume_size: 100 # Increase ``` . *Configuration Error:*``` Error: invalid configuration: missing required field ```
Solution: Check service configuration:
```yaml opencenter: services: keycloak: hostname: "auth.example.com" # Required ```
Network Issues
Pod Network Connectivity
Error: Pods can’t communicate with each other.
Diagnosis:
# Test pod-to-pod connectivity
kubectl run test-pod --image=busybox --rm -it -- sh
# Inside pod:
ping <other-pod-ip>
nslookup kubernetes.default
Solution: Check CNI plugin:
kubectl get pods -n calico-system
kubectl get tigerastatus calico
kubectl logs -n calico-system <calico-node-pod>
Service Not Accessible
Error: Service not reachable via ClusterIP or LoadBalancer.
Diagnosis:
kubectl get svc -A
kubectl describe svc <service-name> -n <namespace>
Common Causes:
-
No Endpoints:
``` Endpoints: <none> ```
Solution: Check pod selector:
```bash kubectl get pods -n <namespace> --show-labels ``` . *LoadBalancer Pending:*
``` EXTERNAL-IP: <pending> ```
Solution: Check load balancer provider:
```bash # For Octavia kubectl logs -n kube-system <openstack-cloud-controller-manager-pod>
# For MetalLB kubectl logs -n metallb-system <metallb-controller-pod> ```
Performance Issues
Slow Reconciliation
Error: FluxCD takes too long to reconcile changes.
Diagnosis: Check reconciliation intervals:
kubectl get kustomizations -n flux-system -o yaml | grep interval
Solution: Reduce intervals for faster reconciliation:
opencenter:
gitops:
flux:
interval: "5m" # Reduce from 15m
High Resource Usage
Error: Cluster nodes running out of resources.
Diagnosis:
kubectl top nodes
kubectl top pods -A
Solution: Scale cluster or optimize services:
opencenter:
cluster:
kubernetes:
worker_count: 4 # Increase workers
Or disable unnecessary services:
opencenter:
services:
tempo:
enabled: false # Disable if not needed
Getting Help
Collect Diagnostic Information
Before asking for help, collect:
-
Configuration:
```bash opencenter cluster validate --generate-debug-config ``` . *Cluster state:*
```bash kubectl get nodes -o wide > nodes.txt kubectl get pods -A -o wide > pods.txt kubectl get kustomizations -n flux-system > flux.txt ``` . *Logs:*
```bash kubectl logs -n flux-system <flux-pod> > flux-logs.txt kubectl logs -n calico-system <calico-pod> > calico-logs.txt ```
Report Issues
When reporting issues, include:
-
openCenter version (
opencenter version) -
Provider (OpenStack, VMware, Kind, AWS)
-
Error messages (exact text)
-
Diagnostic information (above)
-
Steps to reproduce
GitHub Issues: https://github.com/opencenter-cloud/openCenter-cli/issues
Next Steps
-
validate-configuration.md[Validate Configuration] - Prevent issues before deployment
-
manage-secrets.md[Manage Secrets] - Fix SOPS-related issues
-
configure-networking.md[Configure Networking] - Fix network problems
-
../getting-started/getting-started.md[Getting Started Tutorial] - Review deployment steps
Evidence
This how-to guide is based on:
-
Validation command:
cmd/cluster_validate.go:1-108 -
Workflow validation:
tests/features/workflow.feature:38-73 -
SOPS manager errors:
internal/sops/manager.go:100-250 -
Session 1 troubleshooting: A6, A7, A8
-
Session 2 facts inventory: B0 sections 8, 13
-
Common deployment patterns: Ecosystem.md