Troubleshooting
Purpose: This page lists the failure modes the project actually hits in production and how to recover from each. It is written for platform engineers and field operators. Errors are grouped by phase.
Build phase
Configuration not found: config/versions.env
The build was started without running init first.
opencenter-airgap init
$EDITOR config/versions.env
opencenter-airgap build
Network downloads timing out
The contents of assets/ are downloaded from external registries. Symptoms include connection timeouts, EOF errors, and 429 Too Many Requests responses.
# Use a proxy if your network requires one.
export HTTPS_PROXY=http://proxy.example.com:8080
export NO_PROXY=localhost,127.0.0.1
# Resume; the orchestrator restarts from the last completed step.
opencenter-airgap build
If a public registry is rate-limiting you, reduce the build's parallelism or wait and retry. The build is idempotent, so partial state survives across retries.
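Waiting and retrying can be scripted. The following is a minimal sketch, not part of the project: retry_with_backoff is a hypothetical helper that re-runs a command with a doubling delay. It relies only on the build being safe to re-run, as stated above.

```shell
# retry_with_backoff MAX_ATTEMPTS INITIAL_DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND until it succeeds or MAX_ATTEMPTS is
# reached, doubling the delay after each failure.
# Returns the exit status of the last attempt.
retry_with_backoff() {
    local max_attempts=$1 delay=$2 attempt=1 status
    shift 2
    while :; do
        "$@" && return 0
        status=$?
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after $attempt attempt(s)" >&2
            return "$status"
        fi
        echo "attempt $attempt failed (exit $status); retrying in ${delay}s" >&2
        sleep "$delay"
        delay=$((delay * 2))
        attempt=$((attempt + 1))
    done
}
```

For example, retry_with_backoff 5 60 opencenter-airgap build makes up to five attempts, waiting 60, 120, 240, and then 480 seconds between them.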
No space left on device
Peak usage during a build is ~120 GB across build/, assets/, and dist/.
df -h
opencenter-airgap clean # remove build/ and dist/
# or move the build to a larger volume:
opencenter-airgap build --state /mnt/big/state.json
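A pre-flight check can catch this before the build starts. This is a sketch under assumptions: require_free_gb is a hypothetical helper, and it treats the ~120 GB peak figure as GiB for simplicity.

```shell
# require_free_gb DIR NEEDED_GB
# Hypothetical helper: succeeds if the filesystem holding DIR has at least
# NEEDED_GB GiB available.
require_free_gb() {
    local dir=$1 needed_gb=$2 avail_kb avail_gb
    # -P forces the portable one-line-per-filesystem format; -k reports 1 KiB blocks.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    if [ -z "$avail_kb" ]; then
        echo "could not read free space for $dir" >&2
        return 2
    fi
    avail_gb=$((avail_kb / 1024 / 1024))
    if [ "$avail_gb" -lt "$needed_gb" ]; then
        echo "only ${avail_gb} GiB free under $dir; need ${needed_gb} GiB" >&2
        return 1
    fi
    echo "${avail_gb} GiB free under $dir"
}
```

Run it from the build directory, for example require_free_gb . 120 && opencenter-airgap build. If build/, assets/, and dist/ live on different volumes, check each one separately.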
Component name '…' is invalid
The schema enforces lowercase alphanumerics with hyphens, starting with a letter. Rename the component:
# Bad
- name: My_Tool
# Good
- name: my-tool
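To check a name before running the build, the rule can be approximated in shell. The pattern below is an assumption derived from the wording above, not copied from the schema, so the real schema may be stricter (for example about trailing hyphens). is_valid_component_name is a hypothetical helper.

```shell
# is_valid_component_name NAME
# Hypothetical helper: succeeds if NAME starts with a lowercase letter and
# otherwise contains only lowercase letters, digits, and hyphens.
is_valid_component_name() {
    # LC_ALL=C keeps [a-z] from matching uppercase letters in some locales.
    printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[a-z][a-z0-9-]*$'
}

for name in My_Tool my-tool; do
    if is_valid_component_name "$name"; then
        echo "$name: ok"
    else
        echo "$name: invalid"
    fi
done
```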
Image '…:latest' uses mutable tag
Pin the image to a specific tag. The build refuses :latest and untagged images.
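The same rule can be checked by hand before a build. This is a sketch, not the build's own logic: is_pinned_image is a hypothetical helper, and treating digest references (@sha256:) as pinned is an assumption about what the build accepts.

```shell
# is_pinned_image REF
# Hypothetical helper: succeeds if REF carries a digest or an explicit tag
# other than "latest".
is_pinned_image() {
    local ref=$1 last tag
    case $ref in
        *@sha256:*) return 0 ;;   # digest references are immutable
    esac
    last=${ref##*/}               # drop the registry host (and its port) and path
    case $last in
        *:*) tag=${last##*:} ;;
        *)   return 1 ;;          # no tag at all
    esac
    [ -n "$tag" ] && [ "$tag" != "latest" ]
}
```

Stripping everything up to the last slash first matters: in localhost:5000/nginx the colon belongs to the registry port, not to a tag, so the reference is correctly reported as untagged.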
config hash mismatch — refusing to resume
versions.env or components.yaml changed since the failed build started.
opencenter-airgap status # confirm
opencenter-airgap build --clean # full rerun
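To notice such edits yourself before resuming, you can record a fingerprint of the two files when a build starts and compare it later. This is an independent illustration: it is not how the tool computes its own config hash, and config_fingerprint is a hypothetical helper.

```shell
# config_fingerprint FILE...
# Hypothetical helper: prints one SHA-256 that changes whenever the content
# (or the name) of any listed FILE changes.
config_fingerprint() {
    local f
    for f in "$@"; do
        [ -r "$f" ] || { echo "cannot read $f" >&2; return 1; }
    done
    # Hash the per-file checksum lines so file boundaries stay unambiguous.
    sha256sum -- "$@" | sha256sum | awk '{ print $1 }'
}
```

Save the output of config_fingerprint config/versions.env config/components.yaml before the build and compare it with a fresh run after a failure. If the two values differ, a full rerun is required.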
zarf: not found
The Zarf CLI is not installed on the build host. Install it from https://zarf.dev/install/ and re-run. If you do not need the .tar.zst package right now, the rest of the build still runs and you keep zarf.yaml, assets/, and dist/artifact-manifest.json.
Deploy phase
failed to extract package: unexpected EOF
The package was corrupted in transit.
sha256sum -c zarf-package-*.tar.zst.sha256
If the checksum fails, re-copy the package from dist/ on the build host. If the checksum passes but extraction still fails, transfer the package again with a tool that does its own integrity check (e.g. rsync --checksum).
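When several packages were transferred, each sidecar can be checked in one pass. This is a minimal sketch: verify_sidecars is a hypothetical helper, and it assumes each .sha256 file records a bare file name relative to its own directory.

```shell
# verify_sidecars DIR
# Hypothetical helper: runs "sha256sum -c" on every *.sha256 file in DIR and
# reports each result. Returns non-zero if any check fails or if DIR holds
# no sidecar files at all.
verify_sidecars() {
    local dir=$1 sidecar failed=0 found=0
    for sidecar in "$dir"/*.sha256; do
        [ -e "$sidecar" ] || continue      # the glob matched nothing
        found=1
        # Check from inside DIR so relative names in the sidecar resolve.
        if (cd "$dir" && sha256sum -c "$(basename "$sidecar")") >/dev/null 2>&1; then
            echo "OK      $sidecar"
        else
            echo "FAILED  $sidecar"
            failed=1
        fi
    done
    [ "$found" -eq 1 ] && [ "$failed" -eq 0 ]
}
```

Treating a directory with no sidecars as a failure is deliberate: a missing .sha256 file usually means the transfer was incomplete.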
Container registry will not start
Check the registry container directly:
podman logs local-registry
ss -tlnp | grep ':5000\b' # is anything else on the port?
df -h /var/lib/registry
Common causes are port 5000 already in use, a previous run leaving an unhealthy container, or insufficient disk for the image layer cache.
Nginx file server returns 404 on /debs/ or /pypi/
The bind mounts did not pick up the unpacked assets.
docker inspect opencenter-nginx --format '{{range .Mounts}}{{.Source}} -> {{.Destination}}{{println}}{{end}}'
ls /opt/opencenter/debs /opt/opencenter/pypi
Restart the container after fixing the mount source:
docker rm -f opencenter-nginx
opencenter-airgap serve dist/zarf-package-*.tar.zst
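Before restarting, it helps to confirm that the mount sources actually contain files. This is a sketch: check_asset_dirs is a hypothetical helper, and the two paths in the usage note are the ones listed above.

```shell
# check_asset_dirs DIR...
# Hypothetical helper: succeeds only if every DIR exists and contains at
# least one entry.
check_asset_dirs() {
    local dir rc=0
    for dir in "$@"; do
        if [ ! -d "$dir" ]; then
            echo "missing:  $dir" >&2
            rc=1
        elif [ -z "$(ls -A "$dir")" ]; then
            echo "empty:    $dir" >&2
            rc=1
        else
            echo "ok:       $dir"
        fi
    done
    return "$rc"
}
```

Call it as check_asset_dirs /opt/opencenter/debs /opt/opencenter/pypi. An empty or missing directory points at the unpack step rather than at Nginx.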
Ansible: UNREACHABLE! … ssh
# Confirm SSH works directly.
ssh -i ~/.ssh/id_rsa deployer@node1
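# Add user and key file to the inventory.
# If inventory.yml already has a top-level "all:" key, merge these vars into it
# by hand instead of appending; a duplicate key makes the YAML ambiguous.
cat <<EOF >> /opt/opencenter/kubespray/inventory/mycluster/inventory.yml
all:
  vars:
    ansible_user: deployer
    ansible_ssh_private_key_file: ~/.ssh/id_rsa
EOF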
If the node is new and its host key is not yet in known_hosts, set ANSIBLE_HOST_KEY_CHECKING=False in the environment for the first run only.
E: Unable to locate package kubelet on cluster nodes
The offline-repo playbook did not run, so apt is still pointing at the public Ubuntu mirrors.
cd /opt/opencenter/playbook
ansible-playbook -i ../kubespray/inventory/mycluster/inventory.yml offline-repo.yml
# Confirm on the node:
ssh node1 "cat /etc/apt/sources.list.d/opencenter.list"
ssh node1 "apt-cache policy kubelet"
Pods stuck in ImagePullBackOff
Containerd is not pointed at the bastion registry.
ssh node1 "grep -A2 'registry.mirrors' /etc/containerd/config.toml"
ssh node1 "curl -s http://${BASTION_IP}:5000/v2/_catalog | jq '.repositories | length'"
ssh node1 "sudo systemctl restart containerd"
If the registry is reachable but the specific image is missing, that image was not collected during the build. Re-run opencenter-airgap scan --repos and rebuild.
FluxCD Kustomization shows reconciliation failed
kubectl logs -n flux-system deploy/source-controller
kubectl logs -n flux-system deploy/kustomize-controller
flux reconcile kustomization flux-system --with-source
The most common cause is Gitea not being reachable from the cluster nodes. Check kubectl get svc -n gitea, then confirm from a node that the Gitea hostname resolves or that the Service ClusterIP responds.
Verification phase
checksum verification failed
The file is corrupted or has been modified. Re-copy the package and the .sha256 sidecar from dist/ on the build host.
signature verification failed
The package was re-signed, the wrong public key is being used, or the file was modified.
# Confirm the public key matches the one used to sign:
cosign public-key --key .secrets/signing-key.key | diff - .secrets/signing-key.pub
If you build and sign the packages yourself, regenerate the keys (opencenter-airgap keygen --force) and rebuild. If you are receiving a third-party package, contact the publisher.
N image(s) use mutable or missing tags
The SBOM contains latest or untagged image references. Find them in the SBOM:
jq -r '.artifacts[]
| select(.type == "image")
| select(.version == "latest" or .version == null)
| .name' \
dist/zarf-package-*-sbom.json
For each entry, pin the version in config/components.yaml (or the upstream Helm chart values) and rebuild.
N HIGH/CRITICAL vulnerabilities reported
The SBOM scanner flagged CVEs. Either update the affected image to a version that contains the fix, or document and accept the risk in your security policy.
jq -r '.vulnerabilities[]
| select(.severity == "CRITICAL" or .severity == "HIGH")
| "\(.severity) \(.id) \(.affects // "")"' \
dist/zarf-package-*-sbom.json