Architecture Overview

Purpose: For platform engineers and architects. Explains how openCenter AirGap is structured, why the work is split across three zones, and the trade-offs behind each major design decision.

Problem

A standard Kubernetes deployment reaches into hundreds of registries, package mirrors, and tarballs at install time. In a regulated or edge environment with no internet access, that does not work. openCenter AirGap moves all of that pulling forward in time onto a connected build host, and emits a single artifact that knows how to install itself in a disconnected environment.

Three-zone model

The work splits across three zones:

Zone A (Factory)         Zone B (Airlock)         Zone C (Field)
───────────────          ────────────────         ──────────────
internet-connected       physical transfer        no internet
build host               + integrity checks       bastion + nodes
                                                   |
                                                   ▼
                                                  K8s cluster
                                                  pulling only
                                                  from bastion
  • Zone A — Factory. A connected Linux host runs opencenter-airgap build. It clones source repositories, downloads container images, OS packages, Python wheels, and Kubernetes binaries, and packages everything into a Zarf .tar.zst.

  • Zone B — Airlock. The package is moved across an air-gap boundary on physical media. Before transfer, the package’s checksum, signature, and SBOM are verified.

  • Zone C — Field. A bastion host extracts the package, runs a local container registry on TCP 5000 and an apt/pip mirror on TCP 80, and points cluster nodes at those mirrors. Kubespray installs Kubernetes pulling exclusively from the bastion.

Mapping zones to deliverables:

| Zone | Inputs | Outputs |
|---|---|---|
| A | config/versions.env, source repos, internet | dist/zarf-package-*.tar.zst, SBOM, checksum, signature |
| B | The package | The same package, plus an audit trail (signature verified, CVEs reviewed, transfer logged) |
| C | The package, target nodes, an SSH key | A running Kubernetes cluster |
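The checksum part of the Zone B check can be illustrated with a minimal sketch. It covers only the SHA-256 comparison, not the signature or SBOM checks, and it is not the logic of hack/scripts/verify-package.sh. The helper names `sha256_of` and `verify_checksum`, and the assumption that the expected digest arrives as a hex string, are hypothetical.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Stream the file in chunks so a multi-gigabyte package never sits in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum(path: Path, expected_hex: str) -> bool:
    """Hypothetical helper: compare the computed digest with the expected one."""
    # Normalise whitespace and case so a digest copied from a text file still matches.
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())
```

A mismatch here should stop the transfer; a match says nothing about who produced the package, which is what the signature check is for.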

Build pipeline

The build orchestrator (src/opencenter_build/orchestrator.py:BuildOrchestrator) runs eight steps in order: scan repos, collect Helm charts, generate Kubespray asset lists, mirror Terraform providers (when enabled), organize assets, generate zarf.yaml, create the Zarf package, and write the artifact manifest. Every step writes a checkpoint to build/state.json. See ../reference/build-steps.md[Build Steps] for the per-step contract.

The point of the checkpointing is straightforward: downloading 25 GB of dependencies takes 30+ minutes and any of those downloads can fail. Restarting from scratch on each failure is unworkable. Restarting from the last successful step is.
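The resume behaviour can be sketched as below. This is an illustration under stated assumptions, not the BuildOrchestrator code: the state file shape (`{"completed": [...]}`), the function names, and the idea that each step is a plain callable are all hypothetical.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], None]]


def load_completed(state_path: Path) -> List[str]:
    """Return the names of steps already recorded as done (empty if no state file)."""
    if not state_path.exists():
        return []
    return json.loads(state_path.read_text())["completed"]


def save_completed(state_path: Path, completed: List[str]) -> None:
    """Write to a temp file, then rename, so a crash never leaves a half-written state."""
    fd, tmp_name = tempfile.mkstemp(dir=state_path.parent, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump({"completed": completed}, handle)
    os.replace(tmp_name, state_path)


def run_steps(steps: List[Step], state_path: Path) -> List[str]:
    """Run steps in order, skipping any that a previous run already finished."""
    completed = load_completed(state_path)
    executed = []
    for name, action in steps:
        if name in completed:
            continue
        action()  # an exception here leaves earlier checkpoints intact
        completed.append(name)
        save_completed(state_path, completed)
        executed.append(name)
    return executed
```

Calling `run_steps` again after a failure re-runs only the failed step and the ones after it, which is the property the checkpoint file exists to provide.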

Configuration model

There is exactly one source of truth for what gets bundled: config/versions.env. It pins every component version and lists the source Git repositories.

config/components.yaml is a derived view. The build generates it from versions.env on first run. Subsequent runs merge in whatever the user added, either by hand or through opencenter-airgap add, so manual customizations survive regeneration. See ../reference/versions-env.md[versions.env Reference] and ../reference/component-manifest-schema.md[Component Manifest Schema].

The trade-off: a single source of truth gets you reproducible builds and a clean upgrade path. The cost is one layer of indirection — when the user runs add, the entry lands in components.yaml, not versions.env.
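The relationship between the two files can be sketched as follows. This assumes versions.env is a flat KEY=VALUE file and models the component manifest as a plain dictionary keyed by component name; both are simplifications, and `parse_env` and `merge_components` are hypothetical names rather than the project's merge_manifests().

```python
from typing import Dict


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, ignoring blanks and # comments."""
    pairs = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        pairs[key.strip()] = value.strip().strip('"')
    return pairs


def merge_components(generated: Dict[str, dict], existing: Dict[str, dict]) -> Dict[str, dict]:
    """Regenerate pinned entries while keeping entries that exist only in the old file."""
    merged = dict(generated)
    for name, entry in existing.items():
        if name not in generated:
            merged[name] = entry  # hand-added component survives regeneration
    return merged
```

In this sketch, entries derived from versions.env always win, so a pin changed there propagates on the next build, while components present only in the existing manifest are carried over untouched.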

Packaging: why Zarf

Zarf is the framework that compresses and ships the bundle. Picking a third-party packager is a recurring debate; the alternatives considered were:

  • A handcrafted tar.zst with a custom installer. Cheap to start, but you eventually re-implement SBOM generation, component layering, registry image push, signing, and a deploy DSL. That is most of what Zarf already provides.

  • A single OCI bundle. Works for images. Falls over for OS packages, Python wheels, Ansible playbooks, and pinned Kubernetes binaries.

Zarf was chosen because it covers the long tail (mixed artifact types, declarative components, deploy actions, SBOM, Cosign signing) and is actively maintained. The cost is a hard dependency on the Zarf CLI on both the build host and the bastion.

Build language: Python over Bash

The build is Python (src/opencenter_build/) wrapping a small set of Bash helpers under scripts/build/ and scripts/deploy/. The logic that needs structured data (manifest dataclasses, JSON Schema validation, atomic state writes, exception types with context) lives in Python. The shell-outs to system tools (git, zarf, cosign, kubespray-offline’s scripts) stay in Bash.

Reviewers occasionally ask why not all-Bash. The answer in three points: type-safe data flow between steps, deterministic testing with pytest plus hypothesis, and a CLI surface that is easier to extend than hand-rolled argument parsing in Bash. The per-step state file in build/state.json and the manifest merge logic in merge_manifests() are the two pieces that would be brittle in Bash.
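To make the "typed data flow" and "exception types with context" points concrete, here is a small sketch. The class names `AssetList` and `BuildStepError`, the step name, and the tag check are hypothetical and are not taken from src/opencenter_build/.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


class BuildStepError(Exception):
    """Hypothetical exception that carries the failing step and extra context."""

    def __init__(self, step: str, message: str, context: Dict[str, str]):
        super().__init__(f"{step}: {message}")
        self.step = step
        self.context = context


@dataclass(frozen=True)
class AssetList:
    """Hypothetical immutable hand-off between two build steps."""

    images: Tuple[str, ...]
    files: Tuple[str, ...] = field(default=())

    def __post_init__(self):
        # Reject untagged image references early, before anything is downloaded.
        for image in self.images:
            if ":" not in image.rsplit("/", 1)[-1]:
                raise BuildStepError("organize-assets", "image has no tag", {"image": image})
```

The equivalent in Bash would pass newline-separated strings between functions and report failures through exit codes, which is the brittleness the paragraph above refers to.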

Bastion model

A single bastion host serves every cluster node in Zone C. It runs:

  • A container registry on port 5000 (registry:2.8.3).

  • An nginx file server on port 80 with three roots: /files, /debs, /pypi.

  • Optionally a Gitea on port 3000 for GitOps that needs Git over HTTPS.

Nodes are configured by an Ansible playbook (assets/playbook/offline-repo.yml) to point apt and pip at the bastion. After that, Kubespray runs as if it were online and pulls everything from the bastion.

This is the trade-off most likely to be revisited: the bastion is a single point of failure during install. If you reboot it mid-install, Kubespray retries idempotently and recovers. If you lose the bastion permanently, you redeploy the package onto a replacement and continue. The reason for accepting the SPOF is operational simplicity: one host to harden, one host to copy a package onto, one host to firewall.
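Because every node depends on this one host, a quick reachability check before starting Kubespray can save a failed run. The sketch below is an illustration only and is not part of the project; the function names are hypothetical, and the port numbers are the ones listed in this section.

```python
import socket
from typing import Dict

# Ports from the list in this section; 3000 applies only when Gitea is enabled.
BASTION_PORTS = {"registry": 5000, "file-server": 80}


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_bastion(host: str, ports: Dict[str, int] = BASTION_PORTS) -> Dict[str, bool]:
    """Map each service name to whether its port accepted a connection."""
    return {name: port_open(host, port) for name, port in ports.items()}
```

A successful TCP connection only shows that something is listening; it does not confirm that the registry or mirror actually holds the expected content.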

Security architecture

Defense in depth across five layers:

| Layer | Mechanism | Where |
|---|---|---|
| Build-time | Pinned versions, JSON Schema validation, secrets isolated to .secrets/ | config/versions.env, src/opencenter_build/validation.py, src/opencenter_build/secrets.py |
| Package-time | SBOM, SHA-256 checksum, Cosign signature | Zarf, cosign |
| Transfer-time | Checksum + signature + SBOM policy gate | hack/scripts/verify-package.sh |
| Deploy-time | SSH key auth, bastion firewall isolating port 5000/80 | Operational |
| Runtime | Pod Security Admission (configured by Kubespray), Kyverno policies (deployed by openCenter-gitops-base) | Cluster |

Two concerns are explicitly out of scope: insider threats during the build, and physical security of the Zone B handoff. Both are operational rather than tooling concerns.

Performance

The expensive parts are network bandwidth (during build) and bastion-to-node bandwidth (during deploy):

  • A 100 Mbps build host produces a package in 20–30 minutes; a 10 Mbps host takes an hour.

  • A 10 Gbps bastion-to-node link does Kubespray in 20–30 minutes; 1 Gbps doubles that.

Disk and CPU are rarely the bottleneck. SSDs are still recommended on the bastion because the registry’s image layer cache is read-heavy.

What the system does not do

  • It does not orchestrate post-install upgrades. Once a cluster is up, Kubespray’s own upgrade playbook is the upgrade path.

  • It does not manage cluster lifecycle (scaling nodes, decommissioning). That is operational.

  • It does not produce arm64 builds today (TARGET_ARCH=amd64 in versions.env).

  • It does not enforce SBOM policy in opencenter-airgap verify. Policy gating lives in hack/scripts/verify-package.sh.

See also

  • ../getting-started/first-deployment.md[First Deployment]: see the architecture in action.

  • ../reference/build-steps.md[Build Steps]: the per-step contract.

  • ../reference/component-manifest-schema.md[Component Manifest Schema]: the schema the build consumes.