Purpose: Evaluate architectural approaches for reducing openCenter service-rendering boilerplate, document the real constraints in the current codebase, and recommend a path that removes hardcoded rendering behavior without introducing a second service-configuration system.
Problem Statement
Adding a new platform service to openCenter still touches multiple unrelated code paths:
- Typed config registration in internal/config/services/<service>.go
- Default values in internal/config/defaults.go or internal/config/v2/defaults.go
- Service plugin registration in internal/services/plugins/registry.go
- Optional validator registration in internal/services/plugins/validators.go
- CLI parameter help in cmd/cluster_service.go via getServiceOptions
- CLI secret help in cmd/cluster_service.go via getServiceSecrets
- CLI enable-time validation in cmd/cluster_service.go via validateService
- GitOps overlay templates under internal/gitops/templates/cluster-apps-base/…
- Source and Flux template discovery via filename conventions
- Aggregate inclusion via hardcoded or convention-based rendering logic
That cost is too high for standard services whose only real work is:
- register dependencies
- render a known overlay directory
- emit a standard GitRepository source
- emit a standard Flux Kustomization pair
The goal should be narrower and more practical than "make everything YAML-driven":
- remove hardcoded rendering topology
- remove switch-based CLI help where possible
- preserve the existing typed config and validation model until there is a single better replacement
Evidence From The Current Code
Hardcoded CLI behavior
cmd/cluster_service.go still hardcodes service options, secrets, and enable-time validation:
- getServiceOptions(serviceName string) is a switch statement
- getServiceSecrets(serviceName string) is a switch statement
- validateService(serviceName string, serviceCfg any, secretsCfg *config.Secrets) is a switch statement
- processSecrets(…) maintains a manual serviceName → secret struct field map
This is real duplication. The same service metadata is spread across config structs, CLI help text, secrets types, and validators.
Hardcoded render topology
internal/gitops/copy.go still discovers what to render by walking the filesystem and inferring intent from paths and filenames:
- shouldSkipFile inspects services/<name>/…
- source files are recognized by opencenter-<service>.yaml
- RenderSingleService locates files by directory and filename convention
- RenderClusterAppsAtomic walks every embedded file and filters after the fact
That means service topology is encoded in the embedded directory layout, not in a first-class model.
Hardcoded template discovery
internal/template/embedded_registry.go contains a fixed serviceNames list inside inferServices. This is another manually maintained registry that can drift from the real service catalog.
Typed config is already a core runtime contract
The service config pipeline is not incidental. It is part of how the CLI works today:
- internal/config/registry/registry.go maps service name to Go type
- internal/config/service_map.go uses that registry to unmarshal services: into typed structs
- internal/config/schema.go generates JSON schema from a hardcoded Go-side model
- internal/config/services/*.go already hold field names, defaults, descriptions, and enums in struct tags
Any design that replaces config metadata needs to replace this entire pipeline, not just the renderer.
Existing manifest support is thin
The repository already has ServicePluginManifest, TemplateRef, ValidationRule, and manifest-loading helpers in internal/services/plugin.go. That is useful, but it is not a near-complete replacement for the current runtime.
The current manifest shape only models:
- service identity and dependencies
- template references
- generic schema/defaults/required/validation
It does not currently model:
- CLI option groups
- CLI secrets
- secret ownership or fallback chains
- source generation
- Flux generation
- custom plugin mode
- overlay render roots
DefaultServiceRegistry.LoadManifestsFromDirectory also wires manifests to BasicServicePlugin, which is effectively a stub.
What The Renderer Must Produce
The renderer contract needs to be stated more precisely:
- One .<cluster>-config.yaml renders one cluster overlay tree at applications/overlays/<cluster>/.
- Reproducing the RelayPoint fixture means there are five distinct cluster config files, one per overlay:
  - .k8s-dev-config.yaml
  - .k8s-dr-config.yaml
  - .k8s-prod-config.yaml
  - .k8s-qa-config.yaml
  - .k8s-uat-config.yaml
- Each of those five config files is rendered independently to produce its matching overlay tree.
That overlay tree is larger than just services/. In the current fixture it may contain:
- Root overlay files such as applications/overlays/<cluster>/kustomization.yaml
- Optional cluster deployment files such as applications/overlays/<cluster>/flux-system/
- Platform service overlays under applications/overlays/<cluster>/services/
- Managed service overlays under applications/overlays/<cluster>/managed-services/
- Optional customer-managed overlays under applications/overlays/<cluster>/customer-managed/
Each branch may itself contain:
- sources/
- fluxcd/
- service or unit-specific overlay directories
Some render units also break the "one source + one Flux pair" assumption. For example:
- Keycloak emits multiple GitRepository sources and multiple Flux Kustomizations
- customer-managed/ emits cluster-level sources and multiple customer-owned Flux Kustomizations
- Some clusters include flux-system/ while others do not
Any recommended design has to model the full overlay tree, not just standard services.
Design Constraints
A viable design for this codebase should satisfy all of these:
- Preserve cluster edit, typed unmarshalling, and schema generation during the migration.
- Remove render-time filename conventions as the source of truth.
- Support standard services with little or no Go code beyond configuration where needed.
- Keep complex validation and lifecycle hooks in Go.
- Support single-service rendering without scanning unrelated files.
- Avoid introducing a second service-config DSL unless it clearly replaces the first one everywhere.
- Support overlay classes beyond services/, including managed-services/, optional customer-managed/, and cluster-level files.
- Support multiple rendered sources and multiple Flux units for a single logical service or overlay unit.
- Provide a typed cluster-level config surface for non-service overlay units such as customer-managed/ and optional bootstrap files.
Approach A: Inline Kubernetes Resources In Cluster Config
Embed rendered resources or resource fragments directly in the cluster config YAML.
Pros
- Operators can see the final resource shape in one file.
- No separate template inventory is required.
- Any Kubernetes resource can theoretically be represented.
Cons
- It turns the cluster config into a Kubernetes manifest store.
- It pushes implementation details back onto operators.
- It makes SOPS and secret handling awkward.
- It does not work well for large or multi-directory services like Keycloak.
- It makes cluster edit and merge workflows noisier.
- It does not address the existing typed config pipeline; it competes with it.
Approach B: Full Service Manifests For Config, CLI, Validation, And Rendering
Use YAML manifests as the new source of truth for:
- service options
- service secrets
- validation rules
- defaults
- source and Flux generation
- overlay discovery
- dependency metadata
This is the broad direction proposed by the earlier version of this document.
Pros
- In theory, a standard service becomes data-only.
- Rendering topology can become declarative.
- CLI help can be derived from manifests.
- Hardcoded switch statements can go away.
Cons
- It introduces a second service metadata system immediately.
- It duplicates information already stored in Go config structs and secret types.
- It does not replace the typed config registry, schema generation, or config unmarshalling by itself.
- The existing manifest implementation is much smaller than the required design.
- Secret fallback logic already lives in internal/config/config.go; moving it into YAML is a separate migration.
- Complex validation still needs Go, so the model is not truly manifest-only.
Approach C: Keep The Current Model And Only Remove The CLI Switches
Keep rendering exactly as it is and only derive cluster service options and secrets from existing structs.
Pros
- Lowest migration risk.
- Reduces some obvious duplication in cmd/cluster_service.go.
- Reuses metadata already present in config and secret structs.
Cons
- It leaves filename-convention rendering and the hardcoded render topology untouched.
- It leaves the fixed serviceNames list in internal/template/embedded_registry.go in place.
- A new render-only service still cannot be data-only.
Approach D: Overlay Unit Descriptors With Typed Config (Recommended)
Split responsibilities cleanly:
- Typed Go config remains the source of truth for service configuration shape, defaults, and complex validation.
- Overlay unit descriptors become the source of truth for rendered topology inside applications/overlays/<cluster>/.
This is the narrowest change that solves the real problems first.
Core Idea
Each renderable overlay unit gets a descriptor that answers only rendering questions.
A unit can represent:
- a platform service in services/
- a managed service in managed-services/
- a cluster-scoped customer-owned layer in customer-managed/
- a cluster-scoped bootstrap unit such as root overlay files or optional flux-system/
Each descriptor answers:
- What logical unit is this?
- Which overlay layer does it belong to?
- Is it gated by services.<name>.enabled, managed-service.<name>.enabled, or by cluster-level config?
- Does it use a custom Go plugin?
- Which GitRepository sources does it emit?
- Which Flux Kustomizations does it emit?
- Where is its overlay root?
- Which files belong to the overlay?
It does not redefine:
- the service config schema
- CLI parameter names
- secret structs
- complex validation rules
- secret fallback behavior
Example Descriptor
```yaml
# internal/services/descriptors/keycloak.yaml
name: keycloak
type: security
layer: services
owner:
  type: service
  name: keycloak
dependencies:
  - cert-manager
plugin:
  mode: custom
render:
  sources:
    - name: opencenter-keycloak
      url: ssh://git@github.com/rackerlabs/openCenter-gitops-base.git
      ref_type: branch
      ref_value: main
    - name: opencenter-keycloak-config
      source: cluster-repo
  flux_units:
    - name: keycloak-postgres
      source_ref: opencenter-keycloak-config
      path: services/keycloak/00-postgres
      namespace: keycloak
      depends_on:
        - sources
        - postgres-operator-base
        - postgres-operator-override
    - name: keycloak-operator
      source_ref: opencenter-keycloak-config
      path: services/keycloak/10-operator
      namespace: keycloak
      depends_on:
        - sources
        - keycloak-postgres
    - name: keycloak-cr
      source_ref: opencenter-keycloak-config
      path: services/keycloak/20-keycloak
      namespace: keycloak
      sops_decryption: true
      depends_on:
        - sources
        - keycloak-postgres
        - keycloak-operator
        - envoy-gateway-api-base
        - envoy-gateway-api-override
        - gateway
overlay:
  root: services/keycloak
  files:
    - 00-postgres/kustomization.yaml
    - 10-operator/kustomization.yaml
    - 20-keycloak/kustomization.yaml
    - 20-keycloak/keycloak-cr-patch.yaml.tpl
```
This descriptor is intentionally about rendering only. A cluster-scoped descriptor would use the same model with layer: customer-managed or layer: cluster.
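On the Go side, a descriptor like this could be loaded into plain structs and checked for internal consistency before any rendering happens. The sketch below is an assumption about shape, not existing code: the Descriptor, Source, and FluxUnit types and the Validate method are hypothetical, and YAML decoding is left out so the example stays within the standard library.

```go
package main

import "fmt"

// Source, FluxUnit, and Descriptor are hypothetical Go mirrors of the
// render-related descriptor fields.
type Source struct {
	Name string
	URL  string
}

type FluxUnit struct {
	Name      string
	SourceRef string
	Path      string
	DependsOn []string
}

type Descriptor struct {
	Name      string
	Layer     string
	Sources   []Source
	FluxUnits []FluxUnit
}

// Validate reports duplicate flux unit names and flux units whose
// source_ref does not name a source declared in the same descriptor.
func (d Descriptor) Validate() []error {
	declared := make(map[string]bool, len(d.Sources))
	for _, s := range d.Sources {
		declared[s.Name] = true
	}
	seen := make(map[string]bool, len(d.FluxUnits))
	var errs []error
	for _, u := range d.FluxUnits {
		if seen[u.Name] {
			errs = append(errs, fmt.Errorf("%s: duplicate flux unit %q", d.Name, u.Name))
		}
		seen[u.Name] = true
		if !declared[u.SourceRef] {
			errs = append(errs, fmt.Errorf("%s: flux unit %q references undeclared source %q",
				d.Name, u.Name, u.SourceRef))
		}
	}
	return errs
}

func main() {
	d := Descriptor{
		Name:  "keycloak",
		Layer: "services",
		Sources: []Source{
			{Name: "opencenter-keycloak"},
			{Name: "opencenter-keycloak-config"},
		},
		FluxUnits: []FluxUnit{
			{Name: "keycloak-postgres", SourceRef: "opencenter-keycloak-config", Path: "services/keycloak/00-postgres"},
			{Name: "keycloak-cr", SourceRef: "opencenter-keycloak-confg", Path: "services/keycloak/20-keycloak"}, // deliberate typo
		},
	}
	for _, err := range d.Validate() {
		fmt.Println(err)
	}
}
```

A check like this is cheap to run at catalog load time, so a mistyped source_ref fails fast instead of producing a Flux Kustomization that never reconciles.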
How It Works
- Each .<cluster>-config.yaml still unmarshals through internal/config/service_map.go using registered Go types.
- The renderer takes one cluster config and produces one overlay tree under applications/overlays/<cluster>/.
- The renderer loads descriptors for enabled service-scoped units and any applicable cluster-scoped units.
- Shared templates generate GitRepository and Flux resources from descriptor lists (sources, flux_units) plus cluster config.
- Aggregate kustomization.yaml files are generated for the root overlay and each active branch (services/fluxcd, managed-services/fluxcd, optional customer-managed/fluxcd).
- CLI help still derives from config structs and secret structs, not from overlay descriptors.
- Complex validation remains in Go validators.
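The selection and aggregation steps above can be sketched in a few lines. This is a simplification under stated assumptions: the Unit type and its FluxFiles field are hypothetical, and enablement is flattened into one map rather than read from services.<name>.enabled, managed-service.<name>.enabled, or cluster-level config.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Unit is a hypothetical, trimmed-down descriptor: just enough to decide
// which files an aggregate kustomization.yaml should list.
type Unit struct {
	Name      string
	Layer     string   // "services", "managed-services", or "customer-managed"
	FluxFiles []string // file names the unit emits under <layer>/fluxcd/
}

// ActiveUnits keeps the catalog entries whose name is enabled.
func ActiveUnits(catalog []Unit, enabled map[string]bool) []Unit {
	var out []Unit
	for _, u := range catalog {
		if enabled[u.Name] {
			out = append(out, u)
		}
	}
	return out
}

// AggregateKustomization renders the kustomization.yaml for one layer's
// fluxcd/ directory, listing files in sorted order so output is stable.
func AggregateKustomization(units []Unit, layer string) string {
	var files []string
	for _, u := range units {
		if u.Layer == layer {
			files = append(files, u.FluxFiles...)
		}
	}
	sort.Strings(files)

	var b strings.Builder
	b.WriteString("apiVersion: kustomize.config.k8s.io/v1beta1\n")
	b.WriteString("kind: Kustomization\n")
	b.WriteString("resources:\n")
	for _, f := range files {
		b.WriteString("  - " + f + "\n")
	}
	return b.String()
}

func main() {
	catalog := []Unit{
		{Name: "keycloak", Layer: "services", FluxFiles: []string{"keycloak.yaml"}},
		{Name: "cert-manager", Layer: "services", FluxFiles: []string{"cert-manager.yaml"}},
		{Name: "alert-proxy", Layer: "managed-services", FluxFiles: []string{"alert-proxy.yaml"}},
	}
	enabled := map[string]bool{"keycloak": true, "cert-manager": true}
	fmt.Print(AggregateKustomization(ActiveUnits(catalog, enabled), "services"))
}
```

Because the aggregate is computed from active descriptors, a disabled unit cannot leave a stale entry behind, which is one of the drift problems the filename-convention approach has today.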
Required Cluster-Level Config Surface
Service maps already cover services: and managed-service:. To make the design fully capable of reproducing the RelayPoint-style overlays from one cluster config, the cluster config also needs a typed section for cluster-scoped overlay units.
One reasonable shape is:
```yaml
opencenter:
  gitops:
    overlay_units:
      flux_system:
        enabled: true
      customer_managed:
        enabled: true
        repository_name: customer-repository-rpl-apps-flux-k8s
        repository_url: ssh://relaypointlogistics@git.relaypointlogistics.com/rpl/apps-flux-k8s.git
        branch: main
        secret_name: customer-repository-rpl-apps-flux-k8s
        secret_ref: customer_managed.rpl_apps_flux_k8s
        kustomizations:
          - name: policies
            path: /policies/qa
          - name: infrastructure
            path: /infrastructure/qa
          - name: apps
            path: /apps/qa
```
That keeps non-service overlay inputs out of the service schema while still making them first-class, typed, and renderable from .<cluster>-config.yaml.
Cluster-scoped units that emit Secret manifests also need a matching typed secret surface, for example:
```yaml
secrets:
  customer_managed:
    rpl_apps_flux_k8s:
      identity: ""
      identity_pub: ""
      known_hosts: ""
```
The descriptor then decides whether that secret is rendered and which manifest name it uses. The config remains the source of truth for the secret material or secret backend reference.
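One way the secret_ref value could be resolved against the typed secret surface is a simple two-part lookup. The sketch below is illustrative only: the Secrets and SSHSecret types and the Resolve method are hypothetical, and the <group>.<name> reference format is inferred from the example value customer_managed.rpl_apps_flux_k8s rather than taken from existing code.

```go
package main

import (
	"fmt"
	"strings"
)

// SSHSecret mirrors the three fields in the secrets example above.
type SSHSecret struct {
	Identity    string
	IdentityPub string
	KnownHosts  string
}

// Secrets is a hypothetical typed surface: group name, then entry name.
type Secrets map[string]map[string]SSHSecret

// Resolve looks up a reference of the assumed form "<group>.<name>",
// for example "customer_managed.rpl_apps_flux_k8s".
func (s Secrets) Resolve(ref string) (SSHSecret, error) {
	group, name, ok := strings.Cut(ref, ".")
	if !ok || group == "" || name == "" {
		return SSHSecret{}, fmt.Errorf("secret_ref %q must have the form <group>.<name>", ref)
	}
	entry, found := s[group][name] // indexing a missing group is safe and yields found == false
	if !found {
		return SSHSecret{}, fmt.Errorf("secret_ref %q not found", ref)
	}
	return entry, nil
}

func main() {
	secrets := Secrets{
		"customer_managed": {
			"rpl_apps_flux_k8s": {KnownHosts: "git.relaypointlogistics.com ssh-ed25519 AAAA-placeholder"},
		},
	}
	if _, err := secrets.Resolve("customer_managed.rpl_apps_flux_k8s"); err != nil {
		panic(err)
	}
	_, err := secrets.Resolve("customer_managed.unknown")
	fmt.Println(err)
}
```

Failing on an unresolved reference at render time keeps a cluster config from producing a Secret manifest with empty key material.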
Pros
- Removes filename-convention rendering logic.
- Removes hardcoded service-name lists from render and template discovery.
- Preserves the current typed config pipeline and schema/editor compatibility.
- Keeps complex service behavior in Go where it already works.
- Supports standard services, managed services, customer-managed layers, and cluster-scoped overlay units with one model.
- Allows generic generation of sources, Flux units, and aggregate kustomizations.
- Supports incremental migration service by service and layer by layer.
Cons
- It is not fully YAML-driven.
- Services with new config fields still need a Go config struct until config metadata is unified.
- CLI secrets still need a better metadata source if the team wants to remove all manual mapping.
- The repository will temporarily have typed config plus overlay descriptors.
- Customer-managed and bootstrap units are not naturally "services", so naming and ownership need to be explicit in the model.
Comparison Matrix
| Criterion | A: Inline Resources | B: Full Manifests | C: Only Remove CLI Switches | D: Overlay Unit Descriptors + Typed Config |
| --- | --- | --- | --- | --- |
| Removes filename-convention rendering | No | Yes | No | Yes |
| Preserves current typed config pipeline | No | Partially | Yes | Yes |
| Introduces second source of truth for config | Yes | Yes | No | No |
| Handles complex validation cleanly | Poorly | Needs Go escape hatch | Yes | Yes |
| Supports generic source and Flux generation | Possible | Yes | Limited | Yes |
| Migration risk | High | High | Low | Medium |
| New render-only service can be data-only | No | Yes | No | Yes |
| New service with new config fields requires Go changes | Yes | No | Yes | Yes |
| Fit for current codebase | Poor | Moderate at best | Moderate | Strong |
Recommendation
Recommend Approach D: overlay unit descriptors with typed config.
The reasoning is straightforward:
- The biggest current problem is hardcoded rendering topology, not typed config by itself.
- The typed config registry is already part of config loading, schema generation, and editor UX.
- The existing manifest support is too small to justify replacing config, CLI metadata, and rendering in one step.
- Complex services already need Go validation and lifecycle hooks, so a pure data model would still need escape hatches.
- An overlay descriptor catalog lets the team make the renderer deterministic and discoverable immediately.
- The same descriptor model can cover services/, managed-services/, optional customer-managed/, and cluster-level overlay units.
Recommended Implementation Path
Phase 1: Introduce An Overlay Unit Descriptor Catalog
Add a dedicated model for overlay rendering metadata.
Suggested fields:
- unit name
- unit type
- layer (services, managed-services, customer-managed, cluster)
- owner (service or cluster)
- dependency list
- plugin mode (default or custom)
- source list
- Flux unit list
- overlay root
- overlay file list
- aggregate target metadata
- conditions or enablement gates
The descriptor can be stored as YAML in an embedded directory or as Go structs first. YAML is acceptable here because the scope is narrow and render-specific.
Phase 2: Replace Convention-Based Rendering
Refactor these paths to use descriptors instead of filesystem guessing:
- internal/gitops/copy.go shouldSkipFile
- internal/gitops/copy.go RenderSingleService
- internal/gitops/copy.go RenderClusterAppsAtomic
- internal/template/embedded_registry.go inferServices
After this step:
- a unit is rendered because its descriptor says so
- single-service rendering uses descriptor membership
- root and branch aggregate kustomizations iterate active descriptors
- services/, managed-services/, and optional customer-managed/ are all rendered from the same catalog
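Descriptor-driven single-unit rendering then reduces to a catalog lookup plus path joins, with no directory walk. The sketch below is a hedged illustration rather than a replacement for RenderSingleService: the type and function names are hypothetical, and dropping a trailing .tpl from rendered file names is an assumption about output naming.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// Overlay and Descriptor are hypothetical mirrors of the descriptor's
// overlay.root and overlay.files fields.
type Overlay struct {
	Root  string
	Files []string
}

type Descriptor struct {
	Name    string
	Overlay Overlay
}

// OutputPaths lists the files one unit renders for one cluster, using only
// the descriptor. Assumption: a trailing ".tpl" is dropped on output.
func OutputPaths(d Descriptor, cluster string) []string {
	base := path.Join("applications", "overlays", cluster, d.Overlay.Root)
	out := make([]string, 0, len(d.Overlay.Files))
	for _, f := range d.Overlay.Files {
		out = append(out, path.Join(base, strings.TrimSuffix(f, ".tpl")))
	}
	return out
}

// RenderSingle resolves one unit by name; unrelated units are never touched.
func RenderSingle(catalog map[string]Descriptor, name, cluster string) ([]string, error) {
	d, ok := catalog[name]
	if !ok {
		return nil, fmt.Errorf("no descriptor for unit %q", name)
	}
	return OutputPaths(d, cluster), nil
}

func main() {
	catalog := map[string]Descriptor{
		"keycloak": {
			Name: "keycloak",
			Overlay: Overlay{
				Root: "services/keycloak",
				Files: []string{
					"00-postgres/kustomization.yaml",
					"20-keycloak/keycloak-cr-patch.yaml.tpl",
				},
			},
		},
	}
	paths, err := RenderSingle(catalog, "keycloak", "k8s-dev")
	if err != nil {
		panic(err)
	}
	for _, p := range paths {
		fmt.Println(p)
	}
}
```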
Phase 3: Make Source And Flux Generation Generic
Replace per-service and per-layer source and Flux discovery with shared templates driven by descriptor metadata.
That should remove a large amount of repetitive file inventory without changing service config semantics.
The shared model must support:
- multiple sources for a single unit
- multiple Flux units for a single unit
- cluster-level values such as overlay path, intervals, and repo URLs
- branch-specific aggregate kustomization.yaml files
Phase 4: Derive CLI Help From Existing Types
Remove getServiceOptions and getServiceSecrets switch statements by deriving help text from:
- registered service config structs
- secret structs in internal/config/types_secrets.go
- existing struct tags or generated schema metadata
This is a better improvement than duplicating CLI metadata into render descriptors.
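Deriving help text from struct tags is mostly a reflection loop. The sketch below shows the idea under assumptions: the LokiConfig struct, its fields, and the description and default tag keys are hypothetical and may not match the tags actually used in internal/config/services/*.go.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// LokiConfig is a hypothetical config struct. The "description" and
// "default" tag keys are assumptions, not the repository's real tags.
type LokiConfig struct {
	Enabled       bool   `yaml:"enabled" description:"Enable the service" default:"false"`
	RetentionDays int    `yaml:"retention_days" description:"Log retention in days" default:"14"`
	StorageClass  string `yaml:"storage_class,omitempty" description:"PVC storage class"`
}

// Option is one line of CLI help.
type Option struct {
	Name        string
	Type        string
	Description string
	Default     string
}

// OptionsFor derives help entries from the struct tags of cfg, which may be
// a struct or a pointer to one. Fields without a yaml name are skipped.
func OptionsFor(cfg any) []Option {
	t := reflect.TypeOf(cfg)
	if t == nil {
		return nil
	}
	if t.Kind() == reflect.Pointer {
		t = t.Elem()
	}
	if t.Kind() != reflect.Struct {
		return nil
	}
	var opts []Option
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		name, _, _ := strings.Cut(f.Tag.Get("yaml"), ",") // drop ",omitempty" and similar
		if name == "" || name == "-" {
			continue
		}
		opts = append(opts, Option{
			Name:        name,
			Type:        f.Type.String(),
			Description: f.Tag.Get("description"),
			Default:     f.Tag.Get("default"),
		})
	}
	return opts
}

func main() {
	for _, o := range OptionsFor(&LokiConfig{}) {
		fmt.Printf("%-16s %-8s %s (default: %q)\n", o.Name, o.Type, o.Description, o.Default)
	}
}
```

With a helper like this, getServiceOptions becomes a registry lookup followed by OptionsFor, and adding a field to a config struct updates the help text automatically.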
Phase 5: Keep Complex Validation In Go
Retain Go validators for services like:
- Keycloak
- Loki
- Cert-manager
If later the team wants declarative validation for simple cases, add it as a supplement to typed config, not as a second canonical schema.
Phase 6: Validate Against Fixture Repositories
Add fixture-based verification for parity with the real rendered shape.
At minimum:
- render one cluster config into one overlay tree
- run the renderer across the five distinct RelayPoint config files:
  - .k8s-dev-config.yaml
  - .k8s-dr-config.yaml
  - .k8s-prod-config.yaml
  - .k8s-qa-config.yaml
  - .k8s-uat-config.yaml
- diff each rendered result against testdata/relaypoint-logistics-shared/applications/overlays/<cluster>/
- document any intentionally canonicalized differences
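The diff step can be sketched as a comparison of two path-to-content maps with an explicit allowlist for canonicalized differences. This is a minimal illustration, not a test harness: the Tree type and Diff function are hypothetical, the trees are in-memory rather than read from testdata/, and the file names in main are made up.

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

// Tree maps a slash-separated relative path to file contents.
type Tree map[string][]byte

// Diff returns sorted, human-readable differences between the rendered and
// expected trees. Paths present in allowed are skipped, which is where
// documented canonicalization differences would be recorded.
func Diff(rendered, expected Tree, allowed map[string]bool) []string {
	var out []string
	for p, want := range expected {
		if allowed[p] {
			continue
		}
		got, ok := rendered[p]
		switch {
		case !ok:
			out = append(out, "missing: "+p)
		case !bytes.Equal(got, want):
			out = append(out, "changed: "+p)
		}
	}
	for p := range rendered {
		if allowed[p] {
			continue
		}
		if _, ok := expected[p]; !ok {
			out = append(out, "unexpected: "+p)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	expected := Tree{
		"kustomization.yaml":           []byte("resources: []\n"),
		"services/fluxcd/loki.yaml":    []byte("kind: Kustomization\n"),
		"services/sources/legacy.yaml": []byte("old name\n"),
	}
	rendered := Tree{
		"kustomization.yaml":        []byte("resources: []\n"),
		"services/fluxcd/loki.yaml": []byte("kind: Kustomization # drift\n"),
	}
	allowed := map[string]bool{"services/sources/legacy.yaml": true}
	for _, d := range Diff(rendered, expected, allowed) {
		fmt.Println(d)
	}
}
```

Keeping the allowlist as data makes every intentional divergence from the fixture visible in review instead of hidden in test logic.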
Phase 7: Revisit Config Metadata Unification Later
Only after render descriptors are stable should the team consider a larger consolidation of:
- schema generation
- CLI options/secrets help
- defaults
- validation metadata
At that point, the team can decide whether to:
- generate all metadata from Go types, or
- move fully to manifests
That should be a separate design decision, not bundled into the rendering refactor.
Non-Goals For The First Migration
The first migration should not try to do these things:
- delete typed service config structs
- delete service config registration
- move all validation into YAML
- move secret fallback chains out of internal/config/config.go
- embed full Kubernetes resources into cluster config
Those are larger design changes with broader blast radius than the rendering problem requires.
Fixture Parity Status
If the bar is "can this design, once implemented, take five distinct .<cluster>-config.yaml files and reproduce the RelayPoint overlay trees?", the answer is not yet.
The renderer contract must still be:
- one .<cluster>-config.yaml renders one applications/overlays/<cluster>/ tree
- the RelayPoint-style repository therefore requires five distinct config files:
  - .k8s-dev-config.yaml
  - .k8s-dr-config.yaml
  - .k8s-prod-config.yaml
  - .k8s-qa-config.yaml
  - .k8s-uat-config.yaml
The recommended architecture is still correct, but it is only a necessary base. It does not yet guarantee exact fixture parity without additional requirements.
Remaining Shortcomings
- Descriptors need conditional file membership. The fixture contains services whose rendered file set varies by cluster. A static overlay.files list is not enough for cases like patch-subscription.yaml, rbac-manager-users.yaml, alertmanager-routes.yaml, or the cluster-specific cert-manager files.
- Cluster-scoped rendered assets still need explicit ownership. The model needs first-class handling for non-service files such as .sops.yaml, cluster-level customer-managed source Secrets, and any other root-level rendered artifacts.
- flux-system/ needs a defined lifecycle boundary. All five root kustomization.yaml files reference ./flux-system, but checked-in flux-system/ content exists only in k8s-dr, k8s-prod, and k8s-qa. The design must explicitly say whether flux-system/* is template-rendered, bootstrap-generated, or excluded from parity diffs.
- The parity fixtures need complete per-cluster config inputs. The repository currently has no checked-in .k8s-dev-config.yaml, .k8s-dr-config.yaml, .k8s-prod-config.yaml, .k8s-qa-config.yaml, or .k8s-uat-config.yaml. Some rendered overlay content also appears to have been added manually and is not obviously represented in config today. Exact parity requires authoring complete config fixtures first.
- Canonicalization rules need to be explicit. Some differences in the fixture are legacy naming drift rather than required behavior. The implementation needs a documented policy for things like source filenames, cert-manager filenames, and disabled services that currently leave stray files behind.
What Needs To Be Added
To make the "five configs produce five overlays" claim true, this design needs a few concrete additions:
- Add conditional render rules to descriptor overlay.files entries and any generated source or Flux unit lists.
- Add typed cluster-scoped config and secret surfaces for non-service units, including customer-managed repo credentials and .sops.yaml generation inputs.
- Define flux-system/ as a separate lifecycle concern and make the parity tests compare template-rendered output separately from bootstrap-owned files.
- Create and maintain the five per-cluster config fixtures as first-class test inputs.
- Document allowed canonicalization diffs so parity tests can distinguish intentional cleanup from regressions.
The detailed problem statement and implementation plan for that work live in Service Rendering: Fixture Parity Plan (services-rendering-parity-plan.md).
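As a rough illustration of the first addition, conditional render rules could attach an optional condition key to each overlay file entry. Everything here is hypothetical: the FileEntry type, the When field, the SelectFiles helper, and the condition names are placeholders for whatever the parity plan settles on.

```go
package main

import "fmt"

// FileEntry is a hypothetical extension of an overlay.files entry: a path
// plus an optional condition key. An empty When means "always render".
type FileEntry struct {
	Path string
	When string
}

// SelectFiles returns the paths whose condition is empty or evaluates to
// true in flags. Flags would be derived from the typed cluster config.
func SelectFiles(entries []FileEntry, flags map[string]bool) []string {
	out := make([]string, 0, len(entries))
	for _, e := range entries {
		if e.When == "" || flags[e.When] {
			out = append(out, e.Path)
		}
	}
	return out
}

func main() {
	entries := []FileEntry{
		{Path: "kustomization.yaml"},
		{Path: "alertmanager-routes.yaml", When: "alertmanager_routes"}, // hypothetical condition name
		{Path: "patch-subscription.yaml", When: "subscription_patch"},   // hypothetical condition name
	}
	flags := map[string]bool{"alertmanager_routes": true}
	for _, p := range SelectFiles(entries, flags) {
		fmt.Println(p)
	}
}
```

Keeping conditions as named flags rather than free-form expressions keeps descriptors render-only and avoids growing a second configuration language.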
Conclusion
The recommendation still stands, but the scope needs to be explicit: this is not just "service rendering." It is "cluster overlay rendering."
The architectural split that holds up is:
- typed config for cluster and service parameters
- overlay unit descriptors for rendered topology
That split is the right foundation for reproducing the RelayPoint-style overlays, but it still needs the parity requirements above before it can reliably turn five distinct .<cluster>-config.yaml inputs into the five expected overlay trees.
Evidence
Code paths referenced in this document:
- cmd/cluster_service.go
- internal/config/config.go
- internal/config/registry/registry.go
- internal/config/schema.go
- internal/config/service_map.go
- internal/config/services/
- internal/config/types_secrets.go
- internal/gitops/copy.go
- internal/services/plugin.go
- internal/services/plugins/registry.go
- internal/services/plugins/validators.go
- internal/template/embedded_registry.go
Rendered output references reviewed from testdata/relaypoint-logistics-shared/applications/overlays/:
- k8s-dev/, k8s-dr/, k8s-prod/, k8s-qa/, k8s-uat/: five separate cluster overlay trees
- required config inputs for fixture parity: .k8s-dev-config.yaml, .k8s-dr-config.yaml, .k8s-prod-config.yaml, .k8s-qa-config.yaml, .k8s-uat-config.yaml
- k8s-dev/kustomization.yaml, k8s-dr/kustomization.yaml, k8s-prod/kustomization.yaml, k8s-qa/kustomization.yaml, k8s-uat/kustomization.yaml: root aggregate overlay files, all of which reference ./flux-system
- checked-in flux-system/ content exists only under k8s-dr/, k8s-prod/, and k8s-qa/
- k8s-dev/services/fluxcd/keycloak.yaml: example of one service emitting multiple Flux Kustomizations
- k8s-dev/services/sources/opencenter-keycloak.yaml and k8s-dev/services/sources/opencenter-keycloak-config.yaml: example of one service emitting multiple GitRepository sources
- k8s-dev/managed-services/fluxcd/alert-proxy.yaml: example of a managed-service overlay unit
- k8s-uat/customer-managed/sources/customer-repository-ffcb-apps-flux-k8s-secret.yaml: example of a cluster-scoped rendered Secret that must be driven by per-cluster config or secret references