Purpose: For platform teams, this page explains how cluster configuration is managed across its lifecycle, from initialization through validation, deployment, updates, and decommissioning, and how configuration drift is detected and handled.
Understanding the configuration lifecycle helps you manage clusters effectively and maintain consistency. This explanation covers how configuration evolves from creation to production.
Configuration as Code
openCenter treats configuration as code with these principles:
- Single Source of Truth: One YAML file defines entire cluster
- Version Controlled: Configuration stored in Git
- Declarative: Describe desired state, not steps
- Validated: Multi-layered validation before deployment
- Auditable: All changes tracked in Git history
Why configuration as code: Reproducible deployments, audit trail, rollback capability, collaboration via pull requests.
Evidence: .kiro/steering/product.md:30-35, Session 2 B0 section 2
Configuration Stages
Stage 1: Initialization
Purpose: Create initial configuration with sensible defaults.
Command:
opencenter cluster init my-cluster --org my-org --type openstack
Process:
1. CLI: Load built-in defaults (internal/config/defaults.go)
2. CLI: Load CLI defaults (~/.config/opencenter/config.yaml)
3. CLI: Apply command-line flags (--org, --type)
4. CLI: Generate configuration file
5. CLI: Write to ~/.config/opencenter/clusters/my-org/.my-cluster-config.yaml
Generated Configuration:
schema_version: "2.0"
opencenter:
  meta:
    name: my-cluster
    environment: production
    region: sjc3
    organization: my-org
  infrastructure:
    provider: openstack
    openstack:
      region: sjc3
      availability_zone: az1
      # ... (100+ default fields)
  cluster:
    kubernetes:
      version: "1.33.5"
    master_count: 3
    worker_count: 2
    # ... (50+ default fields)
  services:
    cert-manager:
      enabled: true
    keycloak:
      enabled: true
    # ... (20+ services)
Why this design: Defaults provide production-ready configuration. Users only customize what’s needed. Reduces configuration complexity.
Evidence: cmd/cluster_init.go, internal/config/defaults.go:48-451, Session 2 B0 section 2
Stage 2: Customization
Purpose: Customize configuration for specific requirements.
Methods:
- Direct Editing: Edit configuration file with text editor
- Interactive Mode: opencenter cluster edit my-cluster
- CLI Flags: opencenter cluster set my-cluster cluster.worker_count=5
Common Customizations:
# Node counts
opencenter:
  cluster:
    master_count: 3 # High availability
    worker_count: 5 # Increased capacity

# Networking
opencenter:
  cluster:
    networking:
      pod_subnet: "10.42.0.0/16"
      service_subnet: "10.43.0.0/16"
      cni_plugin: cilium # Changed from Calico

# Services
opencenter:
  services:
    loki:
      enabled: false # Disabled for cost optimization
    harbor:
      enabled: true # Enabled for container registry
      hostname: harbor.example.com
Why this design: Flexibility for different use cases. Interactive mode for guided editing. CLI flags for automation.
Evidence: cmd/cluster_edit.go, cmd/cluster_set.go, Session 2 B0 section 3
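The `cluster set` form addresses a field by dotted path. The following is a minimal Python sketch of how such an assignment could be applied to a nested configuration. The function names and the value-parsing rules are hypothetical illustrations, not the CLI's actual implementation, and the sketch assumes the path is relative to the `opencenter` root key.

```python
from typing import Any


def parse_value(raw: str) -> Any:
    """Interpret a raw string as bool, int, or plain string (illustrative rules)."""
    if raw in ("true", "false"):
        return raw == "true"
    try:
        return int(raw)
    except ValueError:
        return raw


def apply_assignment(config: dict, assignment: str) -> dict:
    """Apply a 'dotted.path=value' assignment in place and return the config."""
    path, sep, raw = assignment.partition("=")
    if not sep or not path:
        raise ValueError(f"expected key=value, got {assignment!r}")
    keys = path.split(".")
    node = config
    for key in keys[:-1]:
        # Create intermediate mappings as needed.
        node = node.setdefault(key, {})
        if not isinstance(node, dict):
            raise ValueError(f"{key!r} is not a mapping")
    node[keys[-1]] = parse_value(raw)
    return config


config = {"cluster": {"master_count": 3, "worker_count": 2}}
apply_assignment(config, "cluster.worker_count=5")
print(config["cluster"]["worker_count"])  # 5
```

Because the assignment is a plain string, the same call works from a shell script or a CI job, which is the automation case the CLI flags method targets.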
Stage 3: Validation
Purpose: Verify configuration correctness before deployment.
Command:
opencenter cluster validate my-cluster
Validation Layers:
1. Schema Validation (JSON schema compliance)
↓
2. Business Rules (cross-field dependencies)
↓
3. Provider Validation (provider-specific constraints)
↓
4. Connectivity Validation (API reachability, optional)
Example Validation:
# Configuration
opencenter:
  cluster:
    networking:
      use_octavia: false
      vrrp_enabled: true
      # vrrp_ip: missing

# Validation Error
Error: When use_octavia=false and vrrp_enabled=true, vrrp_ip must be set
Location: opencenter.cluster.networking.vrrp_ip
Severity: error
Why this design: Fail fast (catch errors early). Progressive validation (fast checks first). Specific error messages (easy to fix).
Evidence: internal/config/validator.go, tests/features/workflow.feature:38-43, Session 1 A3
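The fail-fast, cheapest-first ordering can be sketched in Python as a list of layer functions that run in sequence and stop at the first layer reporting errors. The two layers below are simplified stand-ins for the schema and business-rule layers. Their names and signatures are assumptions for illustration and do not describe the contents of `internal/config/validator.go`.

```python
from typing import Callable, List

# Each layer takes a config dict and returns a list of error messages.
Layer = Callable[[dict], List[str]]


def check_schema(config: dict) -> List[str]:
    """Stand-in for layer 1: a single type check."""
    count = config.get("cluster", {}).get("worker_count")
    if isinstance(count, int):
        return []
    return ["cluster.worker_count must be an integer"]


def check_business_rules(config: dict) -> List[str]:
    """Stand-in for layer 2: the VRRP cross-field rule from the example."""
    net = config.get("cluster", {}).get("networking", {})
    if not net.get("use_octavia", True) and net.get("vrrp_enabled") and not net.get("vrrp_ip"):
        return ["When use_octavia=false and vrrp_enabled=true, vrrp_ip must be set"]
    return []


def validate(config: dict, layers: List[Layer]) -> List[str]:
    """Run layers in order and stop at the first layer that reports errors."""
    for layer in layers:
        errors = layer(config)
        if errors:
            return errors
    return []


config = {
    "cluster": {
        "worker_count": 2,
        "networking": {"use_octavia": False, "vrrp_enabled": True},
    }
}
print(validate(config, [check_schema, check_business_rules]))
```

Stopping at the first failing layer keeps slower checks, such as provider or connectivity validation, from running against a configuration that is already known to be invalid.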
Stage 4: Setup (GitOps Repository Generation)
Purpose: Generate complete GitOps repository structure.
Command:
opencenter cluster generate my-cluster
Process:
1. Template Engine: Load embedded templates
2. Template Engine: Inject configuration values
3. Template Engine: Render to GitOps repository
4. SOPS Manager: Encrypt secrets
5. Git: Initialize repository (optional)
Generated Structure:
<git_dir>/
├── .gitignore
├── .sops.yaml
├── README.md
│
├── applications/
│   └── overlays/my-cluster/
│       ├── flux-system/         # FluxCD bootstrap
│       ├── services/            # Platform services
│       └── managed-services/    # Customer applications
│
└── infrastructure/
    └── clusters/my-cluster/
        ├── main.tf              # Terraform/OpenTofu
        ├── inventory/           # Kubespray Ansible
        └── kubeconfig.yaml      # Generated after deployment
Why this design: Standardized structure (consistency). Templates ensure correctness. Secrets encrypted before commit.
Evidence: internal/gitops/, tests/features/workflow.feature:58-65, Session 2 B0 section 15
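Steps 1 to 3 of the process amount to substituting configuration values into templates. The Python sketch below illustrates the idea with the standard library's `string.Template`. The template text and the `render_kustomization` function are hypothetical; the real embedded templates and template engine are not shown in this document.

```python
from string import Template

# Hypothetical template for one generated manifest.
KUSTOMIZATION_TEMPLATE = Template(
    "apiVersion: kustomize.config.k8s.io/v1beta1\n"
    "kind: Kustomization\n"
    "# cluster: $cluster_name ($organization)\n"
    "resources:\n"
    "$resources"
)


def render_kustomization(config: dict) -> str:
    """Render the template from cluster metadata and the enabled services."""
    meta = config["meta"]
    enabled = sorted(
        name for name, svc in config["services"].items() if svc.get("enabled")
    )
    resources = "".join(f"  - ./{name}\n" for name in enabled)
    # substitute() raises KeyError if a placeholder has no value, so a
    # missing configuration field fails loudly instead of rendering blanks.
    return KUSTOMIZATION_TEMPLATE.substitute(
        cluster_name=meta["name"],
        organization=meta["organization"],
        resources=resources,
    )


config = {
    "meta": {"name": "my-cluster", "organization": "my-org"},
    "services": {
        "cert-manager": {"enabled": True},
        "harbor": {"enabled": True},
        "loki": {"enabled": False},
    },
}
print(render_kustomization(config))
```

Rendering from a single configuration dict is what keeps the generated repository consistent with the configuration file: a disabled service simply never appears in the output.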
Stage 5: Deployment
Purpose: Provision infrastructure and deploy Kubernetes.
Command:
opencenter cluster deploy my-cluster
Process:
1. Terraform: Provision infrastructure (VMs, networks, storage)
↓
2. Kubespray: Deploy Kubernetes (control plane, workers, CNI)
↓
3. FluxCD: Bootstrap GitOps (install controllers, create sources)
↓
4. FluxCD: Reconcile services (deploy platform services)
↓
5. CLI: Cluster ready
Duration: 20-40 minutes (depends on provider and cluster size)
Why this design: Automated end-to-end deployment. No manual steps. Idempotent (safe to re-run).
Evidence: cmd/cluster_bootstrap.go, Session 2 B0 section 2
Stage 6: Operation
Purpose: Manage running cluster.
Operations:
- Monitor: Check cluster health and service status
- Update: Apply configuration changes
- Scale: Add/remove worker nodes
- Backup: Backup cluster data (Velero)
- Troubleshoot: Diagnose and fix issues
Evidence: Session 2 B0 section 2
Stage 7: Updates
Purpose: Apply configuration changes to running cluster.
Workflow:
1. User: Edit configuration file
2. User: opencenter cluster validate my-cluster
3. User: opencenter cluster generate my-cluster
4. User: git add -A && git commit -m "Update configuration"
5. User: git push
6. FluxCD: Detects change (within 15m)
7. FluxCD: Reconciles new state
8. Services: Updated in cluster
Update Types:
- Service Configuration: Enable/disable services, change settings
- Node Scaling: Add/remove worker nodes
- Networking: Change CNI plugin, load balancer settings
- Security: Update RBAC policies, network policies
Why this design: GitOps workflow (Git as source of truth). Automated reconciliation (no manual kubectl). Auditable (Git history).
Evidence: Ecosystem.md GitOps workflow, Session 2 B0 section 2
Stage 8: Decommission
Purpose: Safely delete cluster and clean up resources.
Command:
opencenter cluster destroy my-cluster
Process:
1. Backup: Backup cluster data (optional)
2. Terraform: Destroy infrastructure (VMs, networks, storage)
3. CLI: Delete configuration file (optional)
4. Git: Archive repository (optional)
Why this design: Clean resource deletion. No orphaned resources. Optional backup for recovery.
Evidence: cmd/cluster_destroy.go, Session 2 B0 section 2
Configuration Precedence
Precedence Order (Highest to Lowest)
- Command-line flags: opencenter cluster set my-cluster cluster.worker_count=5
- Configuration file: ~/.config/opencenter/clusters/my-org/.my-cluster-config.yaml
- CLI defaults: ~/.config/opencenter/config.yaml
- Built-in defaults: internal/config/defaults.go
Example:
# Built-in default: worker_count = 2
# CLI default: worker_count = 3
# Configuration file: worker_count = 4
# Cluster set command: opencenter cluster set my-cluster cluster.worker_count=5
# Result: worker_count = 5 (command-line flag wins)
Why this design: Flexibility for different use cases. Override at multiple levels. Sensible defaults reduce configuration.
Evidence: internal/config/manager.go, Session 2 B0 section 3
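The precedence order can be modeled as a deep merge applied from the lowest-precedence layer to the highest. The Python sketch below reproduces the worked example. `deep_merge` and `resolve` are illustrative names, and the sketch makes no claim about how `internal/config/manager.go` actually combines the layers.

```python
from copy import deepcopy


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict where values in override win over values in base."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def resolve(*layers: dict) -> dict:
    """Merge layers given from lowest to highest precedence."""
    result: dict = {}
    for layer in layers:
        result = deep_merge(result, layer)
    return result


built_in = {"cluster": {"worker_count": 2, "master_count": 3}}
cli_defaults = {"cluster": {"worker_count": 3}}
config_file = {"cluster": {"worker_count": 4}}
flags = {"cluster": {"worker_count": 5}}

effective = resolve(built_in, cli_defaults, config_file, flags)
print(effective)  # {'cluster': {'worker_count': 5, 'master_count': 3}}
```

Note that `master_count` survives from the built-in defaults because no higher layer sets it; only the fields a layer actually specifies are overridden.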
Configuration Storage
File Locations
Cluster Configuration:
~/.config/opencenter/clusters/<organization>/<cluster>/.<cluster>-config.yaml
CLI Defaults:
~/.config/opencenter/config.yaml
Secrets:
~/.config/opencenter/clusters/<organization>/secrets/age/<cluster>-key.txt
~/.config/opencenter/clusters/<organization>/secrets/ssh/<cluster>-key
GitOps Repository:
<git_dir>/ # User-specified location
Why this design: Organization-based structure (multi-tenancy). Secrets separate from configuration. GitOps repository user-controlled.
Evidence: .kiro/steering/structure.md:118-128, Session 2 B0 section 14
Configuration Validation
Validation Layers
Layer 1: Schema Validation
Purpose: Verify structure, types, and formats.
Example:
# Invalid: worker_count is string, should be integer
opencenter:
  cluster:
    worker_count: "five"

# Validation Error
Error: Invalid type for opencenter.cluster.worker_count
Expected: integer
Actual: string
Why this layer: Fast (instant). Catches 80% of errors. Clear error messages.
Evidence: internal/config/validator.go, Session 1 A3
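A type check of this kind can be sketched in Python as a table of dotted paths and expected types. The table below is a hypothetical subset invented for illustration; the real validator works from the JSON schema, which is not reproduced here.

```python
from typing import Any, Dict, List

# Hypothetical subset of a schema: dotted path -> expected Python type.
EXPECTED_TYPES: Dict[str, type] = {
    "opencenter.cluster.worker_count": int,
    "opencenter.cluster.master_count": int,
    "opencenter.meta.name": str,
}

TYPE_NAMES = {int: "integer", str: "string", bool: "boolean", dict: "object", list: "array"}


def lookup(config: dict, path: str) -> Any:
    """Return the value at a dotted path, or None if any key is missing."""
    node: Any = config
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def check_types(config: dict) -> List[str]:
    """Report every configured field whose type differs from the expected one."""
    errors = []
    for path, expected in EXPECTED_TYPES.items():
        value = lookup(config, path)
        if value is None:
            continue  # presence checks are out of scope for this sketch
        # bool is a subclass of int in Python, so reject it explicitly.
        is_ok = isinstance(value, expected) and not (
            expected is int and isinstance(value, bool)
        )
        if not is_ok:
            actual = TYPE_NAMES.get(type(value), type(value).__name__)
            errors.append(
                f"Invalid type for {path}: expected {TYPE_NAMES[expected]}, actual {actual}"
            )
    return errors


print(check_types({"opencenter": {"cluster": {"worker_count": "five"}}}))
```

Checks like this need no network access or credentials, which is why this layer can run first and return almost immediately.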
Layer 2: Business Rules
Purpose: Verify cross-field dependencies and logical consistency.
Example:
# Invalid: VRRP enabled but no VRRP IP
opencenter:
  cluster:
    networking:
      use_octavia: false
      vrrp_enabled: true
      # vrrp_ip: missing

# Validation Error
Error: When use_octavia=false and vrrp_enabled=true, vrrp_ip must be set
Why this layer: Catches logical errors. Prevents deployment failures. Specific error messages.
Evidence: internal/config/validator.go, tests/features/workflow.feature:38-43
Layer 3: Provider Validation
Purpose: Verify provider-specific constraints.
Example:
# Invalid: Image ID doesn't exist in OpenStack
opencenter:
  infrastructure:
    openstack:
      image_id: "invalid-image-id"

# Validation Error
Error: Image ID not found in OpenStack region sjc3
Image ID: invalid-image-id
Available images: [list of valid images]
Why this layer: Catches provider-specific errors. Prevents deployment failures. Provides helpful suggestions.
Evidence: internal/config/*_validator.go, Session 1 A3
Layer 4: Connectivity Validation (Optional)
Purpose: Verify API reachability and credentials.
Example:
# Enable connectivity validation
opencenter cluster validate my-cluster --validation online
Checks:
- OpenStack API reachable
- Credentials valid
- Quotas sufficient
- Networks available
Why optional: Requires credentials and network access. Slower than other layers. But catches deployment-time failures.
Evidence: internal/config/validator.go, Session 1 A3
Configuration Updates
Update Strategies
Strategy 1: In-Place Update
Use Case: Service configuration changes (enable/disable services, change settings)
Workflow:
1. Edit configuration file
2. Validate configuration
3. Render GitOps repository
4. Commit and push
5. FluxCD reconciles (5-15 minutes)
Example:
# Before
opencenter:
  services:
    loki:
      enabled: true

# After
opencenter:
  services:
    loki:
      enabled: false
Why this strategy: No cluster rebuild. Fast updates. GitOps workflow.
Strategy 2: Node Scaling
Use Case: Add/remove worker nodes
Workflow:
1. Edit configuration file (change worker_count)
2. Validate configuration
3. Render GitOps repository
4. Run Terraform apply (provision new nodes)
5. Run Kubespray (join new nodes to cluster)
Example:
# Before
opencenter:
  cluster:
    worker_count: 2

# After
opencenter:
  cluster:
    worker_count: 5
Why this strategy: Horizontal scaling. No downtime. Automated provisioning.
Strategy 3: Cluster Rebuild
Use Case: Provider change, major Kubernetes version upgrade
Workflow:
1. Backup cluster data (Velero)
2. Create new configuration file
3. Deploy new cluster
4. Restore data to new cluster
5. Validate applications
6. Cutover traffic
7. Decommission old cluster
Why this strategy: Clean slate. No in-place migration complexity. But requires a data restore and a traffic cutover, which can mean downtime.
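Choosing among the three strategies depends on which fields changed. The Python sketch below shows one way that decision could be expressed. The path prefixes are assumptions made for illustration, and detection of a major Kubernetes version upgrade is deliberately left out because it would require comparing the old and new version values.

```python
from typing import Iterable

# Hypothetical mapping from changed field prefixes to the strategies above.
REBUILD_PREFIXES = ("infrastructure.provider",)
SCALING_PREFIXES = ("cluster.worker_count",)


def choose_strategy(changed_paths: Iterable[str]) -> str:
    """Pick the most disruptive strategy required by any changed field."""
    strategy = "in-place"
    for path in changed_paths:
        if path.startswith(REBUILD_PREFIXES):
            # A rebuild supersedes every other strategy, so stop early.
            return "rebuild"
        if path.startswith(SCALING_PREFIXES):
            strategy = "node-scaling"
    return strategy


print(choose_strategy(["services.loki.enabled"]))                         # in-place
print(choose_strategy(["services.loki.enabled", "cluster.worker_count"]))  # node-scaling
print(choose_strategy(["cluster.worker_count", "infrastructure.provider"]))  # rebuild
```

When a change set mixes several kinds of change, the sketch returns the most disruptive strategy, since that one determines the workflow to follow.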
Configuration Drift
Drift Detection
Definition: Difference between configuration file and actual infrastructure/cluster state.
Types:
- Infrastructure Drift: VMs deleted, networks changed, storage modified
- Configuration Drift: Manual kubectl changes, direct API modifications
- Service Drift: Service versions changed, settings modified
Detection:
# Detect infrastructure drift
opencenter cluster drift my-cluster
# Detect configuration drift (FluxCD)
kubectl get kustomizations -A
Why drift happens: Manual changes, external automation, infrastructure failures.
Evidence: cmd/cluster_drift.go, Session 1 A8
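Conceptually, drift detection is a comparison of desired state against observed state. The Python sketch below compares two nested dicts and reports each differing field. It is an illustration of the idea only; how `opencenter cluster drift` collects the actual state and reports differences is not described here, and the `diff` function is a hypothetical name.

```python
from typing import Any, Iterator, Tuple

MISSING = object()  # sentinel for "key absent on this side"


def diff(desired: dict, actual: dict, prefix: str = "") -> Iterator[Tuple[str, Any, Any]]:
    """Yield (path, desired_value, actual_value) for every differing leaf."""
    for key in sorted(set(desired) | set(actual)):
        path = f"{prefix}.{key}" if prefix else key
        want = desired.get(key, MISSING)
        have = actual.get(key, MISSING)
        if isinstance(want, dict) and isinstance(have, dict):
            # Recurse so that only the changed leaves are reported.
            yield from diff(want, have, path)
        elif want != have:
            yield path, want, have


desired = {"cluster": {"worker_count": 5, "master_count": 3}}
actual = {"cluster": {"worker_count": 3, "master_count": 3}}
for path, want, have in diff(desired, actual):
    print(f"{path}: desired={want} actual={have}")
# cluster.worker_count: desired=5 actual=3
```

Reporting the dotted path of each difference makes it straightforward to decide, field by field, whether to accept or revert the drift.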
Drift Prevention
Strategy 1: GitOps Discipline
Practice: All changes through Git, no manual kubectl.
Enforcement:
- RBAC (limit direct cluster access)
- Audit logging (track manual changes)
- FluxCD reconciliation (revert manual changes)
Strategy 2: Immutable Infrastructure
Practice: Replace infrastructure instead of modifying.
Example: Deploy new nodes instead of upgrading existing nodes.
Strategy 3: Automated Reconciliation
Practice: FluxCD continuously reconciles Git → Cluster.
Interval: 5-15 minutes (configurable)
Why this strategy: Self-healing. Prevents configuration drift. Enforces GitOps discipline.
Evidence: Ecosystem.md drift detection
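The self-healing behavior follows from a simple loop: on each interval, anything in the cluster that differs from Git is overwritten with the Git version. The Python sketch below is a toy model of that loop using in-memory dicts. It is not FluxCD's implementation, and the `reconcile` function and its data shapes are assumptions for illustration.

```python
import copy


def reconcile(git_state: dict, cluster_state: dict) -> list:
    """Make cluster_state match git_state in place; return the names corrected.

    Both arguments map a resource name to its spec. Resources that exist only
    in the cluster are left alone here; whether they are pruned is a policy
    decision outside this sketch.
    """
    corrected = []
    for name, desired_spec in git_state.items():
        if cluster_state.get(name) != desired_spec:
            cluster_state[name] = copy.deepcopy(desired_spec)
            corrected.append(name)
    return corrected


git_state = {"loki": {"enabled": False}, "harbor": {"enabled": True}}
# loki was re-enabled by hand in the cluster
cluster_state = {"loki": {"enabled": True}, "harbor": {"enabled": True}}

print(reconcile(git_state, cluster_state))  # ['loki']
print(reconcile(git_state, cluster_state))  # [] (already converged)
```

The second call returning an empty list shows the idempotence that makes periodic reconciliation safe: once the cluster matches Git, repeating the loop changes nothing.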
Drift Remediation
Process:
1. Detect drift (opencenter cluster drift)
2. Analyze changes (what changed, why)
3. Decide action:
   a. Accept drift (update configuration file)
   b. Revert drift (re-apply configuration)
4. Document decision (Git commit message)
Example:
# Drift detected: worker_count changed from 5 to 3
# Option A: Accept drift (update configuration)
vim ~/.config/opencenter/clusters/my-org/.my-cluster-config.yaml
# Change worker_count to 3, then stage and commit the tracked file
git commit -am "Accept worker_count drift (cost optimization)"
# Option B: Revert drift (re-apply configuration)
opencenter cluster generate my-cluster
# Run from the generated infrastructure/clusters/my-cluster directory
terraform apply
# Provisions 2 new workers to reach 5 total
Configuration Versioning
Git-Based Versioning
Practice: Store configuration in Git with semantic versioning.
Workflow:
# Initial configuration
git add .my-cluster-config.yaml
git commit -m "Initial cluster configuration"
git tag v1.0.0
# Update configuration
vim .my-cluster-config.yaml
git add .my-cluster-config.yaml
git commit -m "Enable Harbor registry"
git tag v1.1.0
# Rollback to previous version (restores the file without detaching HEAD)
git checkout v1.0.0 -- .my-cluster-config.yaml
opencenter cluster generate my-cluster
Why this practice: Audit trail (Git history). Rollback capability (Git checkout). Collaboration (pull requests).
Configuration Snapshots
Practice: Backup configuration before major changes.
Workflow:
# Before major change
cp .my-cluster-config.yaml .my-cluster-config.yaml.backup-$(date +%Y%m%d)
# Make changes
vim .my-cluster-config.yaml
# If rollback needed
cp .my-cluster-config.yaml.backup-20260217 .my-cluster-config.yaml
Why this practice: Quick rollback. No Git required. Local backup.
Best Practices
1. Validate Before Deploy
Practice: Always validate configuration before deployment.
Workflow:
# Edit configuration
vim .my-cluster-config.yaml
# Validate
opencenter cluster validate my-cluster
# Only generate (and later deploy) if validation passes
opencenter cluster generate my-cluster
Rationale: Catch errors early. Prevent deployment failures. Faster feedback loop.
2. Use Small, Incremental Changes
Practice: One logical change per commit.
Example:
Good: "Enable Harbor registry"
Bad: "Enable Harbor, update Kubernetes version, add 3 workers"
Rationale: Easier to review. Easier to rollback. Clearer history.
3. Test in Dev First
Practice: Test configuration changes in dev before production.
Workflow:
1. Apply change to dev cluster
2. Validate functionality
3. Monitor for issues
4. Apply to production
Rationale: Catch issues in dev, not prod. Validate changes before production.
4. Document Configuration Decisions
Practice: Document why configuration choices were made.
Example:
# Configuration
opencenter:
  services:
    loki:
      enabled: false # Disabled for cost optimization (logs to S3 instead)
Rationale: Context for future maintainers. Audit trail for decisions.
5. Backup Before Major Changes
Practice: Backup cluster data before major configuration changes.
Workflow:
# Backup cluster
velero backup create pre-upgrade-backup --include-namespaces app1,app2
# Make changes
vim .my-cluster-config.yaml
opencenter cluster generate my-cluster
# If rollback needed
velero restore create --from-backup pre-upgrade-backup
Rationale: Safety net for major changes. Quick recovery if issues.
Common Misconceptions
"Configuration changes require cluster rebuild"
Reality: Most configuration changes can be applied in place (service configuration, node scaling). Only provider changes and major Kubernetes version upgrades require a rebuild.
"Validation guarantees successful deployment"
Reality: Validation catches most errors, but not all. Runtime issues (quota exhaustion, network failures) can still occur.
"Manual changes are permanent"
Reality: FluxCD reverts manual changes to the resources it manages on the next reconciliation. All changes must go through Git.
Further Reading
- architecture.md[Architecture] - System design and components
- gitops-workflow.md[GitOps Workflow] - Repository structure and reconciliation
- ../operations/validate-configuration.md[Validate Configuration] - Validation procedures
- ../operations/troubleshoot-deployment.md[Troubleshoot Deployment] - Fix deployment issues
- ../reference/configuration-schema.md[Configuration Schema] - Complete field reference
Evidence
This explanation is based on:
- Configuration lifecycle: Session 2 B0 section 2
- Configuration structure: Session 2 B0 section 3
- Validation layers: internal/config/validator.go, Session 1 A3
- GitOps workflow: Ecosystem.md GitOps flow
- Drift detection: cmd/cluster_drift.go, Session 1 A8
- Configuration precedence: internal/config/manager.go
- File locations: .kiro/steering/structure.md:118-128