Migration Planning & Execution
Purpose: shows operators how to plan and execute workload migration between infrastructure providers.
When Migration Is Needed
- Changing infrastructure providers (e.g., VMware → OpenStack, on-prem → cloud)
- Kubernetes version jump too large for in-place upgrade (e.g., 1.26 → 1.30)
- Cluster architecture change (different node sizing, network topology, storage backend)
- Consolidating multiple clusters into one
Planning Checklist
Before starting, document answers to these questions:
- Workload inventory — What namespaces, deployments, statefulsets, and CRDs exist on the source cluster?
- Data dependencies — Which workloads have PersistentVolumeClaims? What is the total data size?
- External integrations — DNS records, load balancer IPs, TLS certificates, external database connections.
- Downtime tolerance — Can workloads run on both clusters simultaneously (blue-green), or is a cutover window required?
- Rollback plan — If the migration fails, how do you revert to the source cluster?
# Generate a workload inventory from the source cluster
kubectl get deployments,statefulsets,daemonsets -A -o wide > workload-inventory.txt
kubectl get pvc -A -o wide > pvc-inventory.txt
kubectl get ingress,httproute -A > ingress-inventory.txt
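To answer the data-size question in the checklist, the requested PVC capacity can be totalled from the inventory. A minimal sketch, assuming all sizes are reported in Gi (shown on sample rows; in practice, pipe pvc-inventory.txt through it):

```shell
# Sum PVC capacity requests from inventory lines (assumes Gi units only;
# Mi/Ti sizes would need extra cases).
total_gi() {
  awk '{ for (i = 1; i <= NF; i++)
           if ($i ~ /^[0-9]+Gi$/) { sub(/Gi/, "", $i); sum += $i } }
       END { print sum "Gi" }'
}
# Sample rows standing in for `kubectl get pvc -A -o wide` output
printf '%s\n' \
  "default  data-postgres-0  Bound  pvc-1  50Gi  RWO  standard" \
  "apps     cache-redis-0    Bound  pvc-2  10Gi  RWO  standard" | total_gi
```

The total drives both the storage sizing of the target cluster and the expected duration of the data migration step.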
Step 1: Provision the Target Cluster
Use openCenter CLI to initialize and provision the new cluster:
opencenter cluster init <new-cluster> --org <org-id> --type <provider>
opencenter cluster edit <new-cluster>
opencenter cluster setup <new-cluster>
# Provision infrastructure
cd infrastructure/clusters/<new-cluster>/
terraform apply
# Deploy Kubernetes
cd inventory/
ansible-playbook -i inventory.yaml -b --become-user=root cluster.yml
# Bootstrap FluxCD
opencenter cluster bootstrap <new-cluster>
Wait for all platform services to reconcile before proceeding.
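The wait above can be scripted as a poll loop. A sketch with a stub standing in for the flux CLI (in a real run, the check would be `flux get helmreleases -A`, looping while any READY column reads False; add a timeout before using this unattended):

```shell
# Poll a check command until its output no longer contains "False"
wait_ready() {
  while "$@" | grep -q False; do sleep 0.1; done
}
# Stub simulating `flux get helmreleases -A`: not ready on the first
# call, ready afterwards (a marker file tracks the stub's state)
state=$(mktemp -u)
check() {
  if [ -f "$state" ]; then echo "ingress ingress-nginx True"
  else touch "$state"; echo "ingress ingress-nginx False"; fi
}
wait_ready check && echo "all services reconciled"
rm -f "$state"
```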
Step 2: Migrate Platform Configuration
The target cluster gets its platform services from openCenter-gitops-base via FluxCD. Verify that all services match the source cluster:
# Compare service versions between clusters
flux get helmreleases -A --context=source-cluster
flux get helmreleases -A --context=target-cluster
Copy any cluster-specific overrides from the source overlay to the target overlay directory.
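One way to spot drift between the two listings is a sorted diff. Sketch on captured sample output (in practice, redirect each `flux get helmreleases -A --context=...` to a file first):

```shell
# Sample listings standing in for the two clusters' HelmReleases
cat > source-releases.txt <<'EOF'
ingress     ingress-nginx          4.10.0  True
monitoring  kube-prometheus-stack  58.1.0  True
EOF
cat > target-releases.txt <<'EOF'
ingress     ingress-nginx          4.9.1   True
monitoring  kube-prometheus-stack  58.1.0  True
EOF
# Any lines unique to either file indicate a version or state mismatch
diff source-releases.txt target-releases.txt || true
rm -f source-releases.txt target-releases.txt
```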
Step 3: Migrate Workloads
Option A: Blue-Green (Minimal Downtime)
- Deploy application manifests to the target cluster (add GitRepository sources pointing to app repos).
- Run both clusters simultaneously.
- Migrate persistent data using Velero or application-level tools (see Data Portability).
- Switch DNS/load balancer to point to the target cluster.
- Verify traffic flows to the target.
- Decommission the source cluster.
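The first bullet above, adding GitRepository sources, looks roughly like this in the target cluster's overlay (the app name, URL, and branch are placeholders):

```yaml
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: example-app          # placeholder app name
  namespace: flux-system
spec:
  interval: 5m
  url: https://github.com/example-org/example-app   # placeholder repo
  ref:
    branch: main
```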
Option B: Cutover Window
- Take a Velero backup of the source cluster.
- Scale down workloads on the source cluster.
- Restore the Velero backup on the target cluster.
- Update DNS/load balancer to point to the target.
- Verify and scale up on the target.
# Source cluster: backup everything
velero backup create full-migration --wait
# Target cluster: restore
velero restore create --from-backup full-migration --wait
Step 4: Update DNS and External References
After workloads are running on the target cluster:
# Update DNS records to point to new cluster ingress/load balancer IPs
# Update any external services that reference the old cluster API server endpoint
# Update CI/CD pipeline kubeconfig references
Step 5: Validate
# Verify all workloads are running on the target
kubectl get pods -A --context=target-cluster | grep -Ev 'Running|Completed'
# Check ingress/routes are responding
curl -I https://app.example.com
# Run drift detection on the target
opencenter cluster drift <new-cluster>
# Verify Velero backups are running on the target
velero backup get --kubecontext=target-cluster
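The pod check above can be wrapped into a pass/fail gate for a validation script. A sketch run on sample lines (in practice, pipe `kubectl get pods -A --no-headers --context=target-cluster` into it); Completed pods from finished Jobs count as healthy:

```shell
# Succeeds only when every pod is Running or Completed; prints any
# pods in other phases so failures are self-explanatory
check_pods() { ! grep -Ev 'Running|Completed'; }
printf '%s\n' \
  "default  web-7f9c8  1/1  Running    0  5m" \
  "batch    job-xk2    0/1  Completed  0  9m" | check_pods \
  && echo "all pods healthy"
```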
Step 6: Decommission the Source Cluster
After the target cluster is stable (recommended: run both for at least 48 hours):
# Remove FluxCD from the source to stop reconciliation
flux uninstall --context=source-cluster
# Destroy infrastructure
cd infrastructure/clusters/<old-cluster>/
terraform destroy
Troubleshooting
- PVC restore fails on target — Storage class names may differ between providers. Create a StorageClass on the target that matches the source, or map source classes to target classes at restore time with Velero's change-storage-class plugin configuration.
- DNS propagation delay — TTL on DNS records can cause traffic to keep hitting the old cluster. Lower the TTL before migration, or use weighted routing during cutover.
- CRDs missing on target — If the source cluster has custom CRDs not in gitops-base, export and apply them to the target before restoring workloads.
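The storage class mapping is configured through Velero's change-storage-class restore item action, driven by a labeled ConfigMap in the velero namespace. A sketch with placeholder class names:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: change-storage-class-config
  namespace: velero
  labels:
    velero.io/plugin-config: ""
    velero.io/change-storage-class: RestoreItemAction
data:
  # <source storage class>: <target storage class> (placeholders)
  vsphere-standard: openstack-cinder
```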