
Migration Planning & Execution

Purpose: Shows operators how to plan and execute workload migration between infrastructure providers.

When Migration Is Needed

  • Changing infrastructure providers (e.g., VMware → OpenStack, on-prem → cloud)
  • Kubernetes version jump too large for in-place upgrade (e.g., 1.26 → 1.30)
  • Cluster architecture change (different node sizing, network topology, storage backend)
  • Consolidating multiple clusters into one

Planning Checklist

Before starting, document answers to these questions:

  1. Workload inventory — What namespaces, deployments, statefulsets, and CRDs exist on the source cluster?
  2. Data dependencies — Which workloads have PersistentVolumeClaims? What is the total data size?
  3. External integrations — DNS records, load balancer IPs, TLS certificates, external database connections.
  4. Downtime tolerance — Can workloads run on both clusters simultaneously (blue-green), or is a cutover window required?
  5. Rollback plan — If the migration fails, how do you revert to the source cluster?

# Generate a workload inventory from the source cluster
kubectl get deployments,statefulsets,daemonsets -A -o wide > workload-inventory.txt
kubectl get pvc -A -o wide > pvc-inventory.txt
kubectl get ingress,httproute -A > ingress-inventory.txt
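
The total-data-size answer for question 2 can be approximated from the PVC inventory. A minimal sketch (the sum_pvc_gib helper name is illustrative; it assumes capacities use Mi/Gi/Ti suffixes):

```shell
# Sum PVC capacities (one per line, e.g. "10Gi", "512Mi") into GiB.
# Feed it the capacity column, for example:
#   kubectl get pvc -A -o jsonpath='{range .items[*]}{.status.capacity.storage}{"\n"}{end}'
sum_pvc_gib() {
  awk '
    /Ti$/ { total += $1 * 1024; next }
    /Gi$/ { total += $1;        next }
    /Mi$/ { total += $1 / 1024; next }
    END   { printf "%.1f GiB\n", total }
  '
}
```

Usage: `printf '10Gi\n512Mi\n' | sum_pvc_gib` prints `10.5 GiB`.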

Step 1: Provision the Target Cluster

Use the openCenter CLI to initialize and provision the new cluster:

opencenter cluster init <new-cluster> --org <org-id> --type <provider>
opencenter cluster edit <new-cluster>
opencenter cluster setup <new-cluster>

# Provision infrastructure
cd infrastructure/clusters/<new-cluster>/
terraform apply

# Deploy Kubernetes
cd inventory/
ansible-playbook -i inventory.yaml -b --become-user=root cluster.yml

# Bootstrap FluxCD
opencenter cluster bootstrap <new-cluster>

Wait for all platform services to reconcile before proceeding.

Step 2: Migrate Platform Configuration

The target cluster gets its platform services from openCenter-gitops-base via FluxCD. Verify that all services match the source cluster:

# Compare service versions between clusters
flux get helmreleases -A --context=source-cluster
flux get helmreleases -A --context=target-cluster

Copy any cluster-specific overrides from the source overlay to the target overlay directory.
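
The comparison can be scripted rather than eyeballed. A sketch (release_drift is an illustrative name; it assumes flux's default column order of NAMESPACE, NAME, REVISION):

```shell
# Report HelmReleases whose presence or chart revision differs between clusters.
# Inputs: two sorted files of "namespace/name revision" lines, built with e.g.:
#   flux get helmreleases -A --context=<ctx> | tail -n +2 \
#     | awk '{print $1"/"$2, $3}' | sort > <file>
release_drift() {
  # comm -3 keeps only lines unique to one file: unindented lines exist only
  # in the first (source) file, tab-indented lines only in the second (target).
  comm -3 "$1" "$2"
}
```

An empty result means the two clusters run identical releases.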

Step 3: Migrate Workloads

Option A: Blue-Green (Minimal Downtime)

  1. Deploy application manifests to the target cluster (add GitRepository sources pointing to app repos).
  2. Run both clusters simultaneously.
  3. Migrate persistent data using Velero or application-level tools (see Data Portability).
  4. Switch DNS/load balancer to point to the target cluster.
  5. Verify traffic flows to the target.
  6. Decommission the source cluster.

Option B: Cutover Window

  1. Take a Velero backup of the source cluster.
  2. Scale down workloads on the source cluster.
  3. Restore the Velero backup on the target cluster.
  4. Update DNS/load balancer to point to the target.
  5. Verify and scale up on the target.

# Source cluster: backup everything
velero backup create full-migration --wait

# Target cluster: restore
velero restore create --from-backup full-migration --wait

Step 4: Update DNS and External References

After workloads are running on the target cluster:

# Update DNS records to point to new cluster ingress/load balancer IPs
# Update any external services that reference the old cluster API server endpoint
# Update CI/CD pipeline kubeconfig references
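
If the new ingress address isn't at hand, it can be read from the ingress controller's Service. A sketch assuming jq is installed (the Service name in the comment is illustrative; yours may differ):

```shell
# Extract the load balancer address from a Service JSON document.
# Pipe in, for example:
#   kubectl get svc ingress-nginx-controller -n ingress-nginx \
#     --context=target-cluster -o json
lb_address() {
  # Prefer the IP; some providers publish a hostname instead (e.g. AWS ELBs).
  jq -r '.status.loadBalancer.ingress[0] | (.ip // .hostname)'
}
```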

Step 5: Validate

# Verify all workloads are running on the target (Completed job pods are expected)
kubectl get pods -A --context=target-cluster | grep -Ev 'Running|Completed'

# Check ingress/routes are responding
curl -I https://app.example.com

# Run drift detection on the target
opencenter cluster drift <new-cluster>

# Verify Velero backups are running on the target
velero backup get --kubecontext=target-cluster

Step 6: Decommission the Source Cluster

After the target cluster is stable (recommended: run both for at least 48 hours):

# Remove FluxCD from the source to stop reconciliation
flux uninstall --context=source-cluster

# Destroy infrastructure
cd infrastructure/clusters/<old-cluster>/
terraform destroy

Troubleshooting

  • PVC restore fails on target — Storage class names may differ between providers. Create a StorageClass on the target that matches the source, or use Velero's storage class mapping (a ConfigMap labeled velero.io/change-storage-class in the velero namespace) to remap class names during restore.
  • DNS propagation delay — TTL on DNS records can cause traffic to hit the old cluster. Lower TTL before migration, or use weighted routing during cutover.
  • CRDs missing on target — If the source cluster has custom CRDs not in gitops-base, export and apply them to the target before restoring workloads.
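
For the storage-class case, Velero remaps classes at restore time via a plugin-config ConfigMap in its own namespace. A sketch as a config fragment (the class names in the data section are placeholders for your source and target StorageClass names):

```shell
# Velero's restore item action reads any ConfigMap in the velero namespace
# labeled velero.io/change-storage-class and rewrites PV/PVC classes on restore.
kubectl apply --context=target-cluster -f - <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  name: change-storage-class-config
  namespace: velero
  labels:
    velero.io/plugin-config: ""
    velero.io/change-storage-class: RestoreItemAction
data:
  # <source StorageClass>: <target StorageClass>  (placeholder names)
  vsphere-standard: openstack-cinder
EOF
```

Apply this before running the restore so the mapping is picked up.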