# Data Portability
Purpose: For operators, explains strategies for moving persistent data between providers and clusters.
## The Data Portability Challenge
Moving compute workloads between clusters is straightforward — Kubernetes manifests are portable by design. Persistent data is the hard part. PersistentVolumes are tied to specific storage backends (vSphere VMDK, OpenStack Cinder, Longhorn replicas), and those backends do not share a common transfer protocol.
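To make the coupling concrete, here is a sketch of two PersistentVolume specs for the same logical 10 Gi volume on different backends. The driver names are the real CSI driver identifiers; the volume handles and PV names are placeholders:

```yaml
# A PersistentVolume provisioned by the vSphere CSI driver (illustrative).
apiVersion: v1
kind: PersistentVolume
metadata:
  name: app-data-vsphere
spec:
  capacity:
    storage: 10Gi
  accessModes: ["ReadWriteOnce"]
  csi:
    driver: csi.vsphere.vmware.com    # backend-specific driver
    volumeHandle: "vmdk-volume-id"    # placeholder VMDK identifier
---
# The same logical volume on Longhorn references a different driver and
# handle; the two specs are not interchangeable.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: app-data-longhorn
spec:
  capacity:
    storage: 10Gi
  accessModes: ["ReadWriteOnce"]
  csi:
    driver: driver.longhorn.io
    volumeHandle: "longhorn-volume-id"  # placeholder Longhorn volume name
```

The manifest structure is portable; the `csi` stanza is not. Migrating the data means getting it out of one backend's handle and into another's.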
## Strategy 1: Velero Backup and Restore
Velero can back up PersistentVolumeClaims and their data, then restore them on a different cluster. This works when both clusters have Velero configured with access to the same (or compatible) backup storage location.
```shell
# On the source cluster: back up the namespace with volumes
velero backup create migrate-app \
  --include-namespaces=production \
  --snapshot-volumes \
  --wait

# On the target cluster: restore from the same backup storage
velero restore create --from-backup migrate-app --wait
```
**Limitations:**
- Both clusters must reach the same S3-compatible storage endpoint.
- Volume snapshot plugins must be compatible (e.g., both using CSI snapshots or both using Restic/Kopia).
- Velero's Restic/Kopia integration copies data at the file level, which works across storage backends but is slower for large volumes.
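The shared-endpoint requirement comes down to both clusters holding an equivalent `BackupStorageLocation`. A sketch, with the bucket name, region, and endpoint as placeholders:

```yaml
# Applied (with matching values) on both the source and target clusters.
apiVersion: velero.io/v1
kind: BackupStorageLocation
metadata:
  name: default
  namespace: velero
spec:
  provider: aws                   # any S3-compatible provider
  objectStorage:
    bucket: cluster-backups       # placeholder; must be the same bucket
  config:
    region: us-east-1
    s3Url: https://s3.example.internal   # placeholder endpoint
```

If the target cluster's location points at a different bucket or endpoint, the restore simply will not see the backup.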
## Strategy 2: Application-Level Export/Import
For databases and stateful services, use the application's own export tools. This is often more reliable than volume-level copies because the application ensures data consistency.
| Application | Export | Import |
|---|---|---|
| PostgreSQL | `pg_dump -Fc dbname > backup.dump` | `pg_restore -d dbname backup.dump` |
| MySQL | `mysqldump --single-transaction dbname > backup.sql` | `mysql dbname < backup.sql` |
| Redis | `redis-cli BGSAVE`, then copy `dump.rdb` | Place `dump.rdb` in the data directory |
| Elasticsearch | Snapshot API to S3 | Restore snapshot on target cluster |
Transfer the export file via S3, SCP, or any method that reaches the target cluster.
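The PostgreSQL row from the table can be run without copying files into the pod by streaming the dump through `kubectl exec`. A sketch, assuming the namespace (`production`), pod (`postgres-0`), user (`app`), database (`appdb`), and kubeconfig context (`target`) — all placeholders for your environment:

```shell
# Stream a custom-format dump straight out of the source pod to a local file
kubectl exec -n production postgres-0 -- \
  pg_dump -U app -Fc appdb > backup.dump

# Stream it back into the target cluster's pod via stdin
kubectl --context target exec -i -n production postgres-0 -- \
  pg_restore -U app -d appdb < backup.dump
```

Streaming avoids staging the dump inside the pod's filesystem, which matters when the volume is nearly full.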
## Strategy 3: Rsync Between Clusters
For raw file data on PersistentVolumes, rsync provides a direct copy path. This requires network connectivity between clusters (or an intermediate host).
```shell
# On the source cluster: start a temporary pod with the PVC mounted
kubectl run rsync-source --image=alpine --restart=Never \
  --overrides='{"spec":{"containers":[{"name":"rsync","image":"alpine","command":["sleep","3600"],"volumeMounts":[{"name":"data","mountPath":"/data"}]}],"volumes":[{"name":"data","persistentVolumeClaim":{"claimName":"app-data"}}]}}'

# Copy data out
kubectl cp rsync-source:/data ./local-data/

# On the target cluster: create the PVC, start a temporary pod the same
# way (mounting the target PVC), then copy data in
kubectl cp ./local-data/ rsync-target:/data/
```
For large datasets, use kubectl exec with rsync over an SSH tunnel for incremental transfers.
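One way to get incremental transfers without an SSH tunnel is to use `kubectl exec` itself as rsync's remote shell. A sketch of a wrapper script (call it `krsync.sh` — the name is our invention), which assumes `rsync` is installed in the pod's image:

```shell
#!/bin/sh
# krsync.sh: rsync invokes this as `krsync.sh <pod> rsync --server ...`,
# so we peel off the pod name and forward the rest into `kubectl exec`.
pod="$1"; shift
exec kubectl exec -i "$pod" -- "$@"
```

Usage would look like `rsync -av --blocking-io --rsh=./krsync.sh rsync-source:/data/ ./local-data/`. Because rsync compares checksums, re-running the command after a failed or partial transfer only re-sends changed files, unlike `kubectl cp`, which always copies everything.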
## Strategy 4: Storage Replication (Longhorn)
If both clusters use Longhorn, you can use Longhorn's backup and restore feature with an S3-compatible backup target:
```shell
# Configure the Longhorn backup target (same S3 bucket on both clusters).
# In the Longhorn UI, or via the setting resource:
kubectl -n longhorn-system edit settings.longhorn.io backup-target
# Set the value to: s3://longhorn-backups@us-east-1/

# Then: create a backup of the volume on the source cluster, and restore
# the backup on the target cluster from the same S3 location.
```
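On the target cluster, the restore can also be expressed declaratively: Longhorn's StorageClass supports a `fromBackup` parameter, so any PVC provisioned from such a class is populated from the backup. A sketch — the StorageClass name and the backup URL below are placeholders (copy the real URL from the source cluster's Longhorn UI):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-restore
provisioner: driver.longhorn.io
parameters:
  numberOfReplicas: "3"
  # Placeholder backup URL; the real value identifies a specific backup
  # and volume in the shared backup target.
  fromBackup: "s3://longhorn-backups@us-east-1/?backup=backup-xyz&volume=app-data"
```

A PVC referencing `longhorn-restore` then comes up already containing the backed-up data.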
## Choosing a Strategy
| Factor | Velero | App-Level | Rsync | Longhorn |
|---|---|---|---|---|
| Cross-provider | Yes (with Restic/Kopia) | Yes | Yes | Yes (S3 backup) |
| Data consistency | Crash-consistent | Application-consistent | File-level | Crash-consistent |
| Speed (large data) | Moderate | Depends on app | Fast (incremental) | Moderate |
| Complexity | Low | Medium | Medium | Low |
| Downtime required | Minimal | Depends on app | Minimal | Minimal |
For most migrations, start with Velero (simplest). Fall back to application-level export if you need guaranteed consistency for databases. Use rsync for large file-based workloads where incremental transfer matters.
## Further Reading
- Backup & Restore — Velero configuration and usage.
- Migration Planning — End-to-end migration process.