
Data Portability

Purpose: For operators, explains strategies for moving persistent data between providers and clusters.

The Data Portability Challenge

Moving compute workloads between clusters is straightforward — Kubernetes manifests are portable by design. Persistent data is the hard part. PersistentVolumes are tied to specific storage backends (vSphere VMDK, OpenStack Cinder, Longhorn replicas), and those backends do not share a common transfer protocol.

Strategy 1: Velero Backup and Restore

Velero can back up PersistentVolumeClaims and their data, then restore them on a different cluster. This works when both clusters have Velero configured with access to the same (or compatible) backup storage location.

# On the source cluster: back up the namespace with volumes
velero backup create migrate-app \
  --include-namespaces=production \
  --snapshot-volumes \
  --wait

# On the target cluster: restore from the same backup storage
velero restore create --from-backup migrate-app --wait

Limitations:

  • Both clusters must reach the same S3-compatible storage endpoint.
  • Volume snapshot plugins must be compatible (e.g., both using CSI snapshots or both using Restic/Kopia).
  • Velero's Restic/Kopia integration copies data at the file level, which works across storage backends but is slower for large volumes.
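When the two clusters' snapshot plugins are not compatible, Velero's file-system backup path (Kopia, formerly Restic) avoids the snapshot layer entirely. A sketch, assuming Velero 1.10+ with node agents installed (older releases spell the flag --default-volumes-to-restic):

```shell
# Back up volume data at the file level so it can be restored onto a
# different storage backend; slower than snapshots for large volumes.
velero backup create migrate-app-fs \
  --include-namespaces=production \
  --default-volumes-to-fs-backup \
  --wait
```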

Strategy 2: Application-Level Export/Import

For databases and stateful services, use the application's own export tools. This is often more reliable than volume-level copies because the application ensures data consistency.

Application   | Export                                             | Import
PostgreSQL    | pg_dump -Fc dbname > backup.dump                   | pg_restore -d dbname backup.dump
MySQL         | mysqldump --single-transaction dbname > backup.sql | mysql dbname < backup.sql
Redis         | redis-cli BGSAVE, then copy dump.rdb               | Place dump.rdb in the data directory
Elasticsearch | Snapshot API to S3                                 | Restore snapshot on target cluster

Transfer the export file via S3, SCP, or any method that reaches the target cluster.
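If a single workstation can reach both clusters, the export can also be streamed directly from source to target, skipping the intermediate file. A sketch for PostgreSQL; the kubeconfig contexts, pod names, user, and database names are illustrative:

```shell
# Stream a custom-format dump from the source database straight into
# pg_restore on the target; nothing is written to local disk in between.
kubectl --context source-cluster -n production exec -i postgres-0 -- \
    pg_dump -Fc -U app appdb \
  | kubectl --context target-cluster -n production exec -i postgres-0 -- \
    pg_restore -U app -d appdb --clean --if-exists
```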

Strategy 3: Rsync Between Clusters

For raw file data on PersistentVolumes, copying files directly provides a portable path: kubectl cp for one-off copies, rsync for large or repeated transfers. This requires network connectivity between clusters (or an intermediate host).

# On the source cluster: start a temporary pod with the PVC mounted
kubectl run rsync-source --image=alpine --restart=Never \
  --overrides='{"apiVersion":"v1","spec":{"containers":[{"name":"rsync","image":"alpine","command":["sleep","3600"],"volumeMounts":[{"name":"data","mountPath":"/data"}]}],"volumes":[{"name":"data","persistentVolumeClaim":{"claimName":"app-data"}}]}}'

# Copy data out
kubectl cp rsync-source:/data ./local-data/

# On the target cluster: create the PVC, start a temporary pod, copy data in
kubectl cp ./local-data/ rsync-target:/data/

For large datasets, run rsync itself rather than kubectl cp: either over an SSH tunnel between clusters, or by using kubectl exec as rsync's transport (rsync's --rsh option accepts a small wrapper script). Repeated runs then transfer only the files that changed.

Strategy 4: Storage Replication (Longhorn)

If both clusters use Longhorn, you can use Longhorn's backup and restore feature with an S3-compatible backup target:

# Configure Longhorn backup target (same S3 bucket on both clusters)
# In Longhorn UI or via CR:
kubectl -n longhorn-system edit settings backup-target
# Set to: s3://longhorn-backups@us-east-1/

# Create a backup of the volume on the source cluster
# Restore the backup on the target cluster from the same S3 location

Choosing a Strategy

Factor             | Velero                  | App-Level              | Rsync              | Longhorn
Cross-provider     | Yes (with Restic/Kopia) | Yes                    | Yes                | Yes (S3 backup)
Data consistency   | Crash-consistent        | Application-consistent | File-level         | Crash-consistent
Speed (large data) | Moderate                | Depends on app         | Fast (incremental) | Moderate
Complexity         | Low                     | Medium                 | Medium             | Low
Downtime required  | Minimal                 | Depends on app         | Minimal            | Minimal

For most migrations, start with Velero (simplest). Fall back to application-level export if you need guaranteed consistency for databases. Use rsync for large file-based workloads where incremental transfer matters.

Further Reading