Backup & Restore
Purpose: For operators. Shows how to configure etcd backups and Velero snapshots, and how to perform a full cluster restore.
Prerequisites
- Velero deployed via FluxCD (included in openCenter-gitops-base)
- S3-compatible object storage configured (MinIO, AWS S3, or OpenStack Swift)
- velero CLI installed locally
- kubectl access to the target cluster
Configure a Backup Storage Location
Velero stores backups in a BackupStorageLocation (BSL). The default BSL is configured in the cluster overlay:
# applications/overlays/<cluster>/services/velero/override-values.yaml
configuration:
  backupStorageLocation:
    - name: default
      provider: aws
      bucket: opencenter-backups
      config:
        region: us-east-1
        s3ForcePathStyle: "true"
        s3Url: https://s3.example.com
  volumeSnapshotLocation:
    - name: default
      provider: aws
      config:
        region: us-east-1
Commit and push this change. FluxCD reconciles the Velero HelmRelease with the new storage configuration.
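After reconciliation it can be useful to confirm that Velero can actually reach the bucket. The following is a minimal sketch, not part of the repository: the helper names bsl_phase and bsl_is_available are hypothetical, and it assumes Velero runs in the velero namespace and that the location is named default.

```shell
# Hypothetical helper: print the phase of a BackupStorageLocation.
# Assumes Velero runs in the "velero" namespace.
bsl_phase() {
  local name="${1:-default}"
  kubectl get backupstoragelocations.velero.io "$name" \
    --namespace velero \
    --output jsonpath='{.status.phase}'
}

# Hypothetical helper: succeed only when the location reports "Available".
bsl_is_available() {
  [ "$(bsl_phase "${1:-default}")" = "Available" ]
}
```

For example, bsl_is_available default && echo "storage reachable" prints the message only once the location is usable. A phase of Unavailable usually points at credentials, the bucket name, or the s3Url value.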
Create a Backup Schedule
Define a recurring backup schedule for cluster resources and persistent volumes:
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: daily-full-backup
  namespace: velero
spec:
  schedule: "0 2 * * *" # Daily at 02:00 UTC
  template:
    includedNamespaces:
      - "*"
    excludedNamespaces:
      - velero
      - flux-system
    includedResources:
      - "*"
    snapshotVolumes: true
    storageLocation: default
    ttl: 720h # Retain for 30 days
Apply the schedule:
kubectl apply -f daily-full-backup.yaml
Or use the Velero CLI:
velero schedule create daily-full-backup \
  --schedule="0 2 * * *" \
  --include-namespaces="*" \
  --exclude-namespaces=velero,flux-system \
  --snapshot-volumes \
  --ttl 720h
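The ttl value is a duration whose largest unit is hours, so a retention period in days has to be converted first (30 days x 24 = 720h). A small sketch of that arithmetic follows; days_to_ttl is a hypothetical helper name, not a Velero command.

```shell
# Hypothetical helper: convert a retention period in whole days to the
# hour-based duration string used for --ttl (for example 30 -> 720h).
days_to_ttl() {
  local days="$1"
  case "$days" in
    # Reject empty input, non-digits, and leading zeros (which the shell
    # would otherwise try to read as octal).
    ''|*[!0-9]*|0?*)
      echo "days must be a non-negative integer without leading zeros" >&2
      return 1
      ;;
  esac
  echo "$((days * 24))h"
}
```

It can be used inline, for instance --ttl "$(days_to_ttl 30)" in the velero schedule create command.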
Run an On-Demand Backup
# Full cluster backup
velero backup create pre-upgrade-backup --wait
# Namespace-scoped backup
velero backup create app-backup \
  --include-namespaces=production \
  --snapshot-volumes \
  --wait
Verify Backups
# List all backups
velero backup get
# Describe a specific backup (shows item counts, errors, warnings)
velero backup describe pre-upgrade-backup --details
# Check backup storage location status
velero backup-location get
A healthy backup shows Phase: Completed with zero errors.
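The same health check can be scripted, for example as a gate before an upgrade. This is a sketch under assumptions: backup_is_healthy is a hypothetical helper name, Velero is assumed to run in the velero namespace, and an absent error count in the Backup status is treated as zero.

```shell
# Hypothetical helper: succeed only if the named backup finished with
# phase "Completed" and no recorded errors.
# Assumes Velero runs in the "velero" namespace.
backup_is_healthy() {
  local name="$1" phase errors
  phase="$(kubectl get backups.velero.io "$name" --namespace velero \
    --output jsonpath='{.status.phase}')" || return 1
  errors="$(kubectl get backups.velero.io "$name" --namespace velero \
    --output jsonpath='{.status.errors}')" || return 1
  # An absent errors field is treated as zero.
  [ "$phase" = "Completed" ] && [ "${errors:-0}" -eq 0 ]
}
```

For example, backup_is_healthy pre-upgrade-backup || echo "do not proceed" stops a runbook step when the backup is still in progress, failed, or only partially succeeded.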
Restore from Backup
Restoring creates resources from a backup into the cluster. Existing resources with the same name are skipped by default.
# Restore everything from a backup
velero restore create --from-backup pre-upgrade-backup --wait
# Restore a single namespace
velero restore create --from-backup daily-full-backup-20250115020000 \
  --include-namespaces=production \
  --wait
# Restore and overwrite existing resources
velero restore create --from-backup pre-upgrade-backup \
  --existing-resource-policy=update \
  --wait
Check restore status:
velero restore get
velero restore describe <restore-name> --details
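Backups created by a schedule are named after the schedule plus a timestamp, as in the single-namespace restore example, so the exact name has to be looked up before restoring. A minimal sketch of that lookup follows. latest_scheduled_backup is a hypothetical helper name, and it assumes Velero runs in the velero namespace and labels scheduled backups with velero.io/schedule-name.

```shell
# Hypothetical helper: print the name of the newest backup created by a schedule.
# Assumes the "velero" namespace and the velero.io/schedule-name label.
latest_scheduled_backup() {
  local schedule="$1" newest
  # Sorting is ascending by creation time, so the last line is the newest.
  newest="$(kubectl get backups.velero.io --namespace velero \
    --selector "velero.io/schedule-name=${schedule}" \
    --sort-by=.metadata.creationTimestamp \
    --output name | tail -n 1)"
  [ -n "$newest" ] || return 1
  # "--output name" prints "<resource>/<name>"; keep only the name.
  echo "${newest##*/}"
}
```

The result can be passed to --from-backup, for example --from-backup "$(latest_scheduled_backup daily-full-backup)". The helper does not check the backup's phase, so the newest entry may still be in progress or failed; describe it before restoring. For the common case, velero restore create also accepts --from-schedule, which selects a backup from the named schedule directly.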
etcd Snapshots
Velero handles application-level backups. For etcd-level disaster recovery, Kubespray configures automatic etcd snapshots on control plane nodes:
# On a control plane node, check etcd snapshot location
ls /var/lib/etcd-backup/
# Manual etcd snapshot
ETCDCTL_API=3 etcdctl snapshot save /var/lib/etcd-backup/manual-snapshot.db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/ssl/etcd/ssl/ca.pem \
  --cert=/etc/ssl/etcd/ssl/node-$(hostname).pem \
  --key=/etc/ssl/etcd/ssl/node-$(hostname)-key.pem
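A snapshot is only useful if it is readable, so it is worth inspecting the file right after saving it. The sketch below is one way to do that; verify_etcd_snapshot is a hypothetical helper name. It relies on the etcdctl snapshot status subcommand, which newer etcd releases may provide through etcdutl instead.

```shell
# Hypothetical helper: sanity-check a snapshot file before relying on it.
verify_etcd_snapshot() {
  local file="$1"
  # -s is true only for an existing file with non-zero size.
  [ -s "$file" ] || { echo "missing or empty snapshot: $file" >&2; return 1; }
  # Prints hash, revision, total keys, and size; fails if the file is unreadable.
  ETCDCTL_API=3 etcdctl snapshot status "$file" --write-out=table
}
```

For example, verify_etcd_snapshot /var/lib/etcd-backup/manual-snapshot.db run on the control plane node returns a non-zero exit code if the file is missing, empty, or rejected by etcdctl.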
Troubleshooting
- Backup stuck in InProgress: Check the Velero pod logs: kubectl logs -n velero deploy/velero. Common cause: unreachable storage endpoint.
- Volume snapshots failing: Verify the VolumeSnapshotLocation matches your storage provider. Check CSI driver compatibility.
- Restore skipping resources: By default, Velero skips existing resources. Use --existing-resource-policy=update to overwrite.
- Partial restore: Use the --include-resources and --include-namespaces flags to target specific objects.