Backup & Restore

Purpose: Shows operators how to configure etcd backups and Velero snapshots, and how to perform a full cluster restore.

Prerequisites

  • Velero deployed via FluxCD (included in openCenter-gitops-base)
  • S3-compatible object storage configured (MinIO, AWS S3, or OpenStack Swift)
  • velero CLI installed locally
  • kubectl access to the target cluster

Configure a Backup Storage Location

Velero stores backups in a BackupStorageLocation (BSL). The default BSL is configured in the cluster overlay:

# applications/overlays/<cluster>/services/velero/override-values.yaml
configuration:
  backupStorageLocation:
    - name: default
      provider: aws
      bucket: opencenter-backups
      config:
        region: us-east-1
        s3ForcePathStyle: "true"
        s3Url: https://s3.example.com
  volumeSnapshotLocation:
    - name: default
      provider: aws
      config:
        region: us-east-1

Commit and push this change. FluxCD reconciles the Velero HelmRelease with the new storage configuration.
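
After FluxCD reconciles, it can help to confirm that Velero can actually reach the bucket before relying on it. The following is a minimal sketch, not part of openCenter: the helper name wait_for_bsl and its retry defaults are hypothetical, and it assumes the BSL is named default in the velero namespace, as in the examples on this page.

```shell
# Poll the BackupStorageLocation until Velero reports it as Available.
# Usage: wait_for_bsl [name] [namespace] [attempts] [delay_seconds]
# (hypothetical helper; defaults are illustrative assumptions)
wait_for_bsl() {
  local name="${1:-default}" namespace="${2:-velero}"
  local attempts="${3:-30}" delay="${4:-10}"
  local i=0 phase=""
  while [ "$i" -lt "$attempts" ]; do
    # .status.phase is set by the Velero server after it validates the location
    phase="$(kubectl get backupstoragelocation "$name" -n "$namespace" \
      -o jsonpath='{.status.phase}' 2>/dev/null || true)"
    if [ "$phase" = "Available" ]; then
      echo "BackupStorageLocation $name is Available"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "BackupStorageLocation $name not Available (last phase: ${phase:-unknown})" >&2
  return 1
}
```

Calling wait_for_bsl default velero after pushing the change returns non-zero if the location never becomes Available, which usually points at credentials or the s3Url value.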

Create a Backup Schedule

Define a recurring backup schedule for cluster resources and persistent volumes:

apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: daily-full-backup
  namespace: velero
spec:
  schedule: "0 2 * * *" # Daily at 02:00 UTC
  template:
    includedNamespaces:
      - "*"
    excludedNamespaces:
      - velero
      - flux-system
    includedResources:
      - "*"
    snapshotVolumes: true
    storageLocation: default
    ttl: 720h # Retain for 30 days

Apply the schedule:

kubectl apply -f daily-full-backup.yaml

Or use the Velero CLI:

velero schedule create daily-full-backup \
  --schedule="0 2 * * *" \
  --include-namespaces="*" \
  --exclude-namespaces=velero,flux-system \
  --snapshot-volumes \
  --ttl 720h
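
The ttl value is a duration written in hours (720h is 30 days); the duration syntax has no day unit, so retention periods stated in days need converting. A small sketch of that arithmetic, using a hypothetical helper named days_to_ttl:

```shell
# Convert a retention period in days to a Velero TTL string in hours.
# Usage: days_to_ttl <days>   (hypothetical helper, for illustration only)
days_to_ttl() {
  echo "$(( $1 * 24 ))h"
}
```

For example, days_to_ttl 30 prints 720h, which can be passed as --ttl "$(days_to_ttl 30)".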

Run an On-Demand Backup

# Full cluster backup
velero backup create pre-upgrade-backup --wait

# Namespace-scoped backup
velero backup create app-backup \
  --include-namespaces=production \
  --snapshot-volumes \
  --wait

Verify Backups

# List all backups
velero backup get

# Describe a specific backup (shows item counts, errors, warnings)
velero backup describe pre-upgrade-backup --details

# Check backup storage location status
velero backup-location get

A healthy backup shows Phase: Completed with zero errors.
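
For scripted checks, such as gating an upgrade on a good backup, the same condition can be read from the Backup resource. This is a sketch under assumptions: backup_ok is a hypothetical helper name, backups are assumed to live in the velero namespace, and an absent .status.errors field is treated as zero errors.

```shell
# Succeed only if the named backup is Completed and reports no errors.
# Usage: backup_ok <backup-name> [namespace]   (hypothetical helper)
backup_ok() {
  local name="$1" namespace="${2:-velero}" phase errors
  phase="$(kubectl get backups.velero.io "$name" -n "$namespace" \
    -o jsonpath='{.status.phase}' 2>/dev/null || true)"
  errors="$(kubectl get backups.velero.io "$name" -n "$namespace" \
    -o jsonpath='{.status.errors}' 2>/dev/null || true)"
  # Assumption: an empty error count means the field was omitted, i.e. zero.
  errors="${errors:-0}"
  echo "backup=$name phase=${phase:-unknown} errors=$errors"
  [ "$phase" = "Completed" ] && [ "$errors" -eq 0 ]
}
```

A call such as backup_ok pre-upgrade-backup prints a one-line summary and returns non-zero for any phase other than Completed.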

Restore from Backup

Restoring creates resources from a backup into the cluster. Existing resources with the same name are skipped by default.

# Restore everything from a backup
velero restore create --from-backup pre-upgrade-backup --wait

# Restore a single namespace
velero restore create --from-backup daily-full-backup-20250115020000 \
  --include-namespaces=production \
  --wait

# Restore and update existing resources to match the backup
velero restore create --from-backup pre-upgrade-backup \
  --existing-resource-policy=update \
  --wait
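
Backups created by a schedule are named after the schedule plus a timestamp, as in daily-full-backup-20250115020000 above, so the newest one has to be looked up before restoring. A minimal sketch, assuming the velero namespace and Velero's velero.io/schedule-name label on scheduled backups; latest_scheduled_backup is a hypothetical helper name, and it does not check that the backup completed.

```shell
# Print the name of the most recent backup created by a Velero schedule.
# Usage: latest_scheduled_backup <schedule-name> [namespace]
# (hypothetical helper; does not filter by backup phase)
latest_scheduled_backup() {
  local schedule="$1" namespace="${2:-velero}"
  kubectl get backups.velero.io -n "$namespace" \
    -l "velero.io/schedule-name=$schedule" \
    --sort-by=.metadata.creationTimestamp \
    -o name | tail -n 1 | sed 's|.*/||'   # strip the "backup.velero.io/" prefix
}
```

The result can be passed straight to a restore, for example velero restore create --from-backup "$(latest_scheduled_backup daily-full-backup)" --wait, after confirming with velero backup describe that the backup is Completed.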

Check restore status:

velero restore get
velero restore describe <restore-name> --details

etcd Snapshots

Velero handles application-level backups. For etcd-level disaster recovery, Kubespray configures automatic etcd snapshots on control plane nodes:

# On a control plane node, check etcd snapshot location
ls /var/lib/etcd-backup/

# Manual etcd snapshot
ETCDCTL_API=3 etcdctl snapshot save /var/lib/etcd-backup/manual-snapshot.db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/ssl/etcd/ssl/ca.pem \
  --cert=/etc/ssl/etcd/ssl/node-$(hostname).pem \
  --key=/etc/ssl/etcd/ssl/node-$(hostname)-key.pem
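
A snapshot file is only useful if it is readable, so it is worth inspecting it right after saving. The sketch below wraps etcdctl snapshot status, which reads the file directly and needs no endpoint or certificates; verify_etcd_snapshot is a hypothetical helper name. On newer etcd releases the same subcommand is also offered by etcdutl, and etcdctl may print a deprecation notice.

```shell
# Print hash, revision, key count, and size for an etcd snapshot file.
# Usage: verify_etcd_snapshot <snapshot-file>   (hypothetical helper)
verify_etcd_snapshot() {
  local snapshot="$1"
  # Reject a missing or zero-byte file before asking etcdctl to parse it
  if [ ! -s "$snapshot" ]; then
    echo "snapshot missing or empty: $snapshot" >&2
    return 1
  fi
  ETCDCTL_API=3 etcdctl snapshot status "$snapshot" --write-out=table
}
```

For the manual snapshot above, that would be verify_etcd_snapshot /var/lib/etcd-backup/manual-snapshot.db, run on the control plane node.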

Troubleshooting

  • Backup stuck in InProgress: Check the Velero pod logs with kubectl logs -n velero deploy/velero. A common cause is an unreachable storage endpoint.
  • Volume snapshots failing: Verify that the VolumeSnapshotLocation matches your storage provider, and check CSI driver compatibility.
  • Restore skipping resources: By default, Velero skips resources that already exist in the cluster. Use --existing-resource-policy=update to update them from the backup.
  • Partial restore: To restore only specific objects, use the --include-resources and --include-namespaces flags.
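
The partial restore case can be sketched as a small wrapper that combines both filters. The helper name restore_subset is hypothetical, and the namespace and resource lists in the usage line are placeholders.

```shell
# Restore only selected resource types from selected namespaces.
# Usage: restore_subset <backup-name> <namespaces> <resources>
# Both lists are comma-separated (hypothetical helper, for illustration).
restore_subset() {
  local backup="$1" namespaces="$2" resources="$3"
  velero restore create --from-backup "$backup" \
    --include-namespaces "$namespaces" \
    --include-resources "$resources" \
    --wait
}
```

For example, restore_subset pre-upgrade-backup production deployments,configmaps restores only Deployments and ConfigMaps in the production namespace.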