Backup & Restore

Purpose: For operators. Shows how to configure Velero backups, take etcd snapshots, and restore a cluster from a backup.

Prerequisites

Velero deployed via FluxCD (included in openCenter-gitops-base)
S3-compatible object storage configured (MinIO, AWS S3, or OpenStack Swift)
velero CLI installed locally
kubectl access to the target cluster

Configure a Backup Storage Location

Velero stores backups in a BackupStorageLocation (BSL). The default BSL is configured in the cluster overlay:

# applications/overlays/<cluster>/services/velero/override-values.yaml
configuration:
  backupStorageLocation:
    - name: default
      provider: aws
      bucket: opencenter-backups
      config:
        region: us-east-1
        s3ForcePathStyle: "true"
        s3Url: https://s3.example.com
  volumeSnapshotLocation:
    - name: default
      provider: aws
      config:
        region: us-east-1

Commit and push this change. FluxCD reconciles the Velero HelmRelease with the new storage configuration.
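Once Flux has reconciled, it can help to confirm that Velero can actually reach the bucket. The sketch below reads the `status.phase` field of the BackupStorageLocation with kubectl. The helper names `bsl_phase` and `bsl_is_available` are illustrative, and the `velero` namespace default is an assumption based on the Schedule manifest used in this guide.

```shell
# Print the phase of a BackupStorageLocation.
# Velero reports "Available" when it can reach the bucket.
# Usage: bsl_phase [name] [namespace]   (hypothetical helper)
bsl_phase() {
  kubectl get backupstoragelocations.velero.io "${1:-default}" \
    --namespace "${2:-velero}" \
    -o jsonpath='{.status.phase}'
}

# Succeed only when the location reports Available.
bsl_is_available() {
  [ "$(bsl_phase "$@")" = "Available" ]
}
```

For example, `bsl_is_available default velero && echo "storage reachable"` prints the message only when the default location is usable. `velero backup-location get` shows the same phase in table form.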

Create a Backup Schedule

Define a recurring backup schedule for cluster resources and persistent volumes:

apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: daily-full-backup
  namespace: velero
spec:
  schedule: "0 2 * * *"  # Daily at 02:00 UTC
  template:
    includedNamespaces:
      - "*"
    excludedNamespaces:
      - velero
      - flux-system
    includedResources:
      - "*"
    snapshotVolumes: true
    storageLocation: default
    ttl: 720h  # Retain for 30 days

Apply the schedule:

kubectl apply -f daily-full-backup.yaml

Or use the Velero CLI:

velero schedule create daily-full-backup \
  --schedule="0 2 * * *" \
  --include-namespaces="*" \
  --exclude-namespaces=velero,flux-system \
  --snapshot-volumes \
  --ttl 720h
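The `ttl` value is a Go-style duration, which has no day unit, so retention has to be expressed in hours (30 days x 24 = 720h). A minimal sketch of that arithmetic, using a hypothetical helper named `ttl_from_days`:

```shell
# Convert a retention period in whole days to an hour-based TTL string.
# Usage: ttl_from_days <days>   (hypothetical helper)
ttl_from_days() {
  printf '%dh\n' "$(( $1 * 24 ))"
}
```

The result can be passed straight to the CLI, for example `--ttl "$(ttl_from_days 30)"`.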

Run an On-Demand Backup

# Full cluster backup
velero backup create pre-upgrade-backup --wait

# Namespace-scoped backup
velero backup create app-backup \
  --include-namespaces=production \
  --snapshot-volumes \
  --wait
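Backup names must be unique, so rerunning one of the commands above with the same name fails once that backup exists. One way around this, sketched here with a hypothetical helper `timestamped_backup_name`, is to append a UTC timestamp to a fixed prefix:

```shell
# Build a unique backup name of the form <prefix>-YYYYMMDDhhmmss (UTC).
# Usage: timestamped_backup_name <prefix>   (hypothetical helper)
timestamped_backup_name() {
  printf '%s-%s\n' "$1" "$(date -u +%Y%m%d%H%M%S)"
}
```

Usage: `velero backup create "$(timestamped_backup_name pre-upgrade)" --wait`.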

Verify Backups

# List all backups
velero backup get

# Describe a specific backup (shows item counts, errors, warnings)
velero backup describe pre-upgrade-backup --details

# Check backup storage location status
velero backup-location get

A healthy backup shows Phase: Completed with zero errors.
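The same check can be scripted, for example as a gate before an upgrade. This sketch reads `status.phase` and `status.errors` from the Backup resource with kubectl. The helper name `backup_is_healthy` is illustrative, and the sketch assumes an absent `errors` field means zero errors.

```shell
# Succeed only if the backup finished with phase Completed and no errors.
# Usage: backup_is_healthy <backup-name> [namespace]   (hypothetical helper)
backup_is_healthy() {
  # Output looks like "Completed 0", or "Completed " when errors is unset.
  backup_status=$(kubectl get backups.velero.io "$1" \
    --namespace "${2:-velero}" \
    -o jsonpath='{.status.phase} {.status.errors}')
  backup_phase=${backup_status%% *}   # text before the first space
  backup_errors=${backup_status#* }   # text after the first space
  [ "$backup_phase" = "Completed" ] && [ "${backup_errors:-0}" -eq 0 ]
}
```

Usage: `backup_is_healthy pre-upgrade-backup || echo "do not proceed with the upgrade"`.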

Restore from Backup

A restore recreates the resources captured in a backup. Resources that already exist in the cluster under the same name are skipped by default.

# Restore everything from a backup
velero restore create --from-backup pre-upgrade-backup --wait

# Restore a single namespace
velero restore create --from-backup daily-full-backup-20250115020000 \
  --include-namespaces=production \
  --wait

# Restore and update resources that already exist in the cluster
velero restore create --from-backup pre-upgrade-backup \
  --existing-resource-policy=update \
  --wait

Check restore status:

velero restore get
velero restore describe <restore-name> --details
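Scheduled backups get a timestamp suffix, so the exact name to restore from changes every day. The sketch below looks up the newest backup created by a schedule, relying on the `velero.io/schedule-name` label that Velero puts on scheduled backups. The helper name `latest_scheduled_backup` is illustrative, and it does not check whether that backup completed successfully.

```shell
# Print the name of the most recently created backup for a schedule.
# Prints an empty line if the schedule has produced no backups yet.
# Usage: latest_scheduled_backup <schedule-name> [namespace]   (hypothetical helper)
latest_scheduled_backup() {
  newest=$(kubectl get backups.velero.io \
    --namespace "${2:-velero}" \
    -l "velero.io/schedule-name=$1" \
    --sort-by=.metadata.creationTimestamp \
    -o name | tail -n 1)
  # "-o name" prints "backup.velero.io/<name>"; keep only the part after the slash.
  printf '%s\n' "${newest##*/}"
}
```

Usage: `velero restore create --from-backup "$(latest_scheduled_backup daily-full-backup)" --wait`.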

etcd Snapshots

Velero handles application-level backups. For etcd-level disaster recovery, Kubespray configures automatic etcd snapshots on control plane nodes:

# On a control plane node, check etcd snapshot location
ls /var/lib/etcd-backup/

# Manual etcd snapshot
ETCDCTL_API=3 etcdctl snapshot save /var/lib/etcd-backup/manual-snapshot.db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/ssl/etcd/ssl/ca.pem \
  --cert=/etc/ssl/etcd/ssl/node-$(hostname).pem \
  --key=/etc/ssl/etcd/ssl/node-$(hostname)-key.pem
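After saving a snapshot, it is worth checking that the file is readable before relying on it. A minimal sketch, assuming `etcdctl snapshot status` is available in the installed etcd release (newer releases may prefer the equivalent `etcdutl snapshot status`). The helper name `verify_etcd_snapshot` is illustrative.

```shell
# Check that a snapshot file exists, is non-empty, and can be read by etcdctl.
# Usage: verify_etcd_snapshot <path-to-snapshot.db>   (hypothetical helper)
verify_etcd_snapshot() {
  if [ ! -s "$1" ]; then
    echo "snapshot missing or empty: $1" >&2
    return 1
  fi
  # Prints hash, revision, total keys, and size for the snapshot file.
  ETCDCTL_API=3 etcdctl snapshot status "$1" --write-out=table
}
```

Usage: `verify_etcd_snapshot /var/lib/etcd-backup/manual-snapshot.db`.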

Troubleshooting

Backup stuck in InProgress: Check the Velero pod logs with kubectl logs -n velero deploy/velero. A common cause is an unreachable storage endpoint.
Volume snapshots failing: Verify that the VolumeSnapshotLocation matches your storage provider, and check CSI driver compatibility.
Restore skipping resources: By default, Velero skips resources that already exist. Use --existing-resource-policy=update to have Velero update them from the backup.
Partial restore: Use the --include-resources and --include-namespaces flags to target specific objects.
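When reading the Velero pod logs for a stuck backup, most lines are informational. The filter below keeps only error and warning lines. It assumes Velero's default text log format, in which each line carries a `level=` field. The helper name `velero_problem_lines` is illustrative.

```shell
# Keep only error and warning lines from Velero logs read on stdin.
# Usage: <log source> | velero_problem_lines   (hypothetical helper)
velero_problem_lines() {
  grep -E 'level=(error|warning)'
}
```

Usage: `kubectl logs -n velero deploy/velero | velero_problem_lines`.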
