Reference Architecture: Physical Storage
Purpose: Provides platform engineers with physical storage specifications, including disk types, RAID configurations, storage tiers, and performance baselines.
Overview
openCenter workloads have distinct storage performance profiles. etcd requires low-latency writes (< 10 ms fsync). Container images and application data need throughput. Logs and metrics tolerate higher latency but consume large volumes. This document defines the physical disk layout to meet these requirements.
Storage Tiers
| Tier | Media | Use Case | IOPS Target | Latency Target |
|---|---|---|---|---|
| Tier 0 (Ultra) | NVMe SSD | etcd, database WAL | > 10,000 IOPS | < 1 ms |
| Tier 1 (Performance) | SATA/SAS SSD | Container images, OS, application data | > 5,000 IOPS | < 5 ms |
| Tier 2 (Capacity) | SAS HDD (10K/15K RPM) | Logs, metrics long-term, backups | > 200 IOPS | < 10 ms |
Disk Configurations by Server Role
Hypervisor Hosts (Local Storage)
| Slot | Disk Type | Size | RAID | Purpose |
|---|---|---|---|---|
| 0–1 | SATA SSD or NVMe M.2 | 480 GB each | RAID 1 | ESXi / KVM boot volume |
| 2–3 | NVMe U.2 SSD | 1.92 TB each | RAID 1 | Tier 0: etcd, control plane VM disks |
| 4–7 | NVMe U.2 SSD | 3.84 TB each | RAID 10 | Tier 1: worker VM disks, container images |
Total usable local storage per host: ~1.9 TB (Tier 0) + ~7.7 TB (Tier 1).
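The usable-capacity figures follow directly from the RAID math: RAID 1 mirrors a pair (usable = one disk), and RAID 10 mirrors striped pairs (usable = half the raw capacity). A quick sanity check of the table above:

```shell
# RAID 1 over 2x 1.92 TB: usable = capacity of one disk.
tier0_gb=$(( 1920 * 2 / 2 ))
# RAID 10 over 4x 3.84 TB: usable = half of total raw capacity.
tier1_gb=$(( 3840 * 4 / 2 ))
echo "Tier 0 usable: ${tier0_gb} GB"   # 1920 GB ~ 1.9 TB
echo "Tier 1 usable: ${tier1_gb} GB"   # 7680 GB ~ 7.7 TB
```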
For vSAN deployments, replace the RAID configuration with vSAN disk groups (1 cache + 2–4 capacity disks per group). See Virtual Storage.
Hypervisor Hosts (Shared Storage)
If using external SAN/NAS instead of local storage:
| Component | Specification |
|---|---|
| Protocol | iSCSI (10/25 GbE), NFS v4.1, or Fibre Channel (16/32 Gbps) |
| LUN/Volume per host | 1–2 datastores, thin provisioned |
| Multipath | MPIO with round-robin policy (iSCSI/FC) |
| Dedicated Network | VLAN 30 (Storage), MTU 9000 |
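A minimal `/etc/multipath.conf` sketch for the round-robin MPIO policy above. The `vendor`/`product` strings are placeholders; most arrays ship recommended multipath settings, so match this against your vendor's documentation before use:

```
# /etc/multipath.conf -- minimal sketch; verify against array vendor guidance.
defaults {
    user_friendly_names yes
    find_multipaths     yes
}
devices {
    device {
        vendor               "EXAMPLE"        # placeholder: your array's vendor string
        product              "EXAMPLE-LUN"    # placeholder
        path_grouping_policy multibus         # all paths in one active/active group
        path_selector        "round-robin 0"  # round-robin I/O across paths
        no_path_retry        queue            # queue I/O during path failover
    }
}
```

Verify the resulting topology with `multipath -ll` after logging in to the iSCSI targets or zoning the FC fabric.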
Management Hosts
| Slot | Disk Type | Size | RAID | Purpose |
|---|---|---|---|---|
| 0–1 | SATA SSD | 480 GB each | RAID 1 | OS boot |
| 2–3 | SATA SSD | 960 GB each | RAID 1 | vCenter DB, bastion data |
RAID Configuration Reference
| RAID Level | Min Disks | Usable Capacity | Read IOPS | Write IOPS | Use Case |
|---|---|---|---|---|---|
| RAID 1 | 2 | 50% | 2× single | 1× single | Boot, etcd (write safety) |
| RAID 5 | 3 | (N-1)/N | Good | Degraded (parity) | Read-heavy capacity |
| RAID 6 | 4 | (N-2)/N | Good | Poor (double parity) | Large capacity, dual fault |
| RAID 10 | 4 | 50% | N× single | N/2× single | Performance + redundancy |
RAID 10 is the default recommendation for Kubernetes workloads. RAID 5/6 write penalty is unacceptable for etcd and database workloads.
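The write penalty behind that recommendation: RAID 5 turns one logical write into four physical I/Os (read data, read parity, write data, write parity) and RAID 6 into six, while RAID 10 needs only two (the mirror pair). A back-of-envelope comparison for four hypothetical 10,000-IOPS disks (real controllers cache and coalesce writes, so treat these as relative, not absolute, numbers):

```shell
disks=4; disk_iops=10000
raw=$(( disks * disk_iops ))
raid10_write=$(( raw / 2 ))   # penalty 2: one mirror copy per write
raid5_write=$(( raw / 4 ))    # penalty 4: read-modify-write of parity
raid6_write=$(( raw / 6 ))    # penalty 6: double parity
echo "RAID 10: ${raid10_write}, RAID 5: ${raid5_write}, RAID 6: ${raid6_write}"
# -> RAID 10: 20000, RAID 5: 10000, RAID 6: 6666
```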
RAID Controller Settings
| Setting | Value | Reason |
|---|---|---|
| Write Policy | Write-Back with BBU/FBU | Write-Through halves write IOPS |
| Read Policy | Read Ahead (Adaptive) | Benefits sequential reads |
| Stripe Size | 256 KB | Matches typical Kubernetes I/O patterns |
| Cache | Enable (with battery backup) | Required for Write-Back safety |
| Disk Cache | Disabled | Controller cache is sufficient; disk cache risks data loss |
| Patrol Read | Enabled (weekly) | Detects latent media errors |
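As one concrete example, the settings above map roughly to Broadcom/LSI StorCLI as follows. This is a sketch only: command syntax, controller/VD indices, and supported options vary by controller generation and vendor CLI, so verify each command against your controller's documentation:

```shell
# Sketch: Broadcom/LSI StorCLI equivalents (verify against controller docs).
storcli /c0/vall set wrcache=wb       # Write-Back (requires healthy BBU/FBU)
storcli /c0/vall set rdcache=ra       # Read Ahead
storcli /c0/vall set pdcache=off      # disable per-disk caches
storcli /c0 set patrolread=on mode=auto
# Stripe size is fixed at virtual-drive creation, e.g. (drive IDs are placeholders):
storcli /c0 add vd type=raid10 drives=252:4-7 strip=256
```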
If using NVMe drives directly (no RAID controller), configure software RAID via mdadm (Linux) or rely on vSAN/Ceph for redundancy.
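For the software-RAID path, a minimal mdadm sketch for a four-drive RAID 10 matching the Tier 1 layout. Device names are placeholders; adjust to your system's NVMe enumeration, and note that the 256 KB chunk mirrors the stripe-size recommendation above:

```shell
# Create a 4-disk software RAID 10 (device paths are placeholders).
mdadm --create /dev/md0 --level=10 --raid-devices=4 --chunk=256 \
      /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1

# Persist the array so it assembles at boot, then watch the initial sync.
mdadm --detail --scan >> /etc/mdadm.conf
cat /proc/mdstat
```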
Performance Baselines
Test storage performance before deploying Kubernetes. etcd is the most latency-sensitive component.
etcd Storage Validation
Run fio on the target disk to verify it meets etcd requirements:
```shell
fio --name=etcd-bench --ioengine=libaio --direct=1 --bs=4k \
    --iodepth=1 --rw=write --size=1G --runtime=60 \
    --filename=/var/lib/etcd/fio-test --fsync=1
```
| Metric | Minimum | Recommended |
|---|---|---|
| fsync p99 latency | < 10 ms | < 2 ms |
| Sequential write IOPS (4K) | > 500 | > 5,000 |
| Sequential write throughput | > 50 MB/s | > 200 MB/s |
If fsync p99 exceeds 10 ms, etcd will log warnings and cluster stability degrades. Use NVMe, not SATA SSD, for etcd volumes.
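To check the p99 programmatically rather than reading fio's human output, a sketch that re-runs the benchmark with JSON output and extracts the sync (fsync) p99 with jq. This assumes a recent fio whose JSON schema reports sync latency percentiles under `jobs[].sync.lat_ns.percentile`; verify the path against your fio version's output:

```shell
fio --name=etcd-bench --ioengine=libaio --direct=1 --bs=4k \
    --iodepth=1 --rw=write --size=1G --runtime=60 \
    --filename=/var/lib/etcd/fio-test --fsync=1 \
    --output-format=json --output=fio.json

# fsync p99 in nanoseconds -> milliseconds (JSON key layout may vary by fio version).
p99_ns=$(jq '.jobs[0].sync.lat_ns.percentile."99.000000"' fio.json)
awk -v ns="$p99_ns" 'BEGIN { printf "fsync p99: %.2f ms\n", ns / 1000000 }'
```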
SAN/NAS Specifications (External Storage)
If deploying shared storage instead of or alongside local disks:
| Component | Minimum | Recommended |
|---|---|---|
| Array Type | Mid-range (Dell PowerStore, NetApp AFF A250) | Enterprise (Dell PowerStore 9200, NetApp AFF A800) |
| Protocol | iSCSI 10 GbE | iSCSI 25 GbE or FC 32 Gbps |
| Cache | 64 GB | 256 GB+ |
| Drives | All-flash SSD | All-flash NVMe |
| Redundancy | Dual controllers, dual fabric | Dual controllers, dual fabric, active-active |
| Snapshot/Clone | Required for Velero CSI snapshots | Required |
Disk Replacement and Monitoring
- Configure RAID controller alerts to forward to the monitoring stack (SNMP traps → Prometheus Alertmanager).
- Set predictive failure thresholds: replace drives when SMART reports a reallocated sector count > 0, or when remaining SSD endurance (wear-leveling reserve) drops below 10%.
- Hot-spare disks: allocate one global hot spare per RAID controller for automatic rebuild.
- Rebuild time for a 3.84 TB SSD in RAID 10: approximately 2–4 hours depending on controller and load.
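The SMART thresholds above can be checked from the OS with smartctl. Attribute names vary by vendor: `Reallocated_Sector_Ct` is the common SATA/SAS attribute, while NVMe drives report endurance as "Percentage Used" in the NVMe health log. Device paths below are placeholders:

```shell
# SATA/SAS: a non-zero reallocated sector count => schedule replacement.
smartctl -A /dev/sda | awk '$2 == "Reallocated_Sector_Ct" { print "reallocated:", $NF }'

# NVMe: percentage of rated endurance consumed (replace as it approaches 90-100%).
smartctl -a /dev/nvme0 | grep -i "percentage used"
```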
Considerations
- NVMe vs. SATA SSD: NVMe provides 3–5× the IOPS of SATA SSD at substantially lower latency. Use NVMe for any disk hosting etcd or database workloads.
- Drive endurance: Select drives rated for at least 1 DWPD (Drive Writes Per Day) for mixed workloads, 3 DWPD for write-intensive (etcd, databases).
- Encryption: Use self-encrypting drives (SED) with OPAL 2.0 if data-at-rest encryption is required at the hardware level. This is in addition to Kubernetes encryption at rest.
- Capacity planning: Prometheus with 15-second scrape interval and 90-day retention consumes approximately 50–100 GB. Loki log retention at 30 days consumes 100–500 GB depending on log volume. Plan Tier 1/Tier 2 capacity accordingly.
- vSAN licensing: If using VMware vSAN, local disks must meet the vSAN HCL. Check compatibility before purchasing drives.
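The Prometheus figure above can be sanity-checked with the common sizing approximation: ingested samples/sec × bytes per sample × retention seconds, at roughly 1–2 bytes per sample after TSDB compression. This is an estimate (actual usage depends on series churn and label cardinality), but it shows that ~100,000 active series at a 15-second interval lands near the top of the 50–100 GB range:

```shell
series=100000; scrape=15; bytes_per_sample=2; days=90
sps=$(( series / scrape ))                       # ingested samples per second
gb=$(awk -v s="$sps" -v b="$bytes_per_sample" -v d="$days" \
     'BEGIN { printf "%.0f", s * b * d * 86400 / 1e9 }')
echo "${sps} samples/s -> ~${gb} GB over ${days} days"
# -> 6666 samples/s -> ~104 GB over 90 days
```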