Reference Architecture: Power & Cooling
Purpose: Gives platform engineers the power distribution, redundancy (N+1, 2N), UPS sizing, and cooling requirements for the physical hosts.
Overview
A Kubernetes cluster is only as reliable as its power and cooling infrastructure. This document covers power distribution, UPS sizing, and thermal management for the physical hosts described in Physical Compute. All calculations assume a single-rack or multi-rack deployment in an enterprise data center with existing utility power.
Power Redundancy Models
| Model | Description | Use Case |
|---|---|---|
| N | Single power feed, no redundancy | Lab / development only |
| N+1 | One additional power unit beyond minimum | Standard production |
| 2N | Fully duplicated power path (A+B feeds) | Mission-critical production |
| 2N+1 | Dual path plus one spare unit | Highest availability |
openCenter production deployments require 2N redundancy at minimum. Each server connects to two independent power feeds (A and B) via dual power supplies.
Power Budget per Rack
Estimate power draw from the server configuration. These values assume the hardware from Physical Compute. Wattages are per unit; the Rack Total row sums quantity × wattage across all components.
| Component | Quantity per Rack | Watts (Typical) | Watts (Peak) |
|---|---|---|---|
| Hypervisor Host (2× Xeon Gold, 512 GB) | 4 | 600 W | 900 W |
| Management Host | 1 | 350 W | 500 W |
| ToR Switch (leaf) | 2 | 150 W | 200 W |
| OOB Management Switch | 1 | 30 W | 50 W |
| Patch Panel / Cable Management | — | 0 W | 0 W |
| Rack Total | | 3,080 W | 4,550 W |
Add 20% headroom for future growth: 4,550 W × 1.2 = 5,460 W, so plan circuits for approximately 5.5 kW peak per rack.
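
The budget arithmetic above can be reproduced with a short script. This is a minimal sketch: the wattages are copied from the table, while the dictionary keys and function names are illustrative assumptions rather than part of any openCenter tooling.

```python
# Per-unit values copied from the rack power budget table:
# (quantity per rack, typical watts, peak watts).
# The component keys are hypothetical labels chosen for this sketch.
RACK_COMPONENTS = {
    "hypervisor_host": (4, 600, 900),
    "management_host": (1, 350, 500),
    "tor_switch": (2, 150, 200),
    "oob_switch": (1, 30, 50),
}

HEADROOM = 0.20  # 20% growth allowance stated in the text


def rack_totals(components):
    """Return (typical_watts, peak_watts) summed over all components."""
    typical = sum(qty * typ for qty, typ, _ in components.values())
    peak = sum(qty * pk for qty, _, pk in components.values())
    return typical, peak


def planning_watts(peak_watts, headroom=HEADROOM):
    """Peak draw plus growth headroom, in watts."""
    return peak_watts * (1 + headroom)


typical_w, peak_w = rack_totals(RACK_COMPONENTS)
print(f"typical={typical_w} W, peak={peak_w} W, plan={planning_watts(peak_w):.0f} W")
```

Changing a quantity or wattage in `RACK_COMPONENTS` recomputes the totals, which helps when a rack deviates from the reference hardware.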
Power Distribution
PDU Configuration (2N)
Each rack requires two PDUs, one per power feed:
| Specification | Value |
|---|---|
| PDU Type | Metered or Switched (per-outlet monitoring preferred) |
| Input | Single-phase 208V/30A (L6-30P) or three-phase 208V/20A |
| Outlets | 24–42× C13/C19 |
| Monitoring | SNMP v3, per-outlet current/power readings |
| Mounting | Vertical, zero-U (does not consume rack U-space) |
| Example Models | APC AP8681, Raritan PX3-5902V, ServerTech PRO2 |
Circuit Sizing
| Feed | Circuit | Capacity | Derated (80%) | Serves |
|---|---|---|---|---|
| A | 208V / 30A single-phase | 6.2 kVA | 5.0 kVA | PDU-A |
| B | 208V / 30A single-phase | 6.2 kVA | 5.0 kVA | PDU-B |
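Each feed must independently support the full rack load. Under 2N, if Feed A fails, Feed B carries 100% of the load. The 4.55 kW rack peak fits within the 5.0 kVA derated capacity of a 208V/30A single-phase circuit, but the 5.5 kW planning figure with growth headroom does not. To reserve that headroom, use a larger circuit such as the three-phase 208V/20A option (about 7.2 kVA, 5.8 kVA derated).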
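
The circuit figures in the table follow from volts × amps (times √3 for three-phase), then the 80% derate. The sketch below checks whether a single feed can carry a given rack load on its own. The function names are illustrative, and the default power factor of 1.0 is an assumption; real server power supplies run slightly below unity, so measure before relying on the margin.

```python
import math


def circuit_kva(volts, amps, three_phase=False):
    """Apparent power capacity of a circuit in kVA.

    For three-phase circuits, volts is the line-to-line voltage.
    """
    factor = math.sqrt(3) if three_phase else 1.0
    return volts * amps * factor / 1000


def derated_kva(volts, amps, three_phase=False, derate=0.80):
    """Continuous-load capacity after applying the 80% derate."""
    return circuit_kva(volts, amps, three_phase) * derate


def feed_can_carry(load_kw, volts, amps, three_phase=False, power_factor=1.0):
    """True if one feed alone can carry the full rack load (the 2N requirement).

    power_factor=1.0 is an assumption made for this sketch.
    """
    load_kva = load_kw / power_factor
    return load_kva <= derated_kva(volts, amps, three_phase)


print(f"single-phase 208V/30A derated: {derated_kva(208, 30):.2f} kVA")
print("carries 4.55 kW peak:", feed_can_carry(4.55, 208, 30))
print("carries 5.46 kW plan:", feed_can_carry(5.46, 208, 30))
print("three-phase 208V/20A carries 5.46 kW:", feed_can_carry(5.46, 208, 20, three_phase=True))
```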
UPS Sizing
Per-Rack UPS (Distributed Model)
| Specification | Minimum | Recommended |
|---|---|---|
| Capacity | 6 kVA / 5.4 kW | 10 kVA / 9 kW |
| Runtime at Full Load | 5 minutes | 15 minutes |
| Battery Type | VRLA (sealed lead-acid) | Lithium-ion |
| Transfer Time | < 5 ms | < 2 ms (online double-conversion) |
| Topology | Line-interactive | Online double-conversion |
| Example Models | APC SMT3000RM2U | APC SRT6KRMXLI, Eaton 9PX |
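
The kVA and kW pairs in the table imply a UPS output power factor of 0.9 (6 kVA × 0.9 = 5.4 kW). A rough sizing sketch under that assumption follows. The battery estimate is idealized: the 0.9 inverter efficiency is an assumed figure, and real runtime depends on battery chemistry, age, and discharge rate, so treat vendor runtime charts as authoritative.

```python
def ups_kw_rating(kva, power_factor=0.9):
    """Real power a UPS can deliver, given its kVA rating and output power factor."""
    return kva * power_factor


def required_ups_kva(load_kw, power_factor=0.9):
    """Smallest kVA rating whose kW rating covers the load."""
    return load_kw / power_factor


def ideal_battery_wh(load_kw, runtime_minutes, inverter_efficiency=0.9):
    """Idealized usable battery energy (Wh) for the target runtime.

    inverter_efficiency is an assumption; the result ignores battery
    aging and discharge-rate effects.
    """
    return load_kw * 1000 * (runtime_minutes / 60) / inverter_efficiency


peak_kw = 4.55  # rack peak from the power budget
print(f"minimum rating for peak load: {required_ups_kva(peak_kw):.2f} kVA")
print(f"6 kVA unit delivers: {ups_kw_rating(6):.1f} kW")
print(f"energy for 5 min at peak: {ideal_battery_wh(peak_kw, 5):.0f} Wh")
```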
Centralized UPS (Room-Level)
For data centers with centralized UPS infrastructure, ensure:
- Each power feed (A and B) is backed by an independent UPS system.
- UPS runtime covers the gap between utility failure and generator start. Diesel generators typically start within 10–30 seconds, but plan for 15 minutes of runtime to cover generator failure-to-start scenarios.
- Automatic Transfer Switch (ATS) is tested quarterly.
Generator Requirements
For sites requiring extended outage protection:
| Specification | Value |
|---|---|
| Fuel Type | Diesel (most common) or natural gas |
| Start Time | < 15 seconds from utility loss |
| Runtime | 24–72 hours at full load (fuel dependent) |
| Testing | Monthly no-load test, annual full-load test |
Cooling Requirements
Heat Dissipation
Nearly all electrical power consumed by servers converts to heat, so the rack power budget doubles as the cooling load.
| Metric | Value |
|---|---|
| Rack heat output (typical) | 3,080 W ≈ 10,509 BTU/hr |
| Rack heat output (peak) | 4,550 W ≈ 15,525 BTU/hr |
| Conversion | 1 W = 3.412 BTU/hr |
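
The conversion in the table is a single multiplication. A minimal sketch, with an illustrative function name, for racks whose power draw differs from the reference budget:

```python
BTU_PER_HR_PER_WATT = 3.412  # conversion factor from the table


def watts_to_btu_hr(watts):
    """Convert electrical load in watts to heat output in BTU/hr."""
    return watts * BTU_PER_HR_PER_WATT


for label, watts in (("typical", 3080), ("peak", 4550)):
    print(f"{label}: {watts} W = {watts_to_btu_hr(watts):,.0f} BTU/hr")
```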
Cooling Configuration
| Specification | Minimum | Recommended |
|---|---|---|
| Inlet Air Temperature | 18–27°C (64–81°F), the ASHRAE recommended envelope (class A1 allowable is 15–32°C) | 20–25°C (68–77°F) |
| Humidity | 20–80% RH (non-condensing) | 40–60% RH |
| Airflow | Hot aisle / cold aisle separation | Contained hot aisle or cold aisle |
| Cooling Capacity per Rack | 5 kW | 8–10 kW |
| Redundancy | N+1 CRAC/CRAH units | N+1 with automatic failover |
Airflow Management
- Install blanking panels in all unused U-spaces to prevent hot air recirculation.
- Route cables through overhead trays or under-floor pathways to avoid obstructing airflow.
- Monitor inlet temperature per rack with sensors feeding into the monitoring stack (SNMP → Prometheus).
- Set alerting thresholds: warn at 27°C inlet, critical at 32°C inlet.
Monitoring and Alerting
| Metric | Source | Alert Threshold |
|---|---|---|
| PDU load (amps per phase) | PDU SNMP | > 80% of circuit rating |
| UPS battery charge | UPS SNMP | < 80% charge |
| UPS on battery | UPS SNMP | Any event (immediate alert) |
| Inlet temperature | Rack sensor | > 27°C warn, > 32°C critical |
| Humidity | Room sensor | < 20% or > 80% RH |
Feed these metrics into Prometheus via the SNMP Exporter and build Grafana dashboards for the facilities team.
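
The thresholds in the table can be expressed as a small evaluation function, which is useful for testing alert logic before encoding it as Prometheus rules. This is a sketch only: the reading keys are hypothetical names, and the severity labels other than the inlet temperature warn and critical levels are assumptions, since the table does not assign them.

```python
def evaluate_readings(readings, circuit_rating_amps=30):
    """Return a list of (severity, message) alerts for one rack's readings.

    `readings` uses hypothetical keys chosen for this sketch:
    pdu_amps, ups_on_battery, ups_charge_pct, inlet_temp_c, humidity_pct.
    """
    alerts = []
    if readings["pdu_amps"] > 0.80 * circuit_rating_amps:
        alerts.append(("warn", "PDU load above 80% of circuit rating"))
    if readings["ups_on_battery"]:
        # The table calls for an immediate alert; "critical" is an assumed label.
        alerts.append(("critical", "UPS on battery"))
    if readings["ups_charge_pct"] < 80:
        alerts.append(("warn", "UPS battery charge below 80%"))
    temp = readings["inlet_temp_c"]
    if temp > 32:
        alerts.append(("critical", "Inlet temperature above 32°C"))
    elif temp > 27:
        alerts.append(("warn", "Inlet temperature above 27°C"))
    rh = readings["humidity_pct"]
    if rh < 20 or rh > 80:
        alerts.append(("warn", "Humidity outside 20-80% RH"))
    return alerts


sample = {
    "pdu_amps": 25,
    "ups_on_battery": False,
    "ups_charge_pct": 100,
    "inlet_temp_c": 28,
    "humidity_pct": 50,
}
for severity, message in evaluate_readings(sample):
    print(severity, message)
```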
Considerations
- Power efficiency: Select 80 PLUS Titanium PSUs (96% efficiency at 50% load on 230V input) to reduce waste heat and operating cost.
- Phase balancing: Distribute server power supplies evenly across phases to avoid imbalance on three-phase circuits.
- Cable management: Use color-coded power cables (red for Feed A, blue for Feed B) to prevent accidental disconnection of the wrong feed. See Cabling Standards.
- Maintenance windows: UPS battery replacement and PDU firmware updates require planned maintenance. Schedule during low-traffic periods and verify 2N redundancy before taking either feed offline.
- Capacity planning: Track actual power draw per rack monthly. If any rack exceeds 70% of circuit capacity, plan expansion before hitting the 80% derate limit.
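
The capacity-planning rule above can be checked mechanically from monthly measurements. The sketch below assumes the single-phase 208V/30A circuit from Circuit Sizing and a power factor of 1.0; the function names and status strings are illustrative. Pass the total rack draw rather than the per-feed draw, since under 2N a single feed must be able to carry the whole load.

```python
def circuit_utilization(draw_kw, volts=208, amps=30, power_factor=1.0):
    """Fraction of the circuit's full (non-derated) capacity in use.

    power_factor=1.0 is an assumption made for this sketch.
    """
    capacity_kva = volts * amps / 1000
    return (draw_kw / power_factor) / capacity_kva


def capacity_status(utilization):
    """Map utilization to the planning thresholds from the text."""
    if utilization > 0.80:
        return "over_derate_limit"
    if utilization > 0.70:
        return "plan_expansion"
    return "ok"


for draw_kw in (3.08, 4.55, 5.46):
    u = circuit_utilization(draw_kw)
    print(f"{draw_kw} kW -> {u:.0%} -> {capacity_status(u)}")
```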