Reference Architecture: Power & Cooling

Purpose: Gives platform engineers the power distribution, redundancy (N+1, 2N), UPS sizing, and cooling requirements for the physical hosts.

Overview

A Kubernetes cluster is only as reliable as its power and cooling infrastructure. This document covers power distribution, UPS sizing, and thermal management for the physical hosts described in Physical Compute. All calculations assume a single-rack or multi-rack deployment in an enterprise data center with existing utility power.

Power Redundancy Models

| Model | Description | Use Case |
| --- | --- | --- |
| N | Single power feed, no redundancy | Lab / development only |
| N+1 | One additional power unit beyond minimum | Standard production |
| 2N | Fully duplicated power path (A+B feeds) | Mission-critical production |
| 2N+1 | Dual path plus one spare unit | Highest availability |

openCenter production deployments require 2N redundancy at minimum. Each server connects to two independent power feeds (A and B) via dual power supplies.

Power Budget per Rack

Estimate power draw based on server configuration. These values assume the hardware from Physical Compute.

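| Component | Quantity per Rack | Watts (Typical) | Watts (Peak) |
| --- | --- | --- | --- |
| Hypervisor Host (2× Xeon Gold, 512 GB) | 4 | 600 W | 900 W |
| Management Host | 1 | 350 W | 500 W |
| ToR Switch (leaf) | 2 | 150 W | 200 W |
| OOB Management Switch | 1 | 30 W | 50 W |
| Patch Panel / Cable Management | | 0 W | 0 W |
| Rack Total | | 3,080 W | 4,550 W |

Component wattages are per unit. Rack Total is the sum of quantity × per-unit watts.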

Add 20% headroom for future growth: plan circuits for approximately 5.5 kW peak per rack.
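The arithmetic behind the rack total and the planning figure can be reproduced with a short script. This is a minimal sketch: the per-unit wattages are copied from the table above, while the names `RACK_COMPONENTS`, `HEADROOM`, and `rack_totals` are illustrative and not part of any openCenter tooling.

```python
# Per-unit figures copied from the rack power budget table.
# Tuple layout: (name, quantity, typical watts per unit, peak watts per unit)
RACK_COMPONENTS = [
    ("Hypervisor Host", 4, 600, 900),
    ("Management Host", 1, 350, 500),
    ("ToR Switch (leaf)", 2, 150, 200),
    ("OOB Management Switch", 1, 30, 50),
]

HEADROOM = 0.20  # 20% growth allowance from the text


def rack_totals(components):
    """Return (typical_watts, peak_watts) summed over all components."""
    typical = sum(qty * typ for _, qty, typ, _ in components)
    peak = sum(qty * pk for _, qty, _, pk in components)
    return typical, peak


typical_w, peak_w = rack_totals(RACK_COMPONENTS)
planned_peak_w = peak_w * (1 + HEADROOM)

print(f"Typical: {typical_w} W, peak: {peak_w} W")
print(f"Planning figure with headroom: {planned_peak_w / 1000:.2f} kW")
```

Changing a quantity or wattage in the list is enough to re-derive the totals for a different rack layout.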

Power Distribution

PDU Configuration (2N)

Each rack requires two PDUs, one per power feed:

| Specification | Value |
| --- | --- |
| PDU Type | Metered or Switched (per-outlet monitoring preferred) |
| Input | Single-phase 208V/30A (L6-30P) or three-phase 208V/20A |
| Outlets | 24–42× C13/C19 |
| Monitoring | SNMP v3, per-outlet current/power readings |
| Mounting | Vertical, zero-U (does not consume rack U-space) |
| Example Models | APC AP8681, Raritan PX3-5902V, ServerTech PRO2 |

Circuit Sizing

| Feed | Circuit | Capacity | Derated (80%) | Serves |
| --- | --- | --- | --- | --- |
| A | 208V / 30A single-phase | 6.2 kVA | 5.0 kVA | PDU-A |
| B | 208V / 30A single-phase | 6.2 kVA | 5.0 kVA | PDU-B |

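Each feed must independently support the full rack load. Under 2N, if Feed A fails, Feed B carries 100% of the load. At the 4,550 W peak from the rack budget, a single feed runs at about 91% of its 5.0 kVA derated capacity (assuming a power factor near 1). The 5.5 kW planning figure exceeds that capacity, so move to a larger circuit before the rack grows into its headroom.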
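The single-feed check can be sketched as follows. The capacity formulas (V × A for single-phase, √3 × V × A for three-phase with line-to-line voltage) are standard. The function names and the default power factor of 1.0 are assumptions made for illustration, so substitute the measured power factor of your PSUs.

```python
import math


def circuit_capacity_va(volts, amps, phases=1):
    """Apparent-power capacity of a circuit in VA (line-to-line volts for 3-phase)."""
    factor = math.sqrt(3) if phases == 3 else 1.0
    return factor * volts * amps


def derated_va(capacity_va, derate=0.80):
    """Continuous-load capacity after applying the 80% derate."""
    return capacity_va * derate


def feed_survives_failover(rack_load_w, volts, amps, phases=1, power_factor=1.0):
    """True if one feed alone can carry the whole rack within its derated capacity.

    power_factor=1.0 is an assumption, not a value from this document.
    """
    load_va = rack_load_w / power_factor
    return load_va <= derated_va(circuit_capacity_va(volts, amps, phases))


for load_w in (4550, 5460):
    ok = feed_survives_failover(load_w, volts=208, amps=30)
    print(f"{load_w} W on one 208V/30A single-phase feed: {'OK' if ok else 'exceeds derated capacity'}")
```

The same function accepts `phases=3, amps=20` to evaluate the three-phase input option listed for the PDUs.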

UPS Sizing

Per-Rack UPS (Distributed Model)

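| Specification | Minimum | Recommended |
| --- | --- | --- |
| Capacity | 6 kVA / 5.4 kW | 10 kVA / 9 kW |
| Runtime at Full Load | 5 minutes | 15 minutes |
| Battery Type | VRLA (sealed lead-acid) | Lithium-ion |
| Transfer Time | < 5 ms | < 2 ms (online double-conversion) |
| Topology | Line-interactive | Online double-conversion |
| Example Models | APC SMT3000RM2U | APC SRT6KRMXLI, Eaton 9PX |

Check each example model's rating against the capacity row before ordering: the SMT3000RM2U is a 3 kVA unit and the SRT6KRMXLI is a 6 kVA unit, so neither meets the capacity in its column on its own.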
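A UPS has both a kVA and a kW limit, and a load must stay under both. The sketch below checks a rack load against the two capacity tiers from the table. The `UpsRating` class and the default load power factor of 1.0 are illustrative assumptions, not part of the source specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UpsRating:
    kva: float  # apparent-power rating
    kw: float   # real-power rating

    @property
    def power_factor(self) -> float:
        """Output power factor implied by the two ratings."""
        return self.kw / self.kva

    def supports(self, load_w: float, load_pf: float = 1.0) -> bool:
        """True if the load fits under both the kW and the kVA limit.

        load_pf=1.0 is an assumed load power factor.
        """
        load_kw = load_w / 1000
        load_kva = load_kw / load_pf
        return load_kw <= self.kw and load_kva <= self.kva


MINIMUM = UpsRating(kva=6, kw=5.4)
RECOMMENDED = UpsRating(kva=10, kw=9)

for name, ups in (("minimum", MINIMUM), ("recommended", RECOMMENDED)):
    print(name, "pf:", round(ups.power_factor, 2),
          "| 4550 W:", ups.supports(4550),
          "| 5460 W:", ups.supports(5460))
```

Checking both limits matters because a load with a low power factor can exceed the kVA rating while still being under the kW rating.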

Centralized UPS (Room-Level)

For data centers with centralized UPS infrastructure, ensure:

  • Each power feed (A and B) is backed by an independent UPS system.
  • UPS runtime covers the gap between utility failure and generator start. Diesel generators typically start in 10–30 seconds, but plan for 15 minutes of runtime to cover a generator that fails to start.
  • Automatic Transfer Switch (ATS) is tested quarterly.

Generator Requirements

For sites requiring extended outage protection:

| Specification | Value |
| --- | --- |
| Fuel Type | Diesel (most common) or natural gas |
| Start Time | < 15 seconds from utility loss |
| Runtime | 24–72 hours at full load (fuel dependent) |
| Testing | Monthly no-load test, annual full-load test |

Cooling Requirements

Heat Dissipation

All electrical power consumed by servers converts to heat. Use the rack power budget to calculate cooling load.

| Metric | Value |
| --- | --- |
| Rack heat output (typical) | 3,080 W ≈ 10,509 BTU/hr |
| Rack heat output (peak) | 4,550 W ≈ 15,525 BTU/hr |
| Conversion | 1 W = 3.412 BTU/hr |
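The conversion is a single multiplication, sketched below with the rack figures from the power budget. The tons-of-refrigeration step (1 ton = 12,000 BTU/hr) is not in the table above; it is included as an assumption that cooling vendors often quote capacity in tons. Function names are illustrative.

```python
BTU_PER_HR_PER_WATT = 3.412   # conversion factor from the table
BTU_PER_HR_PER_TON = 12_000   # one ton of refrigeration (added here, not in the table)


def watts_to_btu_per_hr(watts):
    """Heat output in BTU/hr for a given electrical load in watts."""
    return watts * BTU_PER_HR_PER_WATT


def btu_per_hr_to_tons(btu_per_hr):
    """Cooling load expressed in tons of refrigeration."""
    return btu_per_hr / BTU_PER_HR_PER_TON


for label, watts in (("typical", 3080), ("peak", 4550)):
    btu = watts_to_btu_per_hr(watts)
    print(f"{label}: {watts} W = {btu:,.0f} BTU/hr = {btu_per_hr_to_tons(btu):.2f} tons")
```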

Cooling Configuration

| Specification | Minimum | Recommended |
| --- | --- | --- |
| Inlet Air Temperature | 18–27°C (64–80°F) per ASHRAE A1 | 20–25°C (68–77°F) |
| Humidity | 20–80% RH (non-condensing) | 40–60% RH |
| Airflow | Hot aisle / cold aisle separation | Contained hot aisle or cold aisle |
| Cooling Capacity per Rack | 5 kW | 8–10 kW |
| Redundancy | N+1 CRAC/CRAH units | N+1 with automatic failover |

Airflow Management

  • Install blanking panels in all unused U-spaces to prevent hot air recirculation.
  • Route cables through overhead trays or under-floor pathways to avoid obstructing airflow.
  • Monitor inlet temperature per rack with sensors feeding into the monitoring stack (SNMP → Prometheus).
  • Set alerting thresholds: warn at 27°C inlet, critical at 32°C inlet.

Monitoring and Alerting

| Metric | Source | Alert Threshold |
| --- | --- | --- |
| PDU load (amps per phase) | PDU SNMP | > 80% of circuit rating |
| UPS battery charge | UPS SNMP | < 80% charge |
| UPS on battery | UPS SNMP | Any event (immediate alert) |
| Inlet temperature | Rack sensor | > 27°C warn, > 32°C critical |
| Humidity | Room sensor | < 20% or > 80% RH |

Feed these into Prometheus via SNMP Exporter and create Grafana dashboards for the facilities team.
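In the deployment these conditions would normally be written as Prometheus alerting rules. The Python below only illustrates the comparison logic from the threshold table so it can be reviewed or unit-tested. The `Reading` fields and the alert names are hypothetical and do not correspond to actual SNMP Exporter metric names.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    pdu_amps: float         # measured current on one phase
    circuit_amps: float     # breaker rating of that circuit
    ups_charge_pct: float
    ups_on_battery: bool
    inlet_temp_c: float
    humidity_pct: float


def evaluate(r: Reading) -> list[str]:
    """Return the (hypothetical) alert names triggered by one reading."""
    alerts = []
    if r.pdu_amps > 0.80 * r.circuit_amps:
        alerts.append("pdu_load_high")
    if r.ups_charge_pct < 80:
        alerts.append("ups_battery_low")
    if r.ups_on_battery:
        alerts.append("ups_on_battery")
    # Temperature has two tiers; critical takes precedence over warn.
    if r.inlet_temp_c > 32:
        alerts.append("inlet_temp_critical")
    elif r.inlet_temp_c > 27:
        alerts.append("inlet_temp_warn")
    if r.humidity_pct < 20 or r.humidity_pct > 80:
        alerts.append("humidity_out_of_range")
    return alerts


healthy = Reading(12, 30, 100, False, 24, 45)
stressed = Reading(25, 30, 75, True, 29, 85)
print(evaluate(healthy))
print(evaluate(stressed))
```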

Considerations

  • Power efficiency: Select 80 PLUS Titanium PSUs (rated at 96% efficiency at 50% load on 230 V-class input) to reduce waste heat and operating cost.
  • Phase balancing: Distribute server power supplies across phases evenly to avoid phase imbalance on three-phase circuits.
  • Cable management: Use color-coded power cables (red for Feed A, blue for Feed B) to prevent accidental disconnection of the wrong feed. See Cabling Standards.
  • Maintenance windows: UPS battery replacement and PDU firmware updates require planned maintenance. Schedule during low-traffic periods and verify 2N redundancy before taking either feed offline.
  • Capacity planning: Track actual power draw per rack monthly. If any rack exceeds 70% of circuit capacity, plan expansion before hitting the 80% derate limit.
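The capacity-planning rule in the last bullet can be sketched as a small classifier over monthly measurements. The rack names and sample draws are made up for illustration. The sketch also assumes the tracked value is the total rack draw (both PDUs combined), because under 2N either feed must be able to carry it alone.

```python
CIRCUIT_VA = 208 * 30  # 208V / 30A single-phase feed, 6,240 VA

PLAN_THRESHOLD = 0.70    # start planning expansion above this
DERATE_THRESHOLD = 0.80  # continuous-load limit


def capacity_status(rack_draw_va, circuit_va=CIRCUIT_VA):
    """Classify total rack draw against one feed's circuit capacity."""
    utilization = rack_draw_va / circuit_va
    if utilization >= DERATE_THRESHOLD:
        return "at_or_over_derate_limit"
    if utilization > PLAN_THRESHOLD:
        return "plan_expansion"
    return "ok"


# Hypothetical monthly readings in VA, keyed by an illustrative rack name.
monthly_draw = {"rack-a01": 3080, "rack-a02": 4550, "rack-a03": 5200}

for rack, draw in monthly_draw.items():
    print(f"{rack}: {draw / CIRCUIT_VA:.0%} of circuit -> {capacity_status(draw)}")
```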