Reference Architecture: Infrastructure Services
Purpose: Provides platform engineers with specifications for the shared infrastructure services that must be in place before Kubernetes deployment.
Overview
Kubernetes clusters depend on several infrastructure services that must be operational before cluster provisioning begins. DNS resolution failures, NTP drift, or DHCP exhaustion cause cluster instability that is difficult to diagnose after the fact. This document defines the requirements for each service.
Service Dependency Order
Provision these services in the order listed; each service's dependencies are noted in parentheses.
- NTP — Time synchronization (no dependencies)
- DNS — Name resolution (depends on NTP for DNSSEC)
- DHCP / IPAM — IP address assignment (depends on DNS for registration)
- LDAP / Active Directory — Identity (depends on DNS, NTP)
- Certificate Authority — TLS certificates (depends on NTP, DNS)
- HTTP Proxy / Mirror — Package repositories (depends on DNS)
NTP (Network Time Protocol)
Time skew greater than 500 ms between nodes causes etcd leader election failures, TLS certificate validation errors, and log correlation problems.
Requirements
| Specification | Value |
|---|---|
| Protocol | NTP v4 (UDP 123) |
| Servers | 2 internal NTP servers minimum (for redundancy) |
| Stratum | Stratum 2 or better (synced to public Stratum 1 or GPS) |
| Max Skew | < 100 ms between any two cluster nodes |
| Client Config | chrony (preferred on Ubuntu/RHEL) or systemd-timesyncd |
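A minimal /etc/chrony/chrony.conf meeting the requirements above might look like the following sketch; the server names ntp1/ntp2.example.com are placeholders for your internal NTP servers:

```
# Two internal NTP servers (placeholder names), polled with iburst for
# fast initial synchronization.
server ntp1.example.com iburst
server ntp2.example.com iburst
# Step the clock only during the first three updates when offset > 1 s;
# afterwards slew gradually so etcd never sees a time jump.
makestep 1.0 3
# Persist measured clock drift across reboots.
driftfile /var/lib/chrony/drift
```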
Configuration Targets
| Component | NTP Source |
|---|---|
| ESXi / KVM hosts | Internal NTP servers |
| vCenter Server | Internal NTP servers |
| Kubernetes VMs | Internal NTP servers (not host-level sync) |
| Network switches | Internal NTP servers |
| Storage arrays | Internal NTP servers |
Configure Kubernetes VMs to use NTP directly rather than VMware Tools time synchronization, which can step the clock abruptly and destabilize etcd.
Verification
```shell
# On each node
chronyc tracking
# Verify "System time" offset is < 100 ms
# Verify "Leap status" is "Normal"
```
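The offset check can be scripted. This sketch parses the "System time" line of `chronyc tracking` output (fed here from a sample string so the logic is self-contained; `check_offset` is a hypothetical helper name):

```shell
# Read `chronyc tracking` output on stdin and report whether the
# "System time" offset is under the 100 ms (0.1 s) target.
check_offset() {
  awk '/^System time/ { if ($4 + 0 < 0.1) print "offset OK"; else print "offset TOO HIGH" }'
}

# Sample line in the format chronyc emits; in production, pipe the real
# output instead: chronyc tracking | check_offset
echo "System time     : 0.000123 seconds fast of NTP time" | check_offset
# → offset OK
```

Wire this into a node health check or a Prometheus textfile collector so skew is caught before etcd complains.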
DNS (Domain Name System)
Kubernetes requires forward and reverse DNS resolution for node hostnames. The API server, etcd, and kubelet all use DNS for peer discovery and certificate validation.
Requirements
| Specification | Value |
|---|---|
| Servers | 2 DNS servers minimum (primary + secondary) |
| Software | BIND 9, Unbound, Windows DNS, or Infoblox |
| Zones | Forward zone for cluster domain, reverse zone for node subnets |
| Record Types | A, PTR, CNAME, SRV |
| TTL | 300 seconds (5 min) for cluster records |
| DNSSEC | Optional (requires accurate NTP) |
Required DNS Records
Create these records before running opencenter cluster setup:
| Record | Type | Example | Purpose |
|---|---|---|---|
| Control plane nodes | A | cp01.k8s.example.com → 10.0.40.10 | Node identity |
| Worker nodes | A | wk01.k8s.example.com → 10.0.40.20 | Node identity |
| API server VIP | A | api.k8s.example.com → 10.0.40.100 | kubectl, CI/CD access |
| Wildcard ingress | A / CNAME | *.apps.k8s.example.com → 10.0.40.101 | Application ingress |
| Reverse PTR | PTR | 10.0.40.10 → cp01.k8s.example.com | Reverse lookup (audit logs) |
| vCenter | A | vcenter.example.com → 10.0.10.5 | vSphere management |
| Bastion | A | bastion.k8s.example.com → 10.0.40.200 | SSH jump host |
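As a sketch, the entries above would look like this in BIND zone-file syntax (addresses from the example column; the reverse zone name follows from the 10.0.40.0/24 subnet):

```
; Forward zone k8s.example.com — records from the table above.
$TTL 300
cp01    IN  A    10.0.40.10
wk01    IN  A    10.0.40.20
api     IN  A    10.0.40.100
*.apps  IN  A    10.0.40.101
bastion IN  A    10.0.40.200

; Reverse zone 40.0.10.in-addr.arpa — PTR for the control plane node.
10      IN  PTR  cp01.k8s.example.com.
```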
Internal vs. External DNS
| Zone | Scope | Resolver |
|---|---|---|
| k8s.example.com | Internal (data center) | Internal DNS servers |
| cluster.local | Kubernetes internal | CoreDNS (deployed by Kubespray) |
| External domains | Internet resolution | Internal DNS with forwarders to upstream |
CoreDNS inside the cluster handles cluster.local service discovery. It forwards all other queries to the infrastructure DNS servers configured in each node's /etc/resolv.conf.
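This split is visible in the CoreDNS Corefile; a sketch of the relevant stanza, assuming near-default Kubespray settings:

```
# cluster.local (and its reverse zones) answered locally; everything
# else forwarded to the resolvers in the node's /etc/resolv.conf.
.:53 {
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
    }
    forward . /etc/resolv.conf
    cache 30
}
```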
DHCP and IPAM
Static vs. DHCP
| Component | IP Assignment | Reason |
|---|---|---|
| Control plane nodes | Static | Stable IPs for etcd, API server certificates |
| Worker nodes | Static or DHCP reservation | Predictable addressing for DNS records |
| Bastion | Static | Known SSH target |
| BMC/IPMI | Static or DHCP reservation | Out-of-band access |
| Pod network | Calico-managed (IPAM) | Cluster-internal, not infrastructure DHCP |
Static IPs are preferred for all Kubernetes nodes. If using DHCP, use MAC-based reservations to ensure consistent IP assignment.
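Where DHCP reservations are used, an ISC dhcpd host entry looks like the following sketch (the MAC address is a placeholder; the IP and hostname match the worker example above):

```
# MAC-based reservation: wk01 always receives 10.0.40.20.
host wk01 {
  hardware ethernet 00:50:56:aa:bb:cc;
  fixed-address 10.0.40.20;
  option host-name "wk01";
}
```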
IPAM Planning
| Subnet | CIDR | Usable IPs | Assignment |
|---|---|---|---|
| Management | 10.0.10.0/24 | 254 | ESXi hosts, vCenter, switches |
| VM Network | 10.0.40.0/24 | 254 | Kubernetes nodes, bastion |
| Pod CIDR | 10.244.0.0/16 | 65,534 | Calico IPAM (per-node /24 blocks) |
| Service CIDR | 10.96.0.0/12 | 1,048,574 | Kubernetes service ClusterIPs |
| MetalLB Pool | 10.0.40.100–10.0.40.120 | 21 | LoadBalancer service IPs |
Reserve the MetalLB IP range in IPAM to prevent conflicts. These IPs must not be assigned to any other device.
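A quick sanity check on the MetalLB pool is a little IP arithmetic. This bash sketch (addresses from the table above; `ip2int` is a hypothetical helper) confirms the pool sits inside 10.0.40.0/24 and counts its size:

```shell
# Convert a dotted-quad IPv4 address to an integer so ranges compare.
ip2int() { local IFS=.; set -- $1; echo $(( ($1<<24) | ($2<<16) | ($3<<8) | $4 )); }

pool_start=$(ip2int 10.0.40.100)
pool_end=$(ip2int 10.0.40.120)
net=$(ip2int 10.0.40.0)
bcast=$(( net + 255 ))                  # broadcast of the /24

(( pool_start > net && pool_end < bcast )) && echo "pool inside VM network"
echo "pool size: $(( pool_end - pool_start + 1 ))"
# → pool inside VM network
# → pool size: 21
```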
LDAP / Active Directory
Keycloak (deployed by openCenter) integrates with LDAP or Active Directory for user authentication. The infrastructure team must provide:
| Requirement | Value |
|---|---|
| Protocol | LDAPS (TCP 636) — LDAP over TLS |
| Server | 2 domain controllers minimum |
| Base DN | dc=example,dc=com |
| Bind Account | Service account with read-only access to user/group OUs |
| User Search Base | ou=Users,dc=example,dc=com |
| Group Search Base | ou=Groups,dc=example,dc=com |
| Group for Cluster Admins | cn=k8s-admins,ou=Groups,dc=example,dc=com |
| Group for Viewers | cn=k8s-viewers,ou=Groups,dc=example,dc=com |
Keycloak maps LDAP groups to Kubernetes RBAC roles via RBAC Manager. See the Keycloak and RBAC Manager platform service documentation for configuration details.
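For reference, the admin group entry the bind account must be able to read would look roughly like this in LDIF (DNs from the table above; the member value is illustrative):

```
# cn=k8s-admins group as Keycloak expects to find it.
dn: cn=k8s-admins,ou=Groups,dc=example,dc=com
objectClass: groupOfNames
cn: k8s-admins
member: cn=alice,ou=Users,dc=example,dc=com
```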
Certificate Authority (CA)
TLS certificates are required for the Kubernetes API server, etcd, ingress, and platform services. cert-manager (deployed by openCenter) automates certificate issuance.
Options
| CA Type | Use Case | Integration |
|---|---|---|
| Internal CA (enterprise PKI) | Regulated environments | cert-manager ACME or CA issuer |
| Self-signed CA | Lab / development | cert-manager self-signed issuer |
| Let's Encrypt | Internet-facing clusters | cert-manager ACME issuer |
Requirements for Internal CA
| Requirement | Value |
|---|---|
| Protocol | ACME (preferred) or manual CSR signing |
| Root CA | Trusted by all cluster nodes (installed in OS trust store) |
| Intermediate CA | Dedicated intermediate for Kubernetes certificates |
| Key Algorithm | RSA 2048+ or ECDSA P-256 |
| Certificate Lifetime | 90 days (auto-renewed by cert-manager) |
| CRL / OCSP | Accessible from cluster nodes |
Distribute the root CA certificate to all Kubernetes nodes during provisioning (via Kubespray extra_certs variable or cloud-init).
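A cert-manager ClusterIssuer backed by the internal intermediate CA might look like the following sketch; the resource and Secret names are placeholders, and the Secret must hold the intermediate's certificate and key:

```yaml
# ClusterIssuer that signs with the dedicated Kubernetes intermediate CA.
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: internal-ca                    # placeholder name
spec:
  ca:
    secretName: internal-ca-keypair    # tls.crt + tls.key of the intermediate
```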
HTTP Proxy / Package Mirror
For clusters with restricted internet access (not fully air-gapped):
| Service | Purpose | Software |
|---|---|---|
| HTTP Proxy | Route outbound traffic through a controlled egress point | Squid, Zscaler |
| APT/YUM Mirror | Cache OS packages locally | Aptly, Pulp, Nexus |
| Container Registry Mirror | Cache container images | Harbor (deployed by openCenter), Nexus |
Proxy Configuration
If using an HTTP proxy, configure these environment variables on all Kubernetes nodes:
| Variable | Value |
|---|---|
| http_proxy | http://proxy.example.com:3128 |
| https_proxy | http://proxy.example.com:3128 |
| no_proxy | 10.0.0.0/8,172.16.0.0/12,192.168.0.0/16,.example.com,.cluster.local,.svc |
The no_proxy list must include all internal CIDRs, the cluster domain, and the Kubernetes service domain. Missing entries cause internal traffic to route through the proxy and fail.
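Note that containerd does not read shell profiles, so the proxy variables also need a systemd drop-in; a sketch, using the conventional drop-in path:

```
# /etc/systemd/system/containerd.service.d/http-proxy.conf
[Service]
Environment="HTTP_PROXY=http://proxy.example.com:3128"
Environment="HTTPS_PROXY=http://proxy.example.com:3128"
Environment="NO_PROXY=10.0.0.0/8,172.16.0.0/12,192.168.0.0/16,.example.com,.cluster.local,.svc"
```

After adding the drop-in, run systemctl daemon-reload and restart containerd.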
For fully air-gapped deployments, use openCenter-AirGap instead of a proxy. See the air-gap documentation.
Considerations
- Service availability: A DNS or NTP outage affects the entire cluster. Deploy at least two instances of each, on separate hosts and in separate failure domains.
- Monitoring: Monitor DNS query latency, NTP offset, and DHCP lease utilization. Feed metrics into Prometheus via exporters (bind_exporter, chrony_exporter, dhcp_exporter).
- Change management: DNS record changes and IPAM updates affect cluster operations. Use a change management process (ticket + approval) for production DNS zones.
- Firewall rules: Ensure all Kubernetes nodes can reach DNS (TCP/UDP 53), NTP (UDP 123), LDAPS (TCP 636), and the HTTP proxy (TCP 3128) from the VM network VLAN.
- Documentation: Maintain a network services runbook listing server IPs, credentials (in a vault), and escalation contacts for each infrastructure service.