Purpose: For platform engineers, explains the design decisions behind the opencenter-windows collection, how Windows nodes fit into the openCenter ecosystem, and the trade-offs involved.

How Windows workers fit into openCenter

openCenter clusters are Linux-first. The control plane, FluxCD, and platform services (Kyverno, cert-manager, Prometheus, etc.) all run on Linux nodes. Windows worker nodes are added when customers need to run Windows-native workloads: .NET Framework applications, IIS, Windows containers, or legacy services that can’t run in Linux containers.

Windows nodes join as workers only, because Kubernetes does not support Windows control plane nodes. The win-kubeadm role reflects this asymmetry: it delegates to a Linux control plane node to generate join tokens and create the kube-dns Service.

Why ContainerD (not Docker)

Kubernetes deprecated Docker as a container runtime in v1.20 and removed dockershim in v1.24. ContainerD is the standard CRI-compliant runtime for both Linux and Windows. The collection installs ContainerD directly from GitHub releases rather than using Docker Desktop or any wrapper.

The win-containerd role generates a default config.toml and patches only the CNI paths. This keeps the configuration close to upstream defaults while allowing the CNI plugin (Calico, Flannel) to find its binaries and config in known locations.
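The "patch only the CNI paths" step can be sketched as follows. This is a minimal illustration in Python, not the role's actual implementation, which this page does not show. The function name patch_cni_paths, the sample excerpt, and the target paths c:/opt/cni/bin and c:/etc/cni/net.d are assumptions chosen for the example; bin_dir and conf_dir are the keys in the CNI section of ContainerD's CRI plugin configuration.

```python
import re

# Hypothetical excerpt of a generated config.toml; real output is much longer.
SAMPLE_CONFIG = r'''[plugins."io.containerd.grpc.v1.cri".cni]
  bin_dir = "C:\\Program Files\\containerd\\cni\\bin"
  conf_dir = "C:\\Program Files\\containerd\\cni\\conf"
  max_conf_num = 1
'''


def _set_string_key(text: str, key: str, value: str) -> str:
    """Replace the quoted value of `key = "..."`, keeping the line's indentation."""
    pattern = re.compile(
        rf'^([ \t]*){re.escape(key)}[ \t]*=[ \t]*".*"[ \t]*$', re.MULTILINE
    )
    # A function replacement avoids backslash-escape handling in the new value.
    return pattern.sub(lambda m: f'{m.group(1)}{key} = "{value}"', text)


def patch_cni_paths(config_text: str, bin_dir: str, conf_dir: str) -> str:
    """Rewrite only bin_dir and conf_dir; every other line stays as generated."""
    patched = _set_string_key(config_text, "bin_dir", bin_dir)
    return _set_string_key(patched, "conf_dir", conf_dir)


if __name__ == "__main__":
    # Assumed target locations, written with forward slashes to avoid TOML escaping.
    print(patch_cni_paths(SAMPLE_CONFIG, "c:/opt/cni/bin", "c:/etc/cni/net.d"))
```

Because only two keys change, a diff against the generated default stays small, which makes upstream default changes easy to spot after a ContainerD upgrade.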

Why NSSM for kubelet

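On Linux, kubelet runs as a systemd unit. Windows has no systemd; its Service Control Manager expects a program that implements the Windows service protocol, so an ordinary console process needs a wrapper. The collection uses NSSM (Non-Sucking Service Manager) to wrap kubelet as a Windows service because: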

  • NSSM handles stdout/stderr log capture and rotation natively

  • It supports service dependencies (kubelet depends on containerd)

  • It restarts the process on failure without custom recovery scripts

  • It’s a single static binary with no installer or dependencies
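To make the points above concrete, the sketch below builds the NSSM command lines that correspond to each of them. It is illustrative only: the helper name nssm_commands and all paths are hypothetical, and the collection's real tasks may set different parameters. DependOnService, AppStdout, AppStderr, AppRotateFiles, and AppExit are NSSM service parameters.

```python
def nssm_commands(service, program, args, log_dir, depends_on):
    """Return the NSSM invocations that register and configure one service."""
    return [
        # Register the wrapped program as a Windows service.
        ["nssm", "install", service, program, *args],
        # Start only after the dependency (here: the container runtime) is up.
        ["nssm", "set", service, "DependOnService", depends_on],
        # Capture stdout/stderr to files and rotate them.
        ["nssm", "set", service, "AppStdout", f"{log_dir}\\{service}.out.log"],
        ["nssm", "set", service, "AppStderr", f"{log_dir}\\{service}.err.log"],
        ["nssm", "set", service, "AppRotateFiles", "1"],
        # Restart the process whenever it exits.
        ["nssm", "set", service, "AppExit", "Default", "Restart"],
    ]


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    for cmd in nssm_commands(
        "kubelet", r"C:\k\kubelet.exe", [], r"C:\var\log\kubelet", "containerd"
    ):
        print(" ".join(cmd))
```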

The kubelet isn’t started directly. Instead, NSSM runs StartKubelet.ps1, a PowerShell script that reads kubeadm-flags.env (written by kubeadm join) and constructs the full kubelet command line. This indirection exists because kubeadm writes runtime flags that aren’t known until join time.
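The indirection can be sketched in Python as follows. StartKubelet.ps1 itself is PowerShell and is not reproduced here; the function names, the static flag, and the sample file contents are assumptions for illustration. The sketch relies only on the fact that kubeadm writes its flags as a single KUBELET_KUBEADM_ARGS="..." line.

```python
def read_kubeadm_args(env_text: str) -> list:
    """Extract the flags kubeadm wrote into kubeadm-flags.env at join time."""
    for line in env_text.splitlines():
        name, sep, value = line.strip().partition("=")
        if sep and name == "KUBELET_KUBEADM_ARGS":
            # Drop the surrounding quotes, then split on whitespace.
            # Simplification: flag values that contain spaces are not handled.
            return value.strip().strip('"').split()
    return []


def build_kubelet_command(kubelet_exe: str, static_args: list, env_text: str) -> list:
    """Combine flags known in advance with the flags only known after join."""
    return [kubelet_exe, *static_args, *read_kubeadm_args(env_text)]


if __name__ == "__main__":
    # Hypothetical file contents and paths, for illustration only.
    sample_env = (
        'KUBELET_KUBEADM_ARGS="--container-runtime-endpoint='
        'npipe:////./pipe/containerd-containerd '
        '--pod-infra-container-image=registry.k8s.io/pause:3.9"\n'
    )
    print(build_kubelet_command(r"C:\k\kubelet.exe", ["--v=2"], sample_env))
```

Reading the file at service start, rather than baking the flags into the service definition, means a later kubeadm join or upgrade that rewrites the file takes effect on the next restart.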

Why BGP routing features are installed

The win-kubeadm role installs the RemoteAccess, RSAT-RemoteAccess-PowerShell, and Routing Windows features, then runs Install-RemoteAccess -VpnType RoutingOnly. This enables the Windows BGP router, which Calico uses for pod-to-pod networking across nodes.

Without BGP routing, Calico can’t advertise pod CIDR routes from Windows nodes. If you’re using a different CNI that doesn’t rely on BGP (e.g., overlay-mode Flannel), these features are still installed but remain idle — they don’t interfere with other networking modes.

The Hyper-V question

The collection installs Hyper-V features by default to prepare the node for Windows containers, which on Windows Server run with process isolation unless configured otherwise. When Windows Server itself runs as a VM on a hypervisor that doesn’t expose nested virtualization (common on OpenStack and some VMware configurations), the standard Hyper-V Windows feature fails to install.

The skip_hypervisor_support_check flag switches to a DISM-based approach that enables a minimal Microsoft-Hyper-V feature set sufficient for process-isolated containers without requiring full nested virtualization. It also disables Microsoft-Hyper-V-Online, which isn’t needed for container workloads and can cause issues in nested environments.

This is a pragmatic workaround. Hyper-V isolated containers (running each container in a lightweight VM) won’t work without full nested virtualization, but process-isolated containers — the common case for Kubernetes workloads — work fine.
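The branch controlled by skip_hypervisor_support_check can be summarised with the sketch below. It models only the decision, not the role's tasks: the function name and the exact command strings are assumptions. In particular, the Install-WindowsFeature call for the default path and the DISM switches are plausible forms rather than text taken from the collection.

```python
def hyperv_enable_commands(skip_hypervisor_support_check: bool) -> list:
    """Return the commands a node would run to enable Hyper-V components."""
    if skip_hypervisor_support_check:
        return [
            # Enable the Microsoft-Hyper-V feature set through DISM, without rebooting yet.
            "dism /online /enable-feature /featurename:Microsoft-Hyper-V /all /NoRestart",
            # Turn off the component that is not needed for container workloads.
            "dism /online /disable-feature /featurename:Microsoft-Hyper-V-Online /NoRestart",
        ]
    # Default path: the standard Windows feature (assumed form of the command).
    return ["Install-WindowsFeature -Name Hyper-V -IncludeManagementTools"]


if __name__ == "__main__":
    for flag in (False, True):
        print(f"skip_hypervisor_support_check={flag}")
        for cmd in hyperv_enable_commands(flag):
            print("  " + cmd)
```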

Reboot handling

Windows feature installation frequently requires reboots. The collection handles this in two ways:

  1. Pre-flight check: before any work begins, the win-containerd role checks registry keys for pending reboots and handles them first. This prevents failures from stale state.

  2. Post-install reboots: after feature installation, handlers set a reboot_required flag. The role flushes handlers and reboots if the flag is set, with a 600-second timeout waiting for SSH to come back.

A full run from scratch therefore typically involves 1–2 reboots: one for the features installed by win-containerd and one for the RemoteAccess features installed by win-kubeadm. Subsequent runs skip feature installation and don’t reboot.
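The two-stage reboot logic can be modelled as below. This is a sketch of the control flow only, with the host interactions passed in as functions so it runs anywhere. The registry key list is an assumption (these are keys commonly checked for pending reboots; the page does not name the ones the role uses), and the function names are hypothetical. The 600-second timeout is the value stated above.

```python
# Assumed keys; the role's actual list is not given in this document.
PENDING_REBOOT_KEYS = [
    r"HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion\Component Based Servicing\RebootPending",
    r"HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion\WindowsUpdate\Auto Update\RebootRequired",
]

REBOOT_TIMEOUT_SECONDS = 600


def run_with_reboots(key_exists, install_features, reboot) -> int:
    """Run the pre-flight check and feature install; return the number of reboots."""
    reboots = 0

    # 1. Pre-flight: clear any reboot left pending by earlier activity.
    if any(key_exists(key) for key in PENDING_REBOOT_KEYS):
        reboot(timeout_seconds=REBOOT_TIMEOUT_SECONDS)
        reboots += 1

    # 2. Post-install: install_features returns True when a handler set the flag.
    reboot_required = install_features()
    if reboot_required:
        reboot(timeout_seconds=REBOOT_TIMEOUT_SECONDS)
        reboots += 1

    return reboots


if __name__ == "__main__":
    # Simulated fresh host: nothing pending, but new features need a reboot.
    count = run_with_reboots(
        key_exists=lambda key: False,
        install_features=lambda: True,
        reboot=lambda timeout_seconds: print(f"rebooting (timeout {timeout_seconds}s)"),
    )
    print("reboots:", count)
```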

Separation from openCenter-cli

The openCenter-cli generates infrastructure, Kubespray inventory, and FluxCD manifests for Linux clusters. Windows node management is deliberately separate because:

  • Not all clusters need Windows nodes

  • Windows node lifecycle (patching, reboots, feature updates) differs from Linux

  • The Ansible execution model (push-based, imperative) suits Windows better than the GitOps model used for Linux platform services

  • Windows nodes are added after the cluster is operational, not during initial provisioning

The connection point is the oc_controlplane_nodes inventory group — the same control plane nodes provisioned by openCenter-cli and Kubespray.

What the collection doesn’t do

  • CNI plugin installation (Calico DaemonSet, Flannel, etc.) — this is handled by FluxCD from openCenter-gitops-base or manually

  • kube-proxy configuration — must be applied separately for Windows

  • Windows OS patching or updates

  • Certificate rotation — handled by kubeadm and kubelet auto-renewal

  • Node draining or graceful removal — use kubectl drain and kubectl delete node
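For the last item, the manual removal sequence can be sketched as follows. The helper name and node name are hypothetical; --ignore-daemonsets and --delete-emptydir-data are standard kubectl drain flags, but whether you need them depends on the workloads on the node.

```python
def node_removal_commands(node_name: str) -> list:
    """Return the kubectl commands to drain a node and then remove it."""
    return [
        # Evict pods; DaemonSet pods are skipped and emptyDir data is discarded.
        ["kubectl", "drain", node_name, "--ignore-daemonsets", "--delete-emptydir-data"],
        # Remove the Node object once the drain has completed.
        ["kubectl", "delete", "node", node_name],
    ]


if __name__ == "__main__":
    # Hypothetical node name, for illustration only.
    for cmd in node_removal_commands("win-worker-01"):
        print(" ".join(cmd))
```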