Kubernetes Deployment

Purpose: Explains to platform engineers how Kubespray deploys Kubernetes with an HA control plane, Calico CNI, and security hardening.

Concept Summary

Kubespray is an Ansible-based installer that deploys production-grade Kubernetes clusters. openCenter uses Kubespray with a hardened configuration generated by opencenter cluster setup. The result is a cluster with an HA control plane, Calico networking, the containerd runtime, and security controls enabled by default.

How Kubespray Fits the Workflow

After infrastructure provisioning (Terraform creates VMs), Kubespray runs against the generated inventory to install Kubernetes:

Terraform (VMs) → Kubespray (K8s install) → FluxCD (GitOps bootstrap)

Kubespray connects to nodes via SSH from the bastion (or build host) and executes Ansible playbooks that install containerd, kubeadm, kubelet, and the control plane components.

Cluster Architecture

A typical openCenter cluster:

| Role | Count | Components |
| --- | --- | --- |
| Control plane | 3 | kube-apiserver, kube-controller-manager, kube-scheduler, etcd |
| Worker | 3+ | kubelet, kube-proxy, containerd |
| Bastion | 1 | Ansible runner, local registry (air-gap only) |

The three control plane nodes provide high availability. etcd runs on the control plane nodes in a stacked topology (co-located with the API server).
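The availability claim follows from etcd's majority quorum rule: a cluster of n members needs floor(n/2) + 1 of them to agree on a write. A minimal sketch of that arithmetic, where the helper names etcd_quorum and etcd_fault_tolerance are hypothetical and not part of Kubespray or openCenter:

```shell
#!/bin/sh
# Majority-quorum arithmetic for a stacked etcd cluster.
# Helper names are illustrative only.

# Members required for a write to commit: floor(n/2) + 1
etcd_quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Members that can fail while the cluster still has quorum
etcd_fault_tolerance() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 1 3 5; do
  echo "members=$n quorum=$(etcd_quorum "$n") tolerated_failures=$(etcd_fault_tolerance "$n")"
done
```

With three members the quorum is two, so one control plane node can be lost. A fourth member would raise the quorum to three without tolerating any more failures, which is why odd member counts are the usual choice.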

Security Hardening

opencenter cluster setup generates group_vars/k8s_hardening.yml with these controls:

Pod Security Admission (PSA)

kube_apiserver_enable_admission_plugins:
- PodSecurity
- EventRateLimit
- AlwaysPullImages

kube_pod_security_default_enforce: baseline
kube_pod_security_default_audit: restricted
kube_pod_security_default_warn: restricted

  • baseline enforcement blocks known-dangerous pod configurations (privileged containers, host namespaces).
  • The restricted audit and warn modes flag violations without blocking them, giving teams time to remediate.
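Once a team has cleared its restricted warnings, it can opt a single namespace into stricter enforcement. In upstream Kubernetes, Pod Security Admission labels on a namespace take precedence over the cluster-wide defaults. The sketch below prints such a manifest. The function name ns_manifest and the namespace team-a are hypothetical, and whether openCenter manages namespace labels this way is an assumption:

```shell
#!/bin/sh
# Print a Namespace manifest that enforces the "restricted" profile.
# ns_manifest and the namespace name are illustrative placeholders.
ns_manifest() {
  cat <<EOF
apiVersion: v1
kind: Namespace
metadata:
  name: $1
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/enforce-version: latest
EOF
}

# Review the output, then pipe it to "kubectl apply -f -" if appropriate.
ns_manifest team-a
```

In a GitOps workflow the rendered manifest would more likely be committed to the repository than applied by hand.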

Encryption at Rest

kube_encrypt_secret_data: true

Secrets are encrypted with AES-CBC before they are written to etcd. The API server performs the encryption with a key taken from its encryption provider configuration.

Audit Logging

kubernetes_audit: true
audit_log_maxage: 30
audit_log_maxbackup: 10
audit_log_maxsize: 100

API server audit logs capture who did what and when. The log file rotates at 100 MB, up to 10 rotated files are kept, and files older than 30 days are deleted.
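These settings also bound how much disk the audit log can use. A rough sizing sketch, assuming the API server keeps one active file in addition to the rotated backups and that each control plane node writes its own log (the helper name audit_disk_budget_mb is hypothetical):

```shell
#!/bin/sh
# Values copied from the audit settings above.
audit_log_maxsize_mb=100   # the active file rotates at this size
audit_log_maxbackup=10     # rotated files kept next to the active file

# Worst case: the active file and every retained backup are full.
audit_disk_budget_mb() {
  echo $(( $1 * ($2 + 1) ))
}

budget=$(audit_disk_budget_mb "$audit_log_maxsize_mb" "$audit_log_maxbackup")
echo "Worst-case audit log footprint per node: ${budget} MB"
```

On a busy cluster the backup count may be reached well before 30 days pass, so the effective retention window can be shorter than audit_log_maxage suggests.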

Additional Admission Controllers

  • EventRateLimit — prevents event flooding from misbehaving controllers.
  • AlwaysPullImages — forces image pulls on every pod start, preventing use of cached images that may have been tampered with.

Container Runtime

| Setting | Value |
| --- | --- |
| Runtime | containerd |
| Version | 2.1.5 (from versions.env) |
| OCI runtime | runc 1.3.4 |

Kubespray configures containerd with registry mirrors pointing to the bastion in air-gap deployments.

Networking (Calico CNI)

| Setting | Value |
| --- | --- |
| CNI plugin | Calico v3.31.3 |
| Encapsulation | VXLAN (default) or BGP |
| Pod CIDR | 10.233.64.0/18 (default) |
| Service CIDR | 10.233.0.0/18 (default) |

Calico provides network policy enforcement at the pod level. NetworkPolicies defined in openCenter-gitops-base are enforced by Calico's dataplane.
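When the default ranges are overridden, the pod and service CIDRs must not overlap. The sketch below computes the size of a range and checks two ranges for overlap using plain shell arithmetic. The function names are hypothetical helpers, not part of Kubespray or openCenter:

```shell
#!/bin/sh
# IPv4 CIDR helpers (illustrative names, POSIX sh arithmetic).

# Convert a dotted-quad address to an integer.
ip_to_int() {
  old_ifs=$IFS; IFS=.
  set -- $1
  IFS=$old_ifs
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Number of addresses in a CIDR, e.g. /18 -> 2^(32-18).
cidr_size() {
  echo $(( 1 << (32 - ${1#*/}) ))
}

# Print "first last" addresses of a CIDR as integers.
cidr_range() {
  base=$(ip_to_int "${1%/*}")
  size=$(cidr_size "$1")
  start=$(( base & ~(size - 1) ))
  echo "$start $(( start + size - 1 ))"
}

# Succeed (exit 0) when the two CIDRs share at least one address.
cidrs_overlap() {
  set -- $(cidr_range "$1") $(cidr_range "$2")
  [ "$1" -le "$4" ] && [ "$3" -le "$2" ]
}

pod_cidr=10.233.64.0/18
svc_cidr=10.233.0.0/18
echo "pod addresses: $(cidr_size "$pod_cidr"), service addresses: $(cidr_size "$svc_cidr")"
if cidrs_overlap "$pod_cidr" "$svc_cidr"; then
  echo "CIDRs overlap"
else
  echo "CIDRs do not overlap"
fi
```

For the defaults, each /18 holds 16384 addresses and the two ranges sit side by side without overlapping.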

Running Kubespray

Kubespray runs automatically as part of the Terraform provisioner in most setups. For manual execution:

cd infrastructure/clusters/<cluster-name>/

# Activate the Python venv (air-gap: from bundled wheels)
source /opt/opencenter/venv/bin/activate

# Run the cluster playbook
ansible-playbook -i inventory/inventory.yaml \
  /opt/opencenter/kubespray/cluster.yml \
  --become

Deployment takes 15–30 minutes depending on node count and network speed.

Verification

export KUBECONFIG=infrastructure/clusters/<cluster-name>/kubeconfig.yaml

# Nodes ready
kubectl get nodes -o wide

# System pods running
kubectl get pods -n kube-system

# Calico running
kubectl get pods -n calico-system

# PSA labels applied
kubectl get ns default --show-labels | grep pod-security

Trade-offs

  • Kubespray is Ansible-based, so it requires SSH access to all nodes. In air-gap environments, the bastion serves as the Ansible control node.
  • The stacked etcd topology (etcd on control plane nodes) simplifies operations but means losing a control plane node also loses an etcd member. Three control plane nodes tolerate one failure.
  • Security hardening defaults are opinionated. Teams can override settings in group_vars/ for specific compliance requirements, but weakening controls should be documented and approved.

Further Reading