Edit

Purpose: For VMware users, shows how to deploy a Kubernetes cluster on pre-provisioned VMware VMs, covering prerequisites through validation.

By the end of this tutorial, you’ll have a production Kubernetes cluster running on VMware vSphere with platform services deployed via GitOps.

Time: 60-75 minutes (including VM provisioning)

What You’ll Build

A production Kubernetes cluster with:

  • 3 control plane nodes (high availability)

  • 3 worker nodes (production capacity)

  • Calico CNI networking

  • vSphere CSI storage driver

  • MetalLB load balancer

  • 20+ platform services (cert-manager, Keycloak, monitoring, etc.)

  • FluxCD GitOps continuous delivery

Prerequisites

Before starting, ensure you have:

VMware Infrastructure:

  • VMware vSphere 7.0+ (vCenter Server)

  • vSphere credentials (username, password)

  • Datacenter, cluster, and datastore configured

  • VM network with DHCP or static IPs

  • 6 VMs provisioned (3 masters, 3 workers) with Ubuntu 24.04

  • SSH access to all VMs

VM Requirements (per node):

  • Control Plane: 4 vCPU, 8 GB RAM, 40 GB disk

  • Worker: 4 vCPU, 16 GB RAM, 80 GB disk

  • OS: Ubuntu 24.04 LTS

  • Network: Static IP or DHCP reservation

  • SSH: SSH key authentication enabled

Local Tools:

  • openCenter CLI installed

  • Git installed

  • SSH client

  • Text editor

Verify vSphere Access:

# Test SSH access to VMs
ssh ubuntu@<master-1-ip>
ssh ubuntu@<master-2-ip>
ssh ubuntu@<master-3-ip>
ssh ubuntu@<worker-1-ip>
ssh ubuntu@<worker-2-ip>
ssh ubuntu@<worker-3-ip>

If SSH works for all VMs, you’re ready to proceed.

Step 1: Provision VMs in vSphere

Provision 6 VMs using vSphere UI or Terraform:

Option A: vSphere UI

  1. Open vSphere Client

  2. Right-click datacenter → New Virtual Machine

  3. Create 6 VMs with specifications above

  4. Install Ubuntu 24.04 on each VM

  5. Configure network (static IPs recommended)

  6. Enable SSH key authentication

Option B: Terraform (Recommended)

# vmware-vms.tf
terraform {
  required_providers {
    vsphere = {
      source  = "hashicorp/vsphere"
      version = "~> 2.0"
    }
  }
}

provider "vsphere" {
  user           = var.vsphere_user
  password       = var.vsphere_password
  vsphere_server = var.vsphere_server
  allow_unverified_ssl = true
}

data "vsphere_datacenter" "dc" {
  name = "DC1"
}

data "vsphere_datastore" "datastore" {
  name          = "datastore1"
  datacenter_id = data.vsphere_datacenter.dc.id
}

data "vsphere_network" "network" {
  name          = "VM Network"
  datacenter_id = data.vsphere_datacenter.dc.id
}

data "vsphere_compute_cluster" "cluster" {
  name          = "Cluster1"
  datacenter_id = data.vsphere_datacenter.dc.id
}

data "vsphere_virtual_machine" "template" {
  name          = "ubuntu-24.04-template"
  datacenter_id = data.vsphere_datacenter.dc.id
}

# Control plane nodes
resource "vsphere_virtual_machine" "master" {
  count            = 3
  name             = "prod-cluster-master-${count.index + 1}"
  resource_pool_id = data.vsphere_compute_cluster.cluster.resource_pool_id
  datastore_id     = data.vsphere_datastore.datastore.id

  num_cpus = 4
  memory   = 8192

  network_interface {
    network_id = data.vsphere_network.network.id
  }

  disk {
    label = "disk0"
    size  = 40
  }

  clone {
    template_uuid = data.vsphere_virtual_machine.template.id
  }
}

# Worker nodes
resource "vsphere_virtual_machine" "worker" {
  count            = 3
  name             = "prod-cluster-worker-${count.index + 1}"
  resource_pool_id = data.vsphere_compute_cluster.cluster.resource_pool_id
  datastore_id     = data.vsphere_datastore.datastore.id

  num_cpus = 4
  memory   = 16384

  network_interface {
    network_id = data.vsphere_network.network.id
  }

  disk {
    label = "disk0"
    size  = 80
  }

  clone {
    template_uuid = data.vsphere_virtual_machine.template.id
  }
}
# Deploy VMs
terraform init
terraform apply

# Get VM IPs
terraform output

Record VM IPs:

Master 1: 192.168.1.10
Master 2: 192.168.1.11
Master 3: 192.168.1.12
Worker 1: 192.168.1.20
Worker 2: 192.168.1.21
Worker 3: 192.168.1.22

Step 2: Initialize Cluster Configuration

Create a new cluster configuration with VMware defaults:

opencenter cluster init prod-cluster \
  --org my-company \
  --type vmware

What happens:

  • Creates configuration file at ~/.config/opencenter/clusters/my-company/.prod-cluster-config.yaml

  • Applies VMware defaults (pre-provisioned VMs, vSphere CSI)

  • Generates SSH keys for cluster access

  • Generates SOPS Age keys for secrets encryption

Output:

✓ Created cluster configuration: prod-cluster
✓ Generated SSH keys: ~/.config/opencenter/clusters/my-company/secrets/ssh/prod-cluster-key
✓ Generated SOPS Age keys: ~/.config/opencenter/clusters/my-company/secrets/age/prod-cluster-key.txt

Configuration file: ~/.config/opencenter/clusters/my-company/.prod-cluster-config.yaml

Next steps:
1. Edit configuration file to add VM IPs and vSphere credentials
2. Validate configuration: opencenter cluster validate prod-cluster
3. Generate GitOps repository: opencenter cluster generate prod-cluster

Step 3: Configure VM Inventory

Edit the configuration file to add VM IPs:

opencenter cluster edit prod-cluster

Update the VMware section:

opencenter:
  infrastructure:
    provider: vmware
    vmware:
      # vSphere credentials (for CSI driver)
      vcenter_server: "vcenter.example.com"
      username: "administrator@vsphere.local"
      password: "your-password"  # Will be encrypted with SOPS
      datacenter: "DC1"
      datastore: "datastore1"

      # VM inventory (pre-provisioned VMs)
      masters:
        - hostname: prod-cluster-master-1
          ip: 192.168.1.10
          ssh_user: ubuntu
        - hostname: prod-cluster-master-2
          ip: 192.168.1.11
          ssh_user: ubuntu
        - hostname: prod-cluster-master-3
          ip: 192.168.1.12
          ssh_user: ubuntu

      workers:
        - hostname: prod-cluster-worker-1
          ip: 192.168.1.20
          ssh_user: ubuntu
        - hostname: prod-cluster-worker-2
          ip: 192.168.1.21
          ssh_user: ubuntu
        - hostname: prod-cluster-worker-3
          ip: 192.168.1.22
          ssh_user: ubuntu

Step 4: Configure Cluster Settings

Configure Kubernetes and networking:

opencenter:
  meta:
    name: prod-cluster
    environment: production
    region: on-premises
    organization: my-company

  cluster:
    # Kubernetes version
    kubernetes:
      version: "1.33.5"

    # Node counts (must match VM inventory)
    master_count: 3
    worker_count: 3

    # Networking
    networking:
      pod_subnet: "10.42.0.0/16"
      service_subnet: "10.43.0.0/16"
      cni_plugin: calico

      # Load balancer (MetalLB for on-premises)
      metallb:
        enabled: true
        ip_range: "192.168.1.100-192.168.1.150"  # Available IPs in your network

    # Storage (vSphere CSI)
    storage:
      default_storage_class: "vsphere-csi-sc"

MetalLB IP range:

  • Choose IPs from your VM network that are not used by DHCP

  • Ensure IPs are routable from your network

  • Reserve at least 10-20 IPs for load balancer services

Step 5: Configure Platform Services

Configure platform services:

opencenter:
  services:
    # Core services
    cert-manager:
      enabled: true

    keycloak:
      enabled: true
      hostname: "auth.my-company.prod-cluster.example.com"
      admin_password: "change-me-in-production"  # Will be encrypted

    kube-prometheus-stack:
      enabled: true
      grafana_admin_password: "change-me-in-production"  # Will be encrypted

    loki:
      enabled: true
      retention_days: 30

    velero:
      enabled: true
      s3_bucket: "prod-cluster-backups"
      s3_endpoint: "s3.example.com"  # Your S3-compatible storage

    # vSphere CSI driver
    vsphere-csi:
      enabled: true
      vcenter_server: "vcenter.example.com"
      datacenter: "DC1"
      datastore: "datastore1"

    # Optional services
    harbor:
      enabled: true
      hostname: "harbor.my-company.prod-cluster.example.com"
      admin_password: "change-me-in-production"  # Will be encrypted

    headlamp:
      enabled: true
      hostname: "dashboard.my-company.prod-cluster.example.com"

Step 6: Validate Configuration

Validate your configuration:

opencenter cluster validate prod-cluster

What’s validated:

  1. Schema compliance (structure, types, formats)

  2. Business rules (cross-field dependencies)

  3. VMware constraints (VM inventory, vSphere credentials)

  4. SSH connectivity (can reach all VMs)

Expected output:

✓ Schema validation passed
✓ Business rules validation passed
✓ VMware validation passed
  - All VMs are reachable via SSH
  - vSphere credentials are valid
  - Datacenter and datastore exist
  - Node count matches VM inventory (6 VMs)

Configuration is valid and ready for deployment.

If validation fails:

  • Verify SSH access to all VMs

  • Verify vSphere credentials

  • Verify VM IPs are correct

  • See #troubleshooting[Troubleshooting] section below

Step 7: Generate GitOps Repository

Generate the complete GitOps repository structure:

opencenter cluster generate prod-cluster

What’s generated:

~/prod-cluster-gitops/
├── .gitignore
├── .sops.yaml
├── README.md
│
├── applications/
│   └── overlays/prod-cluster/
│       ├── flux-system/           # FluxCD bootstrap
│       ├── services/              # Platform services
│       └── managed-services/      # Customer applications
│
└── infrastructure/
    └── clusters/prod-cluster/
        ├── inventory/             # Kubespray Ansible inventory
        │   ├── inventory.yaml     # VM inventory
        │   ├── group_vars/        # Ansible variables
        │   └── credentials/       # Encrypted credentials
        └── kubeconfig.yaml        # Generated after deployment

Output:

✓ Generated GitOps repository: ~/prod-cluster-gitops
✓ Encrypted secrets with SOPS
✓ Created Kubespray inventory
✓ Created FluxCD manifests

Next steps:
1. Review generated files
2. Initialize Git repository: cd ~/prod-cluster-gitops && git init
3. Commit files: git add . && git commit -m "Initial cluster configuration"
4. Push to Git: git remote add origin <your-repo-url> && git push -u origin main
5. Bootstrap cluster: opencenter cluster deploy prod-cluster

Step 8: Initialize Git Repository

Initialize and push to Git:

# Navigate to GitOps repository
cd ~/prod-cluster-gitops

# Initialize Git
git init

# Add all files
git add .

# Commit
git commit -m "Initial prod-cluster configuration"

# Add remote
git remote add origin git@github.com:my-company/prod-cluster-gitops.git

# Push to remote
git push -u origin main

Step 9: Bootstrap Cluster

Deploy Kubernetes to the VMs (this takes 20-30 minutes):

opencenter cluster deploy prod-cluster

What happens:

Phase 1: VM Preparation (5-10 minutes)
  ✓ Verifying SSH access to all VMs
  ✓ Installing dependencies (Python, Docker, etc.)
  ✓ Configuring OS settings (swap, kernel modules)
  ✓ Configuring firewall rules

Phase 2: Kubernetes Deployment (10-15 minutes)
  ✓ Configuring control plane nodes
  ✓ Deploying etcd cluster
  ✓ Deploying Kubernetes API server
  ✓ Deploying Kubernetes controllers
  ✓ Joining worker nodes
  ✓ Installing Calico CNI
  ✓ Installing vSphere CSI driver
  ✓ Installing MetalLB

Phase 3: GitOps Bootstrap (5-10 minutes)
  ✓ Installing FluxCD controllers
  ✓ Creating GitRepository sources
  ✓ Deploying platform services
  ✓ Waiting for services to be ready

Cluster is ready!

Monitor progress:

# In another terminal, watch Ansible progress
tail -f ~/prod-cluster-gitops/infrastructure/clusters/prod-cluster/ansible.log

# After Kubernetes is deployed, watch pods
export KUBECONFIG=~/prod-cluster-gitops/infrastructure/clusters/prod-cluster/kubeconfig.yaml
watch -n 5 'kubectl get pods -A'

Step 10: Verify Cluster

Verify the cluster is working:

# Set kubeconfig
export KUBECONFIG=~/prod-cluster-gitops/infrastructure/clusters/prod-cluster/kubeconfig.yaml

# Check nodes
kubectl get nodes

# Expected output:
# NAME                        STATUS   ROLES           AGE   VERSION
# prod-cluster-master-1       Ready    control-plane   20m   v1.33.5
# prod-cluster-master-2       Ready    control-plane   20m   v1.33.5
# prod-cluster-master-3       Ready    control-plane   20m   v1.33.5
# prod-cluster-worker-1       Ready    <none>          18m   v1.33.5
# prod-cluster-worker-2       Ready    <none>          18m   v1.33.5
# prod-cluster-worker-3       Ready    <none>          18m   v1.33.5

# Check vSphere CSI driver
kubectl get csidrivers

# Expected output: csi.vsphere.vmware.com

# Check storage classes
kubectl get storageclasses

# Expected output: vsphere-csi-sc (default)

# Check platform services
kubectl get helmreleases -A

# Expected output: 20+ HelmReleases in Ready state

# Check FluxCD reconciliation
flux get kustomizations

# Expected output: All Kustomizations in Ready state

All checks passed? Your cluster is ready for production workloads!

Step 11: Test vSphere Storage

Test vSphere CSI storage:

# Create PVC
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-pvc
  namespace: default
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: vsphere-csi-sc
EOF

# Wait for PVC to be bound
kubectl wait --for=jsonpath='{.status.phase}'=Bound pvc/test-pvc --timeout=60s

# Check PVC
kubectl get pvc test-pvc

# Expected output: STATUS Bound

# Check PV in vSphere
# Open vSphere Client → Datastore → See new VMDK file

Check Your Work

Verify everything is working:

  • All 6 nodes are Ready

  • vSphere CSI driver is installed

  • Storage class is available

  • Test PVC is bound

  • All platform services are deployed (HelmReleases Ready)

  • FluxCD is reconciling (Kustomizations Ready)

  • MetalLB is assigning IPs to LoadBalancer services

Troubleshooting

SSH Access Fails

Error:

Error: Cannot SSH to VM 192.168.1.10

Solution:

# Test SSH manually
ssh ubuntu@192.168.1.10

# If password prompt, add SSH key
ssh-copy-id -i ~/.config/opencenter/clusters/my-company/secrets/ssh/prod-cluster-key.pub ubuntu@192.168.1.10

# Verify SSH key authentication
ssh -i ~/.config/opencenter/clusters/my-company/secrets/ssh/prod-cluster-key ubuntu@192.168.1.10

vSphere Credentials Invalid

Error:

Error: vSphere authentication failed

Solution:

# Test vSphere credentials manually
govc about -u 'administrator@vsphere.local:password@vcenter.example.com'

# If fails, verify:
# 1. vCenter URL is correct
# 2. Username is correct (include @vsphere.local)
# 3. Password is correct
# 4. User has sufficient permissions

# Update configuration
opencenter cluster edit prod-cluster

Kubespray Deployment Fails

Error:

Error: Ansible playbook failed

Solution:

# Check Ansible log
tail -100 ~/prod-cluster-gitops/infrastructure/clusters/prod-cluster/ansible.log

# Common causes:
# 1. VM not reachable (check network)
# 2. Insufficient resources (check VM specs)
# 3. OS not supported (must be Ubuntu 24.04)

# Retry deployment
opencenter cluster deploy prod-cluster

vSphere CSI Not Working

Error:

PVC stuck in Pending state

Solution:

# Check CSI driver pods
kubectl get pods -n vmware-system-csi

# Check CSI driver logs
kubectl logs -n vmware-system-csi deployment/vsphere-csi-controller

# Common causes:
# 1. vSphere credentials incorrect
# 2. Datacenter/datastore not found
# 3. Insufficient vSphere permissions

# Verify vSphere configuration
kubectl get secret vsphere-config-secret -n vmware-system-csi -o yaml

Next Steps

Now that you have a VMware cluster, explore these topics:

Deploy Applications:

  • ../operations/customize-services.md[Customize Services] - Configure platform services

  • ../operations/manage-secrets.md[Manage Secrets] - Encrypt and rotate secrets

Cluster Management:

  • ../operations/add-worker-pools.md[Add Worker Pools] - Scale cluster capacity

  • ../operations/backup-and-restore.md[Backup and Restore] - Configure disaster recovery

  • ../operations/upgrade-kubernetes.md[Upgrade Kubernetes] - Upgrade cluster version

Multi-Cluster:

  • multi-cluster-setup.md[Multi-Cluster Management] - Manage multiple clusters

Understanding:

  • ../concepts/gitops-workflow.md[GitOps Workflow] - How GitOps works

  • ../concepts/provider-comparison.md[Provider Comparison] - Compare providers

What You Learned

In this tutorial, you:

  • Provisioned VMs in VMware vSphere

  • Initialized a VMware cluster configuration

  • Configured VM inventory and vSphere credentials

  • Validated configuration and SSH connectivity

  • Generated a complete GitOps repository

  • Deployed Kubernetes to pre-provisioned VMs

  • Verified cluster health and vSphere CSI storage

  • Tested persistent volume provisioning

You now have a production-ready Kubernetes cluster on VMware vSphere with GitOps continuous delivery!


Evidence

This tutorial is based on:

  • VMware provider: docs/providers/README.md:8

  • VMware defaults: internal/config/defaults.go:27-31

  • Workflow validation: tests/features/workflow.feature:1-73

  • Bootstrap process: cmd/cluster_bootstrap.go

  • GitOps structure: internal/gitops/, Ecosystem.md

  • Service configuration: internal/config/defaults.go:293-388