AI Blueprint
Purpose: Describes, for platform engineers and operators, how the AI blueprint extends the openCenter platform foundation with GPU-ready infrastructure, policy controls, and an ops model designed for ML workloads.
Overview
AI workloads do not need a separate platform. They need GPUs, policy controls, and an ops model that does not fall apart when you scale past the demo. The AI blueprint gives you all three without reinventing your infrastructure.
What You Get
- GPU-ready infrastructure with repeatable, auditable deployment workflows.
- Policy, security, and operational controls baked in — not retrofitted after launch.
- Scale across environments without rebuilding the platform every time you move.
Capabilities
GPU-Ready From the Start
Infrastructure patterns designed for AI workloads. Not generic compute with a GPU driver bolted on. The blueprint configures:
- GPU Operator for driver lifecycle management and device plugin registration.
- Node labeling and taints for GPU-equipped nodes to prevent non-GPU workloads from consuming GPU resources.
- Scheduling constraints (node affinity, tolerations) so GPU workloads land on the right hardware.
- Resource quotas to prevent a single team or namespace from monopolizing GPU capacity.
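The taint, toleration, and quota patterns above can be sketched as standard Kubernetes manifests. This is a minimal illustration, not the blueprint's actual configuration — node names, labels, namespaces, and images are hypothetical placeholders:

```yaml
# Taint a GPU node so non-GPU workloads are repelled.
# (The node name and label values here are illustrative.)
apiVersion: v1
kind: Node
metadata:
  name: gpu-node-01
  labels:
    node-role/gpu: "true"
spec:
  taints:
    - key: nvidia.com/gpu
      value: "present"
      effect: NoSchedule
---
# A training Pod that tolerates the taint, selects GPU nodes,
# and requests GPUs via the device plugin resource.
apiVersion: v1
kind: Pod
metadata:
  name: train-job
  namespace: ml-team-a
spec:
  tolerations:
    - key: nvidia.com/gpu
      operator: Exists
      effect: NoSchedule
  nodeSelector:
    node-role/gpu: "true"
  containers:
    - name: trainer
      image: example.com/trainer:latest   # placeholder image
      resources:
        limits:
          nvidia.com/gpu: 2
---
# Quota so a single namespace cannot monopolize GPU capacity.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: gpu-quota
  namespace: ml-team-a
spec:
  hard:
    requests.nvidia.com/gpu: "4"
```

The `nvidia.com/gpu` resource name is what the GPU Operator's device plugin registers; the quota caps it per namespace rather than per pod, which is what keeps one team from starving the others.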
Policy Before Production
Security and compliance controls ship with the cluster. Not added after the first audit finding. The AI blueprint inherits the full openCenter platform foundation security stack:
- Kyverno policies enforce container security standards.
- Pod Security Admission blocks privilege escalation and privileged containers.
- NetworkPolicies isolate training jobs from production inference endpoints.
- SOPS-encrypted secrets protect model registry credentials and API keys.
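The isolation between training jobs and production inference can be sketched as a default-deny NetworkPolicy on the inference namespace. A minimal illustration — namespace and label names are hypothetical, not the blueprint's shipped policy:

```yaml
# Apply to every pod in the inference namespace: drop all ingress
# except traffic from namespaces labeled as the inference gateway.
# Training namespaces carry no such label, so their traffic is denied.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: inference-isolation
  namespace: inference-prod
spec:
  podSelector: {}        # empty selector = all pods in this namespace
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              network-zone: inference-gateway
```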
Same Ops, Every Environment
Dev, staging, production — same workflows, same tooling, same confidence level. The GitOps model ensures that a GPU cluster in a lab and a GPU cluster in production are configured identically, differing only in the Kustomize overlay values (node counts, GPU types, resource limits).
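As a sketch of that overlay pattern (directory layout, resource names, and values are hypothetical), a production overlay changes only environment-specific numbers while reusing the shared base:

```yaml
# overlays/production/kustomization.yaml
# Only the GPU quota differs from the base; everything else is inherited.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
patches:
  - target:
      kind: ResourceQuota
      name: gpu-quota
    patch: |-
      - op: replace
        path: /spec/hard/requests.nvidia.com~1gpu   # "/" escaped as "~1" in JSON Pointer
        value: "32"
```

A lab overlay would carry the same structure with a smaller value, so a diff between environments shows only the intended differences.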
Current Status
The AI blueprint is in Preview. GPU operator configuration and scheduling patterns are defined. The following areas are under active development:
- Kyverno policies specific to AI workloads (model artifact validation, training job resource limits).
- Grafana dashboards for GPU utilization, training job progress, and inference latency.
- Air-gap packaging for GPU operator images and CUDA runtime dependencies.
Check the Data Services roadmap for timeline updates.
Relationship to the Platform Foundation
The AI blueprint is a superset of the openCenter platform foundation. Every AI cluster includes the full platform services stack (observability, security, GitOps, storage, networking) plus GPU-specific configuration. You do not need to deploy both — the foundation is already part of the AI blueprint.