Managed Kafka (Streaming Blueprint)
Purpose: Explains the Streaming blueprint for platform engineers and application developers: how Apache Kafka is operated as a managed platform service with a GitOps lifecycle, built-in observability, and security by default.
Overview
openCenter Managed Kafka delivers Apache Kafka as a platform service using the Strimzi operator. Kafka clusters, topics, users, and ACLs are declared in Git and reconciled by FluxCD. An upgrade is a version bump in YAML, after which the operator performs a rolling restart with automatic health checks.
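To make the "version bump" concrete, here is a minimal sketch of a Strimzi `Kafka` custom resource in KRaft mode. The cluster name, namespace, listener layout, and version numbers are illustrative assumptions, not values taken from the blueprint; the versions must be ones supported by the installed Strimzi release, and the cluster also needs `KafkaNodePool` resources to define its nodes.

```yaml
# Sketch only: names, namespace, and versions are illustrative assumptions.
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: example-cluster          # hypothetical cluster name
  namespace: kafka               # hypothetical namespace
  annotations:
    # Required on Strimzi releases that still support ZooKeeper;
    # newer releases run KRaft with node pools by default.
    strimzi.io/kraft: enabled
    strimzi.io/node-pools: enabled
spec:
  kafka:
    version: 3.9.0               # changing this value requests a rolling upgrade
    metadataVersion: 3.9-IV0     # raised separately once the new version is stable
    listeners:
      - name: tls
        port: 9093
        type: internal
        tls: true
    config:
      default.replication.factor: 3
      min.insync.replicas: 2
  entityOperator:
    topicOperator: {}            # reconciles KafkaTopic resources
    userOperator: {}             # reconciles KafkaUser resources
```

In a GitOps flow, the upgrade would be a commit that edits `spec.kafka.version`; the operator then restarts the Kafka pods in a rolling fashion.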
Status: Limited Availability (first paying customers)
Architecture
Deployment Flow
- Declare — Kafka CR, topics, users defined in cluster overlay Git repository
- Reconcile — FluxCD applies manifests; Strimzi operators provision resources
- Secure — TLS certificates issued by cert-manager; secrets SOPS-encrypted in Git
- Monitor — Prometheus scrapes JMX metrics; Grafana dashboards deploy automatically
- Operate — Rolling upgrades, partition reassignment, backup via standard day-2 operations
- Scale — Add brokers via CR spec change; Cruise Control handles partition rebalancing
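The Scale step can be sketched as two changes in Git: raising `replicas` on a broker node pool, and requesting a Cruise Control rebalance onto the new brokers. All names, sizes, and broker IDs below are assumptions for illustration; the actual IDs depend on how the operator assigned them in your cluster, and Cruise Control must be enabled in the `Kafka` CR for `KafkaRebalance` to work.

```yaml
# Sketch only: pool name, storage size, and broker IDs are assumptions.
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaNodePool
metadata:
  name: broker
  namespace: kafka
  labels:
    strimzi.io/cluster: example-cluster   # hypothetical Kafka CR name
spec:
  replicas: 5                    # previously 3; two brokers are added
  roles:
    - broker
  storage:
    type: persistent-claim
    size: 100Gi
    deleteClaim: false
---
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaRebalance
metadata:
  name: add-brokers-example
  namespace: kafka
  labels:
    strimzi.io/cluster: example-cluster
spec:
  mode: add-brokers              # move partitions onto the listed brokers
  brokers: [3, 4]                # assumed IDs of the new brokers; verify before applying
```

By default Strimzi generates an optimization proposal that has to be approved before partitions move, so the rebalance is a reviewed step and not an automatic side effect of scaling.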
What You Get
- Multi-broker Kafka clusters with KRaft mode (no ZooKeeper dependency)
- Node pools for workload isolation (broker vs. controller roles)
- Rolling upgrades with automatic health checks
- Rack awareness for fault-domain distribution
- Kafka Connect clusters with OCI-based plugin management
- MirrorMaker 2 for cross-cluster replication
- HTTP Bridge (REST API for Kafka operations)
- MQTT Bridge (one-way MQTT 3.1.1 ingestion to Kafka topics)
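As an illustration of node pools and rack awareness from the list above, the sketch below defines a dedicated controller pool and shows the rack setting as a fragment of the `Kafka` CR. Names, replica counts, storage sizes, and the topology key are assumptions; a broker pool would look the same with `roles: [broker]`.

```yaml
# Sketch only: names, sizes, and the topology key are assumptions.
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaNodePool
metadata:
  name: controller
  namespace: kafka
  labels:
    strimzi.io/cluster: example-cluster   # hypothetical Kafka CR name
spec:
  replicas: 3
  roles:
    - controller                 # KRaft controllers only, no broker traffic
  storage:
    type: persistent-claim
    size: 20Gi
    deleteClaim: false
---
# Fragment to merge into the Kafka CR, not a complete resource.
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: example-cluster
  namespace: kafka
spec:
  kafka:
    rack:
      # Spread brokers across the zones reported by this node label.
      topologyKey: topology.kubernetes.io/zone
```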
What Is Not Included
- Serverless Kafka (no consumption-based model)
- Native autoscaling (scaling is manual, via CR changes)
- Built-in schema registry (planned Q4 2026 as add-on; external deployment possible now)
- Unlimited connector support (connectors come from a curated catalog)
- Managed runtime for Kafka Streams applications
Four Pillars
GitOps-Managed Lifecycle
- All Kafka resources (clusters, topics, users, ACLs) declared in Git
- FluxCD reconciles changes with SOPS decryption at apply time
- Drift detection flags manual modifications
- Rollback by reverting the Git commit
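The reconciliation described in this list could be wired up with a Flux `Kustomization` along the following lines. This is a sketch under assumptions: the repository path, source name, interval, and the name of the Secret holding the SOPS key are all hypothetical.

```yaml
# Sketch only: path, source name, and key Secret name are assumptions.
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: kafka
  namespace: flux-system
spec:
  interval: 10m                  # how often Flux re-applies the desired state
  path: ./clusters/example/kafka # hypothetical overlay directory
  prune: true                    # delete resources that were removed from Git
  sourceRef:
    kind: GitRepository
    name: flux-system
  decryption:
    provider: sops               # decrypt SOPS-encrypted manifests at apply time
    secretRef:
      name: sops-age             # hypothetical Secret containing the private key
```

Because the cluster state follows the Git revision, reverting a commit and letting Flux reconcile is what restores the previous configuration.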
Observability From Boot
- Prometheus exporters (JMX, Kafka Exporter) deployed with every cluster
- Pre-built Grafana dashboards for broker health, topic throughput, consumer lag
- Alerting rules for under-replicated partitions, ISR shrink, disk usage, and no active controller
- Loki integration for operator and broker logs
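For orientation, alerting rules of the kind listed above might look like the following `PrometheusRule`. The metric names assume the JMX exporter mapping used in Strimzi's example metrics configuration plus standard kubelet volume metrics; thresholds, durations, and the namespace are assumptions, and the rules shipped with the platform may differ.

```yaml
# Sketch only: metric names depend on the exporter configuration in use.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: kafka-alerts-example
  namespace: kafka
spec:
  groups:
    - name: kafka
      rules:
        - alert: KafkaUnderReplicatedPartitions
          expr: kafka_server_replicamanager_underreplicatedpartitions > 0
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "Kafka has under-replicated partitions"
        - alert: KafkaNoActiveController
          expr: sum(kafka_controller_kafkacontroller_activecontrollercount) < 1
          for: 3m
          labels:
            severity: critical
          annotations:
            summary: "No active Kafka controller"
        - alert: KafkaVolumeLowOnSpace
          # Fires when less than 15% of a volume in the namespace is free.
          expr: |
            kubelet_volume_stats_available_bytes{namespace="kafka"}
              / kubelet_volume_stats_capacity_bytes{namespace="kafka"} < 0.15
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "Kafka volume is running low on free space"
```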
Security by Default
- Inter-broker TLS (cert-manager issued certificates)
- Client authentication: SASL/SCRAM, mTLS, OAuth2/OIDC (Keycloak)
- Topic-level ACLs via KafkaUser CR
- Per-user quotas (produce/consume byte rates, connection limits)
- NetworkPolicies restricting access to Kafka namespace
- Secrets SOPS-encrypted in Git, decrypted at reconciliation time
- Cosign signatures for operator images; SBOM in SPDX-JSON format
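A sketch of how topic-level ACLs and per-user quotas can be declared follows. Topic, user, and consumer group names and all numeric values are hypothetical. The example assumes the `Kafka` CR enables `simple` authorization and exposes a listener with SCRAM-SHA-512 authentication.

```yaml
# Sketch only: all names and numbers are illustrative assumptions.
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: orders
  namespace: kafka
  labels:
    strimzi.io/cluster: example-cluster   # hypothetical Kafka CR name
spec:
  partitions: 6
  replicas: 3
  config:
    retention.ms: 604800000      # 7 days
---
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaUser
metadata:
  name: orders-reader
  namespace: kafka
  labels:
    strimzi.io/cluster: example-cluster
spec:
  authentication:
    type: scram-sha-512          # User Operator generates the password Secret
  authorization:
    type: simple
    acls:
      - resource:
          type: topic
          name: orders
          patternType: literal
        operations: [Read, Describe]
      - resource:
          type: group
          name: orders-consumers # consumer group the client is allowed to use
          patternType: literal
        operations: [Read]
  quotas:
    producerByteRate: 1048576    # bytes per second
    consumerByteRate: 2097152    # bytes per second
```

The User Operator stores the generated credentials in a Secret named after the `KafkaUser`, which the application can mount or reference.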
Operational Resilience
- Topic configuration backup
- Partition reassignment tooling (Cruise Control)
- Tested recovery runbooks for common failure modes
- Velero integration for disaster recovery of operator state
- Change windows and maintenance workflow support
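As one possible shape for the Velero integration mentioned above, a scheduled backup of the Kubernetes resources in the Kafka namespace could be declared like this. The schedule, retention, and namespace names are assumptions; this backs up resource definitions, and topic data on persistent volumes would need volume snapshots or replication in addition.

```yaml
# Sketch only: schedule, namespaces, and retention are assumptions.
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: kafka-operator-state
  namespace: velero
spec:
  schedule: "0 2 * * *"          # daily at 02:00 (cron syntax)
  template:
    includedNamespaces:
      - kafka                    # hypothetical namespace holding the Kafka CRs
    ttl: 720h0m0s                # keep each backup for 30 days
```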
Strimzi Ecosystem Components
| Component | Purpose |
|---|---|
| Strimzi Kafka Operator | Cluster Operator and Entity Operator (Topic and User Operators) |
| Kafka Access Operator | Service Binding-style Secrets for application connectivity |
| Kafka Bridge | HTTP/1.1 REST API for produce/consume |
| MQTT Bridge | One-way MQTT 3.1.1 ingestion to Kafka topics |
| Drain Cleaner | Admission webhook for safe node draining during maintenance |
| Kafka OAuth | OAuth2/OIDC authentication with Keycloak integration |
| Quotas Plugin | Aggregate broker quotas, storage-aware throttling |
| Config Provider | Kubernetes Secret/ConfigMap integration for Kafka configuration |
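Most components in this table are configured through their own custom resources. As one example, a Kafka Bridge instance could be declared roughly as follows; the names, port, and bootstrap address are assumptions, and the referenced CA Secret follows Strimzi's `<cluster>-cluster-ca-cert` naming for a cluster called `example-cluster`.

```yaml
# Sketch only: names, port, and bootstrap address are assumptions.
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaBridge
metadata:
  name: example-bridge
  namespace: kafka
spec:
  replicas: 1
  bootstrapServers: example-cluster-kafka-bootstrap:9093
  tls:
    trustedCertificates:
      - secretName: example-cluster-cluster-ca-cert   # cluster CA created by Strimzi
        certificate: ca.crt
  http:
    port: 8080                   # port the REST API listens on
```

Clients that cannot use a native Kafka library would then produce and consume over HTTP against the bridge Service.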
Capability Matrix
| Capability | Support Level | Notes |
|---|---|---|
| Kafka broker provisioning | ✅ Full | Multi-broker, KRaft, node pools |
| High availability | ✅ Full | Rack awareness, min.insync.replicas |
| Encryption in transit | ✅ Full | TLS for inter-broker and client connections |
| Authentication | ✅ Full | SASL/SCRAM, mTLS, OAuth2/OIDC |
| Authorization (ACLs) | ✅ Full | KafkaUser CR with User Operator |
| Kafka Connect | ✅ Full | OCI plugin management, connector lifecycle |
| Cross-cluster replication | ✅ Full | MirrorMaker 2 |
| Observability | ✅ Full | Prometheus + Grafana + Alertmanager + Loki |
| Infrastructure as Code | ✅ Full | CRDs in Git, FluxCD reconciliation |
| Air-gap deployment | ✅ Full | All images mirrorable, signed packages |
| Operational runbooks | ✅ Full | Documented day-2 procedures |
| Topic/user management | ✅ Full | KafkaTopic and KafkaUser CRDs |
| HTTP Bridge | ✅ Full | REST API for non-native clients |
| MQTT Bridge | ✅ Full | Device ingestion via MQTT 3.1.1 |
| Cluster topology options | ⚠️ Partial | Single-node, production, stretch (multi-AZ) |
| Autoscaling | ⚠️ Partial | Manual scaling; Cruise Control for rebalancing |
| SLA eligibility | ⚠️ Partial | Limited Availability; GA SLA pending |
| Incident response | ⚠️ Partial | Runbooks available; 24/7 coverage in progress |
| Schema Registry | ❌ Not included | Planned Q4 2026; external deployment possible |
Target Audience
- Platform engineers — deploy and operate Kafka clusters
- Application developers — consume Kafka via topics, users, and ACLs declared in Git
- Data engineers — use Kafka Connect and MirrorMaker 2 for data integration
Further Reading
- Kafka Architecture — CRDs, topology, platform dependencies
- Kafka Topics — how-to for topic management
- Kafka Security — TLS and authentication configuration
- Kafka Monitoring — Prometheus metrics and alerts
- Kafka Deployment — end-to-end deployment tutorial
- Data Services Overview — family overview
- Portfolio Strategy — roadmap and sequencing