
Stack Overview

Purpose: Explains to platform engineers how the observability components work together, covering the data flow from pods to dashboards.

Concept Summary

openCenter deploys a full observability stack through openCenter-gitops-base via FluxCD. The stack covers three signal types — metrics, logs, and traces — with a unified visualization layer. Each component handles one signal type, and OpenTelemetry acts as the collection and routing layer that ties them together.

| Signal  | Collector                                  | Storage         | Query              |
| ------- | ------------------------------------------ | --------------- | ------------------ |
| Metrics | Prometheus (scrape) + OpenTelemetry (OTLP) | Prometheus TSDB | PromQL via Grafana |
| Logs    | Promtail / OpenTelemetry                   | Loki            | LogQL via Grafana  |
| Traces  | OpenTelemetry Collector                    | Tempo           | TraceQL via Grafana |

Visualization: Grafana (all signals)

How It Works

Data Flow
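At a high level, telemetry flows from workloads through per-signal collectors into storage, and Grafana queries all three backends. A simplified sketch (the Collector can also forward metrics and logs, omitted here for clarity):

```
pods / services ──(scrape)──────────────▶ Prometheus ─┐
container logs  ──(Promtail tail)───────▶ Loki ───────┼──▶ Grafana
apps (OTLP)     ──▶ OTel Collector ─────▶ Tempo ──────┘
```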

Component Roles

Prometheus scrapes metrics from pods and services using ServiceMonitor and PodMonitor CRDs. It stores time-series data in its local TSDB and evaluates alerting/recording rules. Deployed as part of kube-prometheus-stack.
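A minimal ServiceMonitor might look like the following sketch; the application name, namespace, and port name are placeholders, not values from this stack:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app          # hypothetical application name
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: example-app       # must match the target Service's labels
  endpoints:
    - port: metrics          # named port on the Service
      interval: 30s
      path: /metrics
```

Prometheus discovers Services matching the selector and scrapes each backing pod on the named port.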

Alertmanager receives alerts from Prometheus, deduplicates them, groups related alerts, and routes them to notification channels (email, Slack, PagerDuty, webhooks). Also part of kube-prometheus-stack.
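A sketch of an Alertmanager routing config illustrating grouping and a notification receiver; the Slack webhook URL and channel are placeholders:

```yaml
route:
  group_by: ["alertname", "namespace"]   # collapse related alerts into one notification
  group_wait: 30s                        # wait briefly to batch alerts in a new group
  repeat_interval: 4h                    # re-notify for still-firing alerts
  receiver: slack-default
receivers:
  - name: slack-default
    slack_configs:
      - api_url: https://hooks.slack.com/services/REPLACE_ME
        channel: "#alerts"
```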

Grafana provides dashboards for all three signal types. It queries Prometheus (PromQL), Loki (LogQL), and Tempo (TraceQL) as data sources. Pre-configured dashboards are deployed via ConfigMaps. Also part of kube-prometheus-stack.
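A sketch of a Grafana datasource provisioning file wiring up all three backends; the in-cluster service names and ports are assumptions based on common defaults, not this stack's actual values:

```yaml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    url: http://prometheus-operated:9090   # placeholder service address
  - name: Loki
    type: loki
    url: http://loki:3100                  # placeholder service address
  - name: Tempo
    type: tempo
    url: http://tempo:3200                 # placeholder service address
```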

Loki stores and indexes log streams. It receives logs from Promtail (DaemonSet that tails container logs) or from the OpenTelemetry Collector. Loki indexes labels (namespace, pod, container) but stores log lines unindexed, keeping storage costs low.
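This split shows up directly in LogQL. In the hypothetical query below, the label matchers use Loki's index, while the `|=` line filter scans the (unindexed) log content:

```logql
{namespace="payments", container="api"} |= "error"
```

Queries that narrow by labels first stay fast; broad content-only searches scan more data.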

Tempo stores distributed traces. It receives spans via OTLP from the OpenTelemetry Collector and stores them in an object store or local filesystem. Grafana queries Tempo to visualize trace timelines and service maps.
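In Grafana, traces can be searched with TraceQL; a sketch with a hypothetical service name:

```traceql
{ resource.service.name = "checkout" && duration > 500ms }
```

This selects spans from the `checkout` service that took longer than 500 ms.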

OpenTelemetry Collector acts as a vendor-neutral telemetry pipeline. It receives traces, metrics, and logs via OTLP, processes them (batching, filtering, enrichment), and exports to the appropriate backend (Prometheus, Loki, Tempo).
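A sketch of a Collector config with one pipeline per signal type; the exporter endpoints are placeholder in-cluster addresses, and the exact exporter names depend on the Collector distribution in use:

```yaml
receivers:
  otlp:
    protocols:
      grpc: {}
      http: {}
processors:
  batch: {}          # batch telemetry before export
exporters:
  prometheusremotewrite:
    endpoint: http://prometheus-operated:9090/api/v1/write
  otlphttp/loki:
    endpoint: http://loki:3100/otlp       # Loki's native OTLP ingest path
  otlp/tempo:
    endpoint: tempo:4317
    tls:
      insecure: true                      # in-cluster traffic, no TLS assumed
service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheusremotewrite]
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp/loki]
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/tempo]
```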

Trade-offs and Alternatives

Why kube-prometheus-stack instead of standalone Prometheus?

kube-prometheus-stack bundles Prometheus, Alertmanager, Grafana, and a set of recording rules and dashboards for Kubernetes. Deploying them together ensures consistent configuration and pre-built dashboards for node, pod, and cluster health.

Why Loki instead of Elasticsearch?

Loki's label-based indexing uses significantly less storage and memory than Elasticsearch's full-text indexing. For Kubernetes log aggregation where queries are typically filtered by namespace, pod, or container, Loki's approach is a better fit. The trade-off is that full-text search across log content is slower.

Why a separate OpenTelemetry Collector?

The Collector decouples instrumentation from backends. Applications emit OTLP, and the Collector routes the data to whatever backends are configured. If a backend changes (e.g., replacing Tempo with Jaeger), applications do not need to be reconfigured.
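Concretely, only the Collector's exporter wiring changes in such a swap; a sketch, with a placeholder Jaeger endpoint:

```yaml
exporters:
  otlp/jaeger:
    endpoint: jaeger-collector:4317   # Jaeger accepts OTLP natively
service:
  pipelines:
    traces:
      exporters: [otlp/jaeger]        # was the Tempo exporter before
```

Applications keep sending OTLP to the same Collector endpoint throughout.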

Common Misconceptions

"Prometheus collects logs and traces too." Prometheus handles metrics only. Logs go to Loki, traces go to Tempo. Grafana unifies the view across all three.

"OpenTelemetry replaces Prometheus." OpenTelemetry can forward metrics to Prometheus, but Prometheus still handles scraping, storage, alerting rules, and recording rules. They are complementary.

Further Reading