Tracing (Tempo)

Purpose: For platform engineers. Shows how to configure Tempo storage and sampling strategies, with example TraceQL queries.

Task Summary

Tempo stores distributed traces received from the OpenTelemetry Collector. Grafana queries Tempo to display trace timelines, service dependency maps, and span details. This guide covers trace ingestion, storage, querying, and sampling configuration.

Prerequisites

  • Tempo deployed via FluxCD from openCenter-gitops-base
  • OpenTelemetry Collector configured to export traces to Tempo
  • Grafana configured with Tempo as a data source (default in openCenter)

Trace Ingestion

Tempo receives traces via OTLP from the OpenTelemetry Collector. The Collector is configured to export to Tempo's gRPC endpoint:

# OpenTelemetry Collector exporter configuration
exporters:
  otlp/tempo:
    endpoint: tempo.monitoring.svc.cluster.local:4317
    tls:
      insecure: true  # Traffic stays within the cluster network

Applications send traces to the OpenTelemetry Collector (not directly to Tempo). See OpenTelemetry for Collector configuration.
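
On the application side, the standard OpenTelemetry SDK environment variables can point workloads at the Collector. A minimal sketch of a Deployment fragment follows; the Collector Service name (`opentelemetry-collector.monitoring`) and the container name are assumptions, so verify them against your cluster before use:

# Hypothetical Deployment fragment; the Collector Service hostname is assumed
spec:
  template:
    spec:
      containers:
        - name: my-app
          env:
            - name: OTEL_SERVICE_NAME
              value: my-app
            - name: OTEL_EXPORTER_OTLP_ENDPOINT
              value: http://opentelemetry-collector.monitoring.svc.cluster.local:4317

`OTEL_SERVICE_NAME` becomes the `resource.service.name` attribute that the TraceQL queries below filter on.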

Query Traces with TraceQL

Access Grafana's Explore view and select the Tempo data source.

Find traces by service name

{resource.service.name = "my-app"}

Find traces with errors

{status = error}

Find slow spans

{duration > 500ms}

Combine filters

{resource.service.name = "my-app" && span.http.status_code >= 500 && duration > 1s}
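
TraceQL also supports structural operators that relate spans within a trace. As a sketch, the descendant operator `>>` can find traces where a `my-app` span leads to a slow database call; the `span.db.system` attribute is an assumption and depends on your instrumentation:

{resource.service.name = "my-app"} >> {span.db.system = "postgresql" && duration > 200ms}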

Search by trace ID

Paste a trace ID directly into the Grafana Explore search bar to view the full trace timeline.
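
Alternatively, a trace can be fetched directly over Tempo's HTTP API. A sketch, assuming the port-forward from the Verification section is active and `<trace-id>` is replaced with a real hex trace ID:

# Fetch a single trace as JSON via Tempo's HTTP API
curl -s "http://localhost:3200/api/traces/<trace-id>" | jq .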

Sampling Strategies

Not all traces need to be stored. Sampling reduces storage costs while preserving visibility into errors and slow requests.

Configure sampling in the OpenTelemetry Collector's processor pipeline:

processors:
  tail_sampling:
    decision_wait: 10s
    policies:
      - name: errors
        type: status_code
        status_code:
          status_codes: [ERROR]
      - name: slow-requests
        type: latency
        latency:
          threshold_ms: 1000
      - name: probabilistic
        type: probabilistic
        probabilistic:
          sampling_percentage: 10

This configuration keeps all error traces, all traces slower than 1 second, and 10% of remaining traces.
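
Note that tail sampling only works correctly when all spans of a trace reach the same Collector instance. If the Collector runs with multiple replicas, a trace-aware routing tier is needed in front of the sampling tier. A sketch using the Collector's `loadbalancing` exporter, keyed by trace ID; the headless Service hostname is an assumption:

# Routing-tier sketch; the headless Service name is an assumption
exporters:
  loadbalancing:
    routing_key: traceID
    protocol:
      otlp:
        tls:
          insecure: true
    resolver:
      dns:
        hostname: opentelemetry-collector-headless.monitoring.svc.cluster.local

With a single Collector replica, this tier is unnecessary and the tail_sampling configuration above is sufficient.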

Storage Configuration

Tempo stores traces in a backend configured in the HelmRelease values:

# applications/overlays/<cluster>/services/tempo/override-values.yaml
tempo:
  storage:
    trace:
      backend: local
      local:
        path: /var/tempo/traces
      wal:
        path: /var/tempo/wal
persistence:
  enabled: true
  storageClassName: longhorn
  size: 50Gi

For larger clusters, use S3-compatible storage:

tempo:
  storage:
    trace:
      backend: s3
      s3:
        bucket: tempo-traces
        endpoint: minio.storage.svc.cluster.local:9000
        insecure: true

Verification

# Check Tempo pods are running
kubectl get pods -n monitoring -l app.kubernetes.io/name=tempo

# Verify Tempo is receiving traces
kubectl port-forward svc/tempo -n monitoring 3200:3200
curl -s http://localhost:3200/ready
# Should return "ready"

# Check trace count via Tempo API
curl -s "http://localhost:3200/api/search?limit=5" | jq .

Troubleshooting

No traces appearing in Grafana: Verify the OpenTelemetry Collector is exporting to Tempo. Check Collector logs:

kubectl logs -n monitoring -l app.kubernetes.io/name=opentelemetry-collector --tail=20

"too many traces" / high storage usage: Enable tail sampling in the OpenTelemetry Collector (see Sampling Strategies above). Reduce retention in Tempo's compactor config.
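
As a sketch of the retention change, Tempo's compactor exposes a `block_retention` setting. The nesting under the chart's `tempo` values key is an assumption, so verify against the chart's default values before applying:

# Hypothetical override -- verify key nesting against the chart defaults
tempo:
  compactor:
    compaction:
      block_retention: 48h  # Drop trace blocks older than 48 hours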

Further Reading