Tracing (Tempo)

Purpose: For platform engineers. Shows how to configure Tempo storage and sampling strategies, with example TraceQL queries.

Task Summary

Tempo stores distributed traces received from the OpenTelemetry Collector. Grafana queries Tempo to display trace timelines, service dependency maps, and span details. This guide covers trace ingestion, storage, querying, and sampling configuration.

Prerequisites

  • Tempo deployed via FluxCD from openCenter-gitops-base
  • OpenTelemetry Collector configured to export traces to Tempo
  • Grafana configured with Tempo as a data source (default in openCenter)

Trace Ingestion

Tempo receives traces via OTLP from the OpenTelemetry Collector. The Collector is configured to export to Tempo's gRPC endpoint:

# OpenTelemetry Collector exporter configuration
exporters:
  otlp/tempo:
    endpoint: tempo.monitoring.svc.cluster.local:4317
    tls:
      insecure: true  # Traffic stays within the cluster network

Applications send traces to the OpenTelemetry Collector (not directly to Tempo). See OpenTelemetry for Collector configuration.
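
On the application side, the standard OpenTelemetry SDK environment variables can point workloads at the Collector. A minimal sketch of a Deployment fragment follows; the Collector Service name (`opentelemetry-collector.monitoring`) and the container name are assumptions, so verify them against your cluster before use:

# Hypothetical Deployment fragment; the Collector Service hostname is assumed
spec:
  template:
    spec:
      containers:
        - name: my-app
          env:
            - name: OTEL_SERVICE_NAME
              value: my-app
            - name: OTEL_EXPORTER_OTLP_ENDPOINT
              value: http://opentelemetry-collector.monitoring.svc.cluster.local:4317

`OTEL_SERVICE_NAME` becomes the `resource.service.name` attribute that the TraceQL queries below filter on.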

Query Traces with TraceQL

Access Grafana's Explore view and select the Tempo data source.

Find traces by service name

{resource.service.name = "my-app"}

Find traces with errors

{status = error}

Find slow spans

{duration > 500ms}

Combine filters

{resource.service.name = "my-app" && span.http.status_code >= 500 && duration > 1s}
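
TraceQL also supports structural operators that relate spans within a trace. As a sketch, the descendant operator `>>` can find traces where a `my-app` span leads to a slow database call; the `span.db.system` attribute is an assumption and depends on your instrumentation:

{resource.service.name = "my-app"} >> {span.db.system = "postgresql" && duration > 200ms}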

Search by trace ID

Paste a trace ID directly into the Grafana Explore search bar to view the full trace timeline.
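
Alternatively, a trace can be fetched directly over Tempo's HTTP API. A sketch, assuming the port-forward from the Verification section is active and `<trace-id>` is replaced with a real hex trace ID:

# Fetch a single trace as JSON via Tempo's HTTP API
curl -s "http://localhost:3200/api/traces/<trace-id>" | jq .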

Sampling Strategies

Not all traces need to be stored. Sampling reduces storage costs while preserving visibility into errors and slow requests.

Configure sampling in the OpenTelemetry Collector's processor pipeline:

processors:
  tail_sampling:
    decision_wait: 10s
    policies:
      - name: errors
        type: status_code
        status_code:
          status_codes: [ERROR]
      - name: slow-requests
        type: latency
        latency:
          threshold_ms: 1000
      - name: probabilistic
        type: probabilistic
        probabilistic:
          sampling_percentage: 10

This configuration keeps all error traces, all traces slower than 1 second, and 10% of remaining traces.
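
Note that tail sampling only works correctly when all spans of a trace reach the same Collector instance. If the Collector runs with multiple replicas, a trace-aware routing tier is needed in front of the sampling tier. A sketch using the Collector's `loadbalancing` exporter, keyed by trace ID; the headless Service hostname is an assumption:

# Routing-tier sketch; the headless Service name is an assumption
exporters:
  loadbalancing:
    routing_key: traceID
    protocol:
      otlp:
        tls:
          insecure: true
    resolver:
      dns:
        hostname: opentelemetry-collector-headless.monitoring.svc.cluster.local

With a single Collector replica, this tier is unnecessary and the tail_sampling configuration above is sufficient.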

Storage Configuration

Tempo stores traces in a backend configured in the HelmRelease values:

# applications/overlays/<cluster>/services/tempo/override-values.yaml
tempo:
  storage:
    trace:
      backend: local
      local:
        path: /var/tempo/traces
      wal:
        path: /var/tempo/wal
persistence:
  enabled: true
  storageClassName: longhorn
  size: 50Gi

For larger clusters, use S3-compatible storage:

tempo:
  storage:
    trace:
      backend: s3
      s3:
        bucket: tempo-traces
        endpoint: minio.storage.svc.cluster.local:9000
        insecure: true

Verification

# Check Tempo pods are running
kubectl get pods -n monitoring -l app.kubernetes.io/name=tempo

# Verify Tempo is receiving traces
kubectl port-forward svc/tempo -n monitoring 3200:3200
curl -s http://localhost:3200/ready
# Should return "ready"

# Check trace count via Tempo API
curl -s "http://localhost:3200/api/search?limit=5" | jq .

Troubleshooting

No traces appearing in Grafana: Verify the OpenTelemetry Collector is exporting to Tempo. Check Collector logs:

kubectl logs -n monitoring -l app.kubernetes.io/name=opentelemetry-collector --tail=20

"too many traces" / high storage usage: Enable tail sampling in the OpenTelemetry Collector (see Sampling Strategies above). Reduce retention in Tempo's compactor config.
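
As a sketch of the retention change, Tempo's compactor exposes a `block_retention` setting. The nesting under the chart's `tempo` values key is an assumption, so verify against the chart's default values before applying:

# Hypothetical override -- verify key nesting against the chart defaults
tempo:
  compactor:
    compaction:
      block_retention: 48h  # Drop trace blocks older than 48 hours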

Further Reading