Metrics (kube-prometheus-stack)
Purpose: Shows platform engineers how to add scrape targets, create alerting and recording rules, and tune retention in kube-prometheus-stack.
Task Summary
Prometheus is deployed as part of kube-prometheus-stack via FluxCD. This guide covers how to add scrape targets for your services, create alerting rules, and tune retention settings.
Prerequisites
- kube-prometheus-stack deployed (default in openCenter clusters)
- kubectl access to the cluster
- Familiarity with PromQL basics
Add a Scrape Target
Prometheus discovers scrape targets through ServiceMonitor and PodMonitor CRDs. To expose metrics from your application:
Step 1: Expose a /metrics endpoint
Your application must serve Prometheus-format metrics on an HTTP endpoint (typically /metrics).
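The endpoint returns plain text in the Prometheus exposition format. An illustrative (hypothetical) response:

```text
# HELP http_requests_total Total HTTP requests served.
# TYPE http_requests_total counter
http_requests_total{method="GET",status="200"} 1027
http_requests_total{method="GET",status="500"} 3
```

Most languages have a client library (e.g. prometheus_client) that produces this format for you.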
Step 2: Create a ServiceMonitor
```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app
  namespace: my-app
  labels:
    release: kube-prometheus-stack # Must match Prometheus serviceMonitorSelector
spec:
  selector:
    matchLabels:
      app: my-app
  endpoints:
    - port: http-metrics
      interval: 30s
      path: /metrics
```
The `release: kube-prometheus-stack` label is required for Prometheus to discover the ServiceMonitor. Without it, the target is ignored.
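The ServiceMonitor's selector matches Service labels, and the endpoint port refers to a named port on that Service. A minimal matching Service, as a sketch (names and port number assumed from the example above):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app
  namespace: my-app
  labels:
    app: my-app            # matched by the ServiceMonitor's matchLabels
spec:
  selector:
    app: my-app            # selects the application pods
  ports:
    - name: http-metrics   # must match the ServiceMonitor endpoint port name
      port: 8080           # assumed application port
      targetPort: 8080
```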
Step 3: Verify the target
```shell
# Check the ServiceMonitor was created
kubectl get servicemonitor -n my-app

# Port-forward to the Prometheus UI
kubectl port-forward svc/kube-prometheus-stack-prometheus -n monitoring 9090:9090

# Open http://localhost:9090/targets — your target should appear as UP
```
Create Alerting Rules
Alerting rules are defined via PrometheusRule CRDs:
```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: my-app-alerts
  namespace: my-app
  labels:
    release: kube-prometheus-stack
spec:
  groups:
    - name: my-app
      rules:
        - alert: HighErrorRate
          expr: rate(http_requests_total{status=~"5..", job="my-app"}[5m]) > 0.1
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "High 5xx error rate on {{ $labels.instance }}"
            description: "Error rate is {{ $value }} req/s over the last 5 minutes."
```
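Before committing a rule, it helps to preview the expression in the Prometheus UI (http://localhost:9090/graph after port-forwarding). For example, a ratio-based variant of the expression above, which alerts on the fraction of requests failing rather than an absolute rate (a sketch; metric and label names assumed from the example):

```promql
sum by (instance) (rate(http_requests_total{status=~"5..", job="my-app"}[5m]))
  /
sum by (instance) (rate(http_requests_total{job="my-app"}[5m]))
```

A ratio is often more robust than a raw rate because it stays meaningful as traffic volume changes.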
Verify the rule is loaded:
```shell
kubectl port-forward svc/kube-prometheus-stack-prometheus -n monitoring 9090:9090
# Open http://localhost:9090/rules — your rule group should appear
```
Recording Rules
Recording rules pre-compute expensive queries and store the result as a new time series:
```yaml
spec:
  groups:
    - name: my-app-recording
      rules:
        - record: my_app:http_request_duration_seconds:p99
          expr: histogram_quantile(0.99, rate(http_request_duration_seconds_bucket{job="my-app"}[5m]))
```
Use recording rules for dashboard queries that aggregate across many series or use histogram_quantile.
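Dashboards then query the recorded series by name instead of re-evaluating the quantile on every refresh:

```promql
my_app:http_request_duration_seconds:p99
```

The `level:metric:operations` naming convention used here follows the upstream Prometheus recommendation for recording rule names.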
Retention and Storage
Retention is configured in the HelmRelease values. The default is 15 days. To change it, add an override in the customer overlay:
```yaml
# applications/overlays/<cluster>/services/kube-prometheus-stack/override-values.yaml
prometheus:
  prometheusSpec:
    retention: 30d
    retentionSize: 50GB
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: longhorn
          resources:
            requests:
              storage: 100Gi
```
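Note that `retention` and `retentionSize` both apply: whichever limit is reached first triggers deletion of the oldest TSDB blocks. Current on-disk block usage can be checked against the size limit with Prometheus's own TSDB metric:

```promql
prometheus_tsdb_storage_blocks_bytes
```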
Verification
```shell
# Check Prometheus is running
kubectl get pods -n monitoring -l app.kubernetes.io/name=prometheus

# Check all targets are healthy
kubectl port-forward svc/kube-prometheus-stack-prometheus -n monitoring 9090:9090
# Visit http://localhost:9090/targets

# Check alerting rules are loaded
# Visit http://localhost:9090/rules
```
Troubleshooting
ServiceMonitor target not appearing:
Verify the `release: kube-prometheus-stack` label is present on the ServiceMonitor. Check that the Service's labels match the ServiceMonitor's selector and that the endpoint port name matches a named port on the Service.
"out of memory" on Prometheus pod:
Reduce the number of scraped series or increase the memory limits in the HelmRelease values. Check series cardinality with the `prometheus_tsdb_head_series` metric.
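To find which metrics contribute the most series, a top-k cardinality query helps (note: this query scans every series, so it can itself be expensive on a large TSDB):

```promql
topk(10, count by (__name__)({__name__=~".+"}))
```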
Further Reading
- Stack Overview — how Prometheus fits into the observability stack
- Dashboards & Alerts — Grafana dashboards and Alertmanager routing
- OpenTelemetry — forwarding OTLP metrics to Prometheus