Sampling Strategies: Reducing Telemetry Volume and Costs
In a distributed system with hundreds of microservices and thousands of requests per second, collecting every trace is prohibitively expensive. Sampling is the technique that selects a representative subset of traces, reducing data volume and storage costs without losing the ability to diagnose problems.
The sampling challenge is finding the right balance: sampling too little reduces visibility; sampling too much increases costs. Advanced sampling strategies allow keeping all traces with errors and slow requests, discarding only "normal" and repetitive traffic.
What You Will Learn in This Article
- Head-based sampling: decision at the start of the trace
- Tail-based sampling: decision at the end of the trace
- Probabilistic sampling and rate limiting
- Configuring sampling in the Collector
- Cost optimization strategies
- Trade-offs between accuracy and costs
Head-Based Sampling
Head-based sampling makes the decision to sample a trace at the beginning, in the first span (the root span). The decision is then propagated to all downstream services through the sampled flag in the traceparent header. If the root span decides not to sample, no service in the chain records spans for that trace.
The main advantage is simplicity and efficiency: the decision is made immediately, without waiting for the trace to complete. The disadvantage is that the sampler knows nothing about the request's outcome at decision time: a trace that ends in an error may be discarded.
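The propagated decision is a single bit in the W3C traceparent header: the last field is a hex flags byte whose lowest bit marks the trace as sampled. A minimal sketch of reading that flag:

```python
def is_sampled(traceparent: str) -> bool:
    # traceparent format: version-traceid-spanid-flags
    flags = traceparent.split("-")[3]
    # Bit 0 of the flags byte is the "sampled" flag
    return bool(int(flags, 16) & 0x01)

print(is_sampled("00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01"))  # True
print(is_sampled("00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-00"))  # False
```

Downstream SDKs read this flag (via a ParentBased sampler) instead of making their own random decision.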
```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.sampling import (
    TraceIdRatioBased,
    ParentBasedTraceIdRatio,
    ALWAYS_ON,
    ALWAYS_OFF,
)

# --- 1. Probability sampler (25% of traces) ---
# Each trace has a 25% probability of being sampled
sampler_25 = TraceIdRatioBased(0.25)

# --- 2. ParentBased sampler (respects the parent's decision) ---
# If the parent is sampled, the child is sampled;
# if there is no parent, the specified ratio applies
sampler_parent = ParentBasedTraceIdRatio(0.25)

# --- 3. Configuration in the TracerProvider ---
provider = TracerProvider(
    sampler=ParentBasedTraceIdRatio(0.25)
)
trace.set_tracer_provider(provider)

# --- 4. Equivalent configuration via environment variables ---
# OTEL_TRACES_SAMPLER=parentbased_traceidratio
# OTEL_TRACES_SAMPLER_ARG=0.25
```
Head-Based Sampler Types
| Sampler | Behavior | Typical Use |
|---|---|---|
| AlwaysOn | Samples all traces (100%) | Development, testing, low-traffic services |
| AlwaysOff | Never samples (0%) | Temporarily disable tracing |
| TraceIdRatioBased | Samples a fixed percentage | Simple approach to reduce volume |
| ParentBased | Respects parent decision | Ensure consistency across services |
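A key property of TraceIdRatioBased is that the decision is deterministic: it is derived from the trace ID itself, not from a random draw, so every service that sees the same trace ID computes the same answer. The sketch below illustrates the idea with a simplified threshold check (an assumption for illustration; it mirrors the SDK's approach but is not its exact internal code):

```python
import random

TRACE_ID_MASK = (1 << 64) - 1  # compare against the lower 64 bits of the 128-bit trace ID

def should_sample(trace_id: int, ratio: float) -> bool:
    # Deterministic: the same trace ID always yields the same decision,
    # so independent services agree without coordination.
    bound = round(ratio * (TRACE_ID_MASK + 1))
    return (trace_id & TRACE_ID_MASK) < bound

random.seed(42)
trace_ids = [random.getrandbits(128) for _ in range(10_000)]
kept = sum(should_sample(t, 0.25) for t in trace_ids)
print(kept)  # close to 2500, i.e. ~25% of 10,000 traces
```

This determinism is why a ratio sampler alone (without ParentBased) still produces complete traces when all services use the same ratio.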
Tail-Based Sampling
Tail-based sampling makes the decision to sample a trace after all spans have been collected, analyzing the complete trace. This enables intelligent decisions based on request outcome: keeping all traces with errors, slow requests, and anomalous traces, discarding only repetitive success traffic.
Tail-based sampling requires a centralized component (typically the Collector in Gateway mode) that collects all spans of a trace, assembles them, evaluates sampling policies, and decides whether to keep or discard the entire trace.
```yaml
# tail_sampling configuration in the Gateway Collector
processors:
  tail_sampling:
    # Maximum time to wait for all spans of a trace
    decision_wait: 30s
    # Maximum number of traces awaiting a decision
    num_traces: 100000
    # Expected new traces per second
    expected_new_traces_per_sec: 1000
    policies:
      # Policy 1: keep ALL traces with errors
      - name: errors-policy
        type: status_code
        status_code:
          status_codes:
            - ERROR
      # Policy 2: keep slow traces (latency > 2s)
      - name: latency-policy
        type: latency
        latency:
          threshold_ms: 2000
      # Policy 3: keep traces with specific attributes
      - name: vip-customers
        type: string_attribute
        string_attribute:
          key: customer.tier
          values:
            - premium
            - enterprise
      # Policy 4: probabilistic sampling for everything else
      - name: probabilistic-policy
        type: probabilistic
        probabilistic:
          sampling_percentage: 10
      # Policy 5: global rate limiting
      - name: rate-limiting
        type: rate_limiting
        rate_limiting:
          spans_per_second: 500
      # Policy 6: composite policy (AND/OR)
      - name: composite-policy
        type: composite
        composite:
          max_total_spans_per_second: 1000
          policy_order: [errors-policy, latency-policy, probabilistic-policy]
          rate_allocation:
            - policy: errors-policy
              percent: 50
            - policy: latency-policy
              percent: 30
            - policy: probabilistic-policy
              percent: 20
```
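Tail sampling only works if every span of a trace reaches the same Collector instance. When running multiple gateway replicas, a common pattern is a two-tier deployment in which first-tier collectors route spans by trace ID using the loadbalancing exporter. A sketch (the headless-service hostname is a hypothetical example):

```yaml
# First-tier (agent) Collector: route spans by trace ID so that all
# spans of a trace land on the same tail-sampling gateway replica
exporters:
  loadbalancing:
    routing_key: "traceID"
    protocol:
      otlp:
        tls:
          insecure: true
    resolver:
      dns:
        # Hypothetical headless service resolving to the gateway replicas
        hostname: otel-gateway-headless.observability.svc.cluster.local

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [loadbalancing]
```

Without this routing layer, spans of the same trace scatter across replicas and each gateway sees only a fragment, making latency and error policies unreliable.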
Head vs Tail Sampling Comparison
Head-Based vs Tail-Based Sampling
| Aspect | Head-Based | Tail-Based |
|---|---|---|
| Decision moment | Start of trace | End of trace |
| Result awareness | No | Yes (errors, latency, attributes) |
| Required resources | Minimal (distributed) | Significant (centralized) |
| Consistency | Always consistent | Risk of orphan spans |
| Complexity | Low | High (dedicated infrastructure) |
| Error capture | Only if randomly sampled | All (dedicated policy) |
Cost Optimization Strategies
Observability costs grow with telemetry volume. Optimization strategies focus on three levers: reducing trace volume, reducing the number of attributes per span, and optimizing backend retention.
```yaml
# Multi-level optimization strategy
processors:
  # 1. Filter noise in the Collector
  filter:
    traces:
      span:
        # Drop health checks and readiness probes
        - 'attributes["http.route"] == "/health"'
        - 'attributes["http.route"] == "/ready"'
        - 'attributes["http.route"] == "/metrics"'
        # Drop static asset requests (OTTL regex match)
        - 'IsMatch(attributes["http.route"], "/static/.*")'

  # 2. Remove high-cardinality attributes
  attributes:
    actions:
      # Remove sensitive headers
      - key: http.request.header.authorization
        action: delete
      - key: http.request.header.cookie
        action: delete
      # Hash long SQL queries
      - key: db.query.text
        action: hash

  # 3. Intelligent tail sampling
  tail_sampling:
    policies:
      - name: keep-errors
        type: status_code
        status_code:
          status_codes: [ERROR]
      - name: keep-slow
        type: latency
        latency:
          threshold_ms: 1000
      - name: sample-rest
        type: probabilistic
        probabilistic:
          sampling_percentage: 5

  # 4. Batch for efficiency
  batch:
    send_batch_size: 2048
    timeout: 10s
```
Cost Optimization Checklist
- Filter noise: eliminate health checks, readiness probes, and static assets before the backend
- Control cardinality: limit metric labels to bounded values (max 5-7 labels per metric)
- Tail sample errors: keep 100% of error traces, sample the rest
- Differentiated retention: long retention for aggregated metrics (90 days), short for raw traces (7-14 days)
- Compression: enable gzip/zstd in the Collector to reduce network traffic
- Metric downsampling: reduce resolution of historical metrics (from 15s to 5m after 7 days)
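A back-of-envelope calculation shows how much these policies save. The numbers below are illustrative assumptions, not benchmarks:

```python
# Illustrative traffic profile (all values assumed for the example)
requests_per_sec = 1000   # incoming traces per second
spans_per_trace = 12      # average spans per trace
error_rate = 0.02         # 2% of traces end in error (always kept)
slow_rate = 0.03          # 3% exceed the latency threshold (always kept)
base_sample = 0.05        # 5% probabilistic sampling for the rest

# Traces kept per second: all errors, all slow, plus 5% of normal traffic
kept_traces = requests_per_sec * (
    error_rate + slow_rate + (1 - error_rate - slow_rate) * base_sample
)
kept_spans = kept_traces * spans_per_trace
reduction = 1 - kept_traces / requests_per_sec
print(f"{kept_spans:.0f} spans/s kept, {reduction:.0%} volume reduction")
# → 1170 spans/s kept, 90% volume reduction
```

Even with every error and slow trace retained, the total volume drops by roughly an order of magnitude.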
Adaptive Sampling
Adaptive sampling automatically adjusts the sampling rate based on current traffic volume. When traffic is low, it samples a higher percentage; when traffic is high, it reduces the rate to keep volume within budget limits.
Rate limiting in the Collector's tail sampling is a simple form of adaptive sampling: `spans_per_second: 500` guarantees a constant maximum volume regardless of traffic. More sophisticated strategies require custom solutions or SaaS services that implement adaptive sampling at the platform level.
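The core of a custom adaptive controller can be very small: each interval, compare observed traffic against a trace budget and recompute the ratio. A minimal sketch (custom logic, not a built-in SDK or Collector feature):

```python
def next_ratio(observed_traces: int, budget_traces: int,
               min_ratio: float = 0.01, max_ratio: float = 1.0) -> float:
    """Recompute the sampling ratio so kept volume tracks a fixed budget."""
    if observed_traces == 0:
        return max_ratio  # no traffic: sample everything
    target = budget_traces / observed_traces
    # Clamp so we never sample below a visibility floor or above 100%
    return max(min_ratio, min(max_ratio, target))

# Low traffic: budget allows sampling everything
print(next_ratio(observed_traces=200, budget_traces=500))     # 1.0
# High traffic: scale down to stay within the budget
print(next_ratio(observed_traces=10_000, budget_traces=500))  # 0.05
```

In practice the recomputed ratio would be pushed into a TraceIdRatioBased-style sampler at each interval; the floor keeps a minimum of visibility even under traffic spikes.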
Conclusions and Next Steps
Sampling is a fundamental skill for managing observability at scale. The strategy choice depends on traffic volume, budget, and required visibility level. The recommended approach combines head-based sampling for base volume reduction with tail-based sampling to ensure errors and anomalies are always captured.
The golden rule is: sample traces, never metrics. Aggregated metrics have a fixed cost independent of traffic and provide the high-level visibility needed for alerts. Traces are expensive but only needed for detailed debugging.
In the next article, we will explore AI observability, analyzing how to trace Large Language Model calls, monitor token usage, inference latencies, and AI agent behavior.