OTel Collector: The Heart of the Telemetry Pipeline
The OpenTelemetry Collector is a standalone, vendor-agnostic component that receives, processes, and exports telemetry data. It functions as an intelligent proxy between instrumented applications and observability backends, providing an abstraction layer that decouples data generation from its final destination.
The Collector is arguably the most critical component of a production observability infrastructure. Without it, every application must manage its own direct connection to the backend, which makes it hard to switch backends, add intermediate processing, or handle failover. With the Collector, export configuration is centralized: applications only need to know how to reach the Collector.
What You Will Learn in This Article
- The internal architecture of the Collector: Receivers, Processors, Exporters
- How to configure the Collector with YAML
- The most important Processors: batch, filter, attributes, tail_sampling
- Deployment patterns: Agent, Gateway, and Sidecar
- Scaling and high availability of the Collector
- Monitoring the Collector itself (meta-observability)
Collector Architecture
The Collector has a pipeline architecture composed of three types of components, connected through configurable pipelines for each signal type (traces, metrics, logs):
Collector Pipeline: Receivers (Input) → Processors (Transformation) → Exporters (Output)

A pipeline can have multiple receivers (collecting from multiple sources), a chain of processors (filter, enrich, sample), and multiple exporters (sending to multiple backends simultaneously).
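As a sketch, a minimal `service` section wiring one traces pipeline with fan-in and fan-out might look like this (component names such as `backend_a` and `backend_b` are illustrative placeholders, not part of any real configuration):

```yaml
# Sketch: one traces pipeline with multiple receivers and exporters.
service:
  pipelines:
    traces:
      receivers: [otlp, jaeger]                    # fan-in from several sources
      processors: [memory_limiter, batch]          # executed in order, left to right
      exporters: [otlp/backend_a, otlp/backend_b]  # fan-out to several backends
```

Every component referenced in a pipeline must also be declared in its own top-level section (`receivers:`, `processors:`, `exporters:`); declaring a component alone does not activate it.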
Receivers: Entry Points
Receivers are the Collector's entry points. They accept telemetry data from various sources and convert it into the Collector's internal data format. The most common receiver is the OTLP receiver, but the Collector supports dozens of receivers for different protocols and sources.
Processors: Data Transformation
Processors modify data between reception and export. They can filter, enrich, sample, batch, and transform telemetry signals. The order of processors in a pipeline is significant because they execute sequentially; by convention, memory_limiter comes first so that overload protection applies before any other work, and batch comes last so that downstream exporters receive efficiently sized payloads.
Exporters: Data Destinations
Exporters send processed data to final destinations: storage backends, visualization services, or other Collectors. A pipeline can have multiple exporters to send the same data to different destinations simultaneously.
Complete Collector Configuration
The Collector is configured with a YAML file that defines receivers, processors, exporters, and the pipelines connecting them. Here is a production-ready configuration:
```yaml
# otel-collector-config.yaml - Production-ready configuration
receivers:
  # OTLP receiver: OpenTelemetry's native protocol
  otlp:
    protocols:
      grpc:
        endpoint: "0.0.0.0:4317"
        max_recv_msg_size_mib: 4
      http:
        endpoint: "0.0.0.0:4318"
        cors:
          allowed_origins: ["*"]

  # Prometheus receiver: scraping Prometheus metrics
  prometheus:
    config:
      scrape_configs:
        - job_name: "kubernetes-pods"
          scrape_interval: 30s
          kubernetes_sd_configs:
            - role: pod
          relabel_configs:
            - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
              action: keep
              regex: true

  # Filelog receiver: collecting logs from files
  filelog:
    include: ["/var/log/pods/*/*/*.log"]
    operators:
      - type: json_parser
        timestamp:
          parse_from: attributes.time
          layout: "%Y-%m-%dT%H:%M:%S.%LZ"

processors:
  # Batch processor: grouping for efficient sending
  batch:
    send_batch_size: 1024
    send_batch_max_size: 2048
    timeout: 5s

  # Memory limiter: OOM protection
  memory_limiter:
    check_interval: 1s
    limit_mib: 512
    spike_limit_mib: 128

  # Resource processor: add global attributes
  resource:
    attributes:
      - key: deployment.environment
        value: "production"
        action: upsert
      - key: collector.version
        value: "0.96.0"
        action: insert

  # Attributes processor: filter sensitive attributes
  attributes:
    actions:
      - key: http.request.header.authorization
        action: delete
      - key: db.query.text
        action: hash

  # Filter processor: discard unnecessary telemetry
  filter:
    error_mode: ignore
    traces:
      span:
        - 'attributes["http.route"] == "/health"'
        - 'attributes["http.route"] == "/ready"'
    metrics:
      metric:
        - 'name == "http.server.request.duration" and resource.attributes["service.name"] == "debug-service"'

exporters:
  # OTLP exporter: to Jaeger for traces
  otlp/jaeger:
    endpoint: "jaeger-collector:4317"
    tls:
      insecure: true

  # Prometheus exporter: expose metrics for Prometheus
  prometheus:
    endpoint: "0.0.0.0:8889"
    resource_to_telemetry_conversion:
      enabled: true

  # Loki exporter: to Loki for logs
  loki:
    endpoint: "http://loki:3100/loki/api/v1/push"
    default_labels_enabled:
      exporter: true
      job: true

  # Debug exporter: for testing
  debug:
    verbosity: basic
    sampling_initial: 5
    sampling_thereafter: 200

extensions:
  # Health check endpoint
  health_check:
    endpoint: "0.0.0.0:13133"
  # Performance profiling
  pprof:
    endpoint: "0.0.0.0:1777"
  # zPages: built-in debugging UI
  zpages:
    endpoint: "0.0.0.0:55679"

service:
  extensions: [health_check, pprof, zpages]
  pipelines:
    # Traces pipeline
    traces:
      receivers: [otlp]
      processors: [memory_limiter, filter, attributes, batch]
      exporters: [otlp/jaeger, debug]
    # Metrics pipeline
    metrics:
      receivers: [otlp, prometheus]
      processors: [memory_limiter, resource, batch]
      exporters: [prometheus]
    # Logs pipeline
    logs:
      receivers: [otlp, filelog]
      processors: [memory_limiter, resource, attributes, batch]
      exporters: [loki]
  telemetry:
    logs:
      level: info
    metrics:
      address: "0.0.0.0:8888"
```
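To try a configuration like this locally, a minimal Docker Compose sketch can mount the file into the contrib image (file name and port mappings here are assumptions matching the config above):

```yaml
# docker-compose.yaml - run the Collector locally (sketch)
services:
  otel-collector:
    image: otel/opentelemetry-collector-contrib:0.96.0
    command: ["--config=/etc/otel/config.yaml"]
    volumes:
      - ./otel-collector-config.yaml:/etc/otel/config.yaml:ro
    ports:
      - "4317:4317"   # OTLP gRPC
      - "4318:4318"   # OTLP HTTP
      - "13133:13133" # health_check extension
      - "8888:8888"   # internal telemetry metrics
```

The contrib distribution is needed here because components like the Loki exporter and the filelog receiver are not included in the core distribution.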
Deployment Patterns
The Collector can be deployed in three main patterns, each with specific advantages and trade-offs. In production environments, multiple patterns are often combined.
1. Agent Mode (DaemonSet)
One Collector instance per cluster node (DaemonSet in Kubernetes). Each application on the node sends telemetry to the local Collector via localhost, minimizing network latency. The agent performs lightweight pre-processing (batching, filtering) and forwards to the gateway.
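In agent mode, applications can discover the node-local Collector through the Kubernetes Downward API. A sketch of the relevant container `env` section (the variable names follow the standard OTel SDK environment variables; this assumes the DaemonSet exposes 4317 as a hostPort):

```yaml
# Pod spec fragment: point the OTel SDK at the node-local agent
env:
  - name: NODE_IP
    valueFrom:
      fieldRef:
        fieldPath: status.hostIP
  - name: OTEL_EXPORTER_OTLP_ENDPOINT
    value: "http://$(NODE_IP):4317"
```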
2. Gateway Mode (Deployment)
A centralized pool of Collectors receiving telemetry from agents or directly from applications. The gateway performs heavy processing (tail sampling, aggregation) and manages export to backends. It scales horizontally based on telemetry volume.
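On the agent side, forwarding to the gateway is just another OTLP exporter, typically configured with a sending queue and retries so that short gateway outages do not lose data. A sketch (the Service name `otel-collector-gateway` is an assumption):

```yaml
# Agent config fragment: forward everything to the gateway pool
exporters:
  otlp/gateway:
    endpoint: "otel-collector-gateway.observability.svc.cluster.local:4317"
    tls:
      insecure: true
    sending_queue:
      enabled: true
      queue_size: 5000
    retry_on_failure:
      enabled: true
      max_elapsed_time: 300s
```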
3. Sidecar Mode
One Collector instance per pod, deployed as a sidecar container. It provides complete isolation between applications and allows service-specific configurations. It is the most resource-expensive pattern but the most flexible.
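A sidecar is declared directly in the application's Pod template, and the application exports to localhost. A minimal sketch (the application image name is illustrative):

```yaml
# Pod template fragment: application + Collector sidecar
spec:
  containers:
    - name: app
      image: my-service:1.0.0   # illustrative application image
    - name: otel-collector
      image: otel/opentelemetry-collector-contrib:0.96.0
      args: ["--config=/etc/otel/config.yaml"]
      resources:
        limits:
          cpu: 200m
          memory: 256Mi
```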
```yaml
# Kubernetes: Agent (DaemonSet) + Gateway (Deployment)
# Agent DaemonSet - one per node
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: otel-collector-agent
  namespace: observability
spec:
  selector:
    matchLabels:
      app: otel-collector-agent
  template:
    metadata:
      labels:
        app: otel-collector-agent
    spec:
      containers:
        - name: collector
          image: otel/opentelemetry-collector-contrib:0.96.0
          args: ["--config=/etc/otel/config.yaml"]
          resources:
            requests:
              cpu: 100m
              memory: 256Mi
            limits:
              cpu: 500m
              memory: 512Mi
          ports:
            - containerPort: 4317 # OTLP gRPC
            - containerPort: 4318 # OTLP HTTP
          volumeMounts:
            - name: config
              mountPath: /etc/otel
      volumes:
        - name: config
          configMap:
            name: otel-agent-config
---
# Gateway Deployment - scalable
apiVersion: apps/v1
kind: Deployment
metadata:
  name: otel-collector-gateway
  namespace: observability
spec:
  replicas: 3
  selector:
    matchLabels:
      app: otel-collector-gateway
  template:
    metadata:
      labels:
        app: otel-collector-gateway
    spec:
      containers:
        - name: collector
          image: otel/opentelemetry-collector-contrib:0.96.0
          args: ["--config=/etc/otel/config.yaml"]
          resources:
            requests:
              cpu: 500m
              memory: 1Gi
            limits:
              cpu: 2
              memory: 4Gi
```
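Agents (and any application exporting directly) reach the gateway through a regular ClusterIP Service; a sketch:

```yaml
# Service fronting the gateway Deployment
apiVersion: v1
kind: Service
metadata:
  name: otel-collector-gateway
  namespace: observability
spec:
  selector:
    app: otel-collector-gateway
  ports:
    - name: otlp-grpc
      port: 4317
      targetPort: 4317
    - name: otlp-http
      port: 4318
      targetPort: 4318
```

One caveat: long-lived gRPC connections can pin most traffic onto a few gateway replicas; if that becomes a problem, the contrib distribution's loadbalancing exporter on the agents is a common alternative to plain Service routing.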
Deployment Pattern Comparison
| Pattern | Resources | Latency | Isolation | Typical Use |
|---|---|---|---|---|
| Agent | Low (per node) | Minimal (localhost) | Per node | Pre-processing, buffering |
| Gateway | Medium-High (pool) | Medium (network) | Centralized | Tail sampling, aggregation |
| Sidecar | High (per pod) | Minimal (localhost) | Per service | Specific config, isolation |
Meta-Observability: Monitoring the Collector
The Collector is a critical infrastructure component. If it stops working, you lose visibility into the entire system. This is why monitoring the Collector itself is essential. The OTel Collector exposes internal metrics on a dedicated endpoint that can be scraped by Prometheus.
Key Collector Metrics to Monitor
- otelcol_receiver_accepted_spans: successfully received spans (input throughput)
- otelcol_receiver_refused_spans: refused spans (potential overload)
- otelcol_exporter_sent_spans: successfully exported spans (output throughput)
- otelcol_exporter_send_failed_spans: export errors (backend issues)
- otelcol_processor_batch_batch_send_size: size of sent batches
- otelcol_process_runtime_total_alloc_bytes: allocated memory (OOM risk)
- otelcol_processor_dropped_spans: spans dropped by processors (data loss)
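These metrics translate directly into alerts. A sketch of a Prometheus alerting rule on export failures (the threshold and durations are illustrative; depending on the Collector version, the metric name may carry a `_total` suffix when scraped):

```yaml
# prometheus-rules.yaml fragment: alert on Collector export failures
groups:
  - name: otel-collector
    rules:
      - alert: OtelCollectorExportFailures
        expr: rate(otelcol_exporter_send_failed_spans[5m]) > 0
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "OTel Collector is failing to export spans"
```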
Conclusions and Next Steps
The OTel Collector is the key architectural component for a scalable and maintainable observability infrastructure. Its pipeline architecture (Receivers, Processors, Exporters) provides total flexibility in managing telemetry data, from filtering sensitive data to routing to multiple backends.
The deployment pattern choice (Agent, Gateway, Sidecar) depends on the organization's specific needs. The most common production pattern combines Agent (DaemonSet) for local pre-processing and Gateway (Deployment) for centralized processing and export.
In the next article, we will explore backend integration: how to configure Jaeger for traces, Prometheus for metrics, and Grafana for visualization dashboards, creating a complete and functional observability stack.