OpenTelemetry: The Standard for Modern Telemetry
OpenTelemetry (OTel) is an open-source, vendor-neutral framework for generating, collecting, and exporting telemetry data. Born from the merger of two previous projects (OpenTracing and OpenCensus), OTel is now the second most active project in the Cloud Native Computing Foundation (CNCF) after Kubernetes, with over 1,000 contributors and support from all major observability vendors.
OTel is not an observability backend: it does not store data and does not provide dashboards. It is an instrumentation standard that defines how to collect metrics, logs, and traces from application code and send them to any compatible backend (Jaeger, Prometheus, Datadog, New Relic, Grafana Cloud, and many others).
This vendor-neutral approach solves a critical problem: you instrument your code once with OTel APIs, and you can switch observability backends without touching a single line of application code.
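In practice, switching backends is often pure configuration: all OTel SDKs read a standard set of environment variables defined by the specification, so telemetry can be repointed at deploy time. A sketch (the endpoint value is a placeholder):

```shell
# Standard OTel environment variables, read by every SDK
export OTEL_SERVICE_NAME="order-service"
export OTEL_EXPORTER_OTLP_ENDPOINT="http://otel-collector:4317"  # placeholder
export OTEL_EXPORTER_OTLP_PROTOCOL="grpc"  # or "http/protobuf"
# Switching backends means changing these values, not the application code
```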
What You Will Learn in This Article
- The OpenTelemetry architecture: API, SDK, Collector, and OTLP
- The fundamental distinction between API and SDK
- The three OTel signals: Traces, Metrics, and Logs
- Semantic Conventions and why they matter
- OTLP, the OpenTelemetry Protocol
- The maturity matrix and the state of each component
OpenTelemetry Architecture
The OTel architecture consists of four main components that collaborate to provide a complete telemetry pipeline, from code instrumentation to export to storage and visualization backends.
1. API: The Instrumentation Contract
The API is the abstraction layer that application code uses to generate telemetry. It defines interfaces and types without concrete implementation. If you install only the API without an SDK, all calls become no-op (empty operations), guaranteeing zero overhead in environments where telemetry is not needed.
This separation is fundamental for libraries: a library instrumented with the OTel API does not force the consuming application to adopt a specific SDK. The application decides whether and how to collect telemetry by registering an SDK.
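The mechanics of this no-op pattern can be sketched in plain Python. This is a conceptual model, not the real `opentelemetry` package: the class and function names below are illustrative.

```python
class NoOpSpan:
    """Does nothing: the default when no SDK is registered."""
    def set_attribute(self, key, value):
        pass
    def end(self):
        pass

class NoOpTracer:
    def start_span(self, name):
        return NoOpSpan()

class RecordingSpan(NoOpSpan):
    """A real implementation, as an SDK would supply."""
    def __init__(self, name):
        self.name = name
        self.attributes = {}
    def set_attribute(self, key, value):
        self.attributes[key] = value

class RecordingTracer(NoOpTracer):
    def start_span(self, name):
        return RecordingSpan(name)

# The API holds a global tracer that defaults to the no-op implementation
_tracer = NoOpTracer()

def set_tracer(tracer):  # called once by the SDK at startup
    global _tracer
    _tracer = tracer

def get_tracer():  # called by instrumented libraries
    return _tracer

# Library code instruments unconditionally; without an SDK registered,
# every call is a cheap no-op and nothing is recorded.
span = get_tracer().start_span("checkout")
span.set_attribute("order.id", "o-1")  # silently discarded
span.end()
```

The key point: the library never checks whether telemetry is enabled; the substitution happens behind the API.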
2. SDK: The Concrete Implementation
The SDK implements the API interfaces with concrete collection, processing, and export logic. The SDK manages:
- Sampling: deciding which traces to collect and which to discard
- Batching: grouping telemetry data for efficient sending
- Resource detection: automatically identifying the environment (host, container, cloud)
- Export: sending data to the backend through configured Exporters
```python
# Complete OpenTelemetry SDK setup in Python
from opentelemetry import trace, metrics
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.sdk.resources import Resource
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import OTLPMetricExporter

# 1. Define the Resource (service identity)
resource = Resource.create({
    "service.name": "order-service",
    "service.version": "1.2.0",
    "deployment.environment": "production",
    "service.instance.id": "order-service-pod-abc123",
})

# 2. Configure TracerProvider with OTLP exporter
tracer_provider = TracerProvider(resource=resource)
otlp_trace_exporter = OTLPSpanExporter(
    endpoint="http://otel-collector:4317",
    insecure=True,
)
tracer_provider.add_span_processor(
    BatchSpanProcessor(otlp_trace_exporter)
)
trace.set_tracer_provider(tracer_provider)

# 3. Configure MeterProvider with OTLP exporter
otlp_metric_exporter = OTLPMetricExporter(
    endpoint="http://otel-collector:4317",
    insecure=True,
)
metric_reader = PeriodicExportingMetricReader(
    otlp_metric_exporter,
    export_interval_millis=60000,
)
meter_provider = MeterProvider(
    resource=resource,
    metric_readers=[metric_reader],
)
metrics.set_meter_provider(meter_provider)
```
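Of the SDK responsibilities listed above, sampling deserves a closer look. A ratio-based sampler (like the SDK's `TraceIdRatioBased`) derives its decision deterministically from the trace ID itself, so every service in a distributed trace reaches the same verdict without coordination. A simplified sketch of the idea:

```python
def should_sample(trace_id: int, ratio: float) -> bool:
    """Simplified ratio-based sampling decision (not the real SDK code):
    compare the low 64 bits of the trace ID against a threshold.
    Deterministic: the same trace ID always yields the same answer,
    so parent and child spans agree across services."""
    bound = round(ratio * (1 << 64))
    return (trace_id & ((1 << 64) - 1)) < bound
```

With `ratio=0.1`, roughly 10% of trace IDs fall below the threshold, and the other 90% of traces are discarded consistently end to end.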
3. Collector: The Telemetry Router
The OTel Collector is a standalone component that receives, processes, and exports telemetry data. It functions as an intelligent proxy between applications and backends, offering batching, retry, filtering, and data transformation capabilities.
The Collector is optional: applications can export directly to backends. However, the Collector is strongly recommended in production to decouple applications from backend configuration and centralize telemetry management.
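A minimal Collector configuration makes the pipeline concrete. The sketch below receives OTLP, batches, and fans out traces and metrics; the backend names and endpoints are illustrative:

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:

exporters:
  otlp/jaeger:
    endpoint: jaeger:4317      # illustrative endpoint
    tls:
      insecure: true
  prometheus:
    endpoint: 0.0.0.0:8889     # scraped by Prometheus

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/jaeger]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheus]
```

Changing backends now means editing this file and redeploying the Collector; the applications never notice.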
4. OTLP: The Transport Protocol
OTLP (OpenTelemetry Protocol) is the native protocol for transporting telemetry in OTel. It supports three transport modes: gRPC (default, high performance), HTTP/protobuf (firewall compatibility), and HTTP/JSON (debug and testing). All three transport the same data with different encodings.
OTel Architecture: Data Flow
Application (API + SDK) → Exporter (OTLP) → Collector (Receivers → Processors → Exporters) → Backend (Jaeger, Prometheus, Grafana Cloud, etc.)
The Three OpenTelemetry Signals
OTel supports the three core telemetry signals. Their specifications are now stable, though ecosystem maturity (SDKs, auto-instrumentation) still varies by signal and language:
Traces (Stable)
The most mature OTel signal. Distributed traces represent the path of a request through services, composed of spans with attributes, events, and links. The tracing API is stable across all major languages.
Metrics (Stable)
OTel metrics support several instrument types, including Counter, UpDownCounter, Histogram, and Gauge, along with asynchronous (observable) variants. The metrics API is stable and supports configurable aggregations, temporality (cumulative and delta), and controlled cardinality through Views.
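Temporality is the subtlest of these concepts. A plain-Python sketch (not the OTel SDK API; the class name is illustrative) shows the difference between cumulative and delta reporting for a counter:

```python
class CounterAggregator:
    """Conceptual sketch of counter temporality, not the OTel SDK."""

    def __init__(self):
        self.total = 0
        self._last_reported = 0

    def add(self, value):
        self.total += value

    def collect_cumulative(self):
        # Cumulative: each export reports the running total since start
        return self.total

    def collect_delta(self):
        # Delta: each export reports only what accumulated since the
        # previous collection, then resets the baseline
        delta = self.total - self._last_reported
        self._last_reported = self.total
        return delta
```

Cumulative streams survive lost exports (the next point still carries the total); delta streams keep payloads small and are what some backends expect.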
Logs (Stable)
Unlike traces and metrics, OTel does not introduce a new logging API. Instead, it provides a Log Bridge API that connects existing logging frameworks (Log4j, SLF4J, Python logging, Winston) to the OTel pipeline, enriching logs with trace context automatically.
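The effect of such a bridge can be sketched with the standard library alone: a `logging.Filter` that stamps every record with the active trace ID. The ID lookup below is a stand-in; in real OTel it would come from the context API (`trace.get_current_span().get_span_context()`).

```python
import io
import logging

# Stand-in for the real OTel context lookup
_current_trace_id = None

class TraceContextFilter(logging.Filter):
    """Attach the active trace ID to every log record."""
    def filter(self, record):
        record.trace_id = _current_trace_id or "0" * 32
        return True

logger = logging.getLogger("order-service")
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(
    logging.Formatter("%(levelname)s trace_id=%(trace_id)s %(message)s")
)
handler.addFilter(TraceContextFilter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

_current_trace_id = "4bf92f3577b34da6a3ce929d0e0e4736"
logger.info("order created")
# The record now carries the trace ID, so a backend can jump from
# this log line straight to the corresponding distributed trace.
```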
OTel Signal Maturity Matrix
| Signal | API | SDK | OTLP | Auto-Instrumentation |
|---|---|---|---|---|
| Traces | Stable | Stable | Stable | Stable (Java, Python, .NET) |
| Metrics | Stable | Stable | Stable | Stable (Java, .NET) |
| Logs | Stable | Stable | Stable | Stable (Java) |
Semantic Conventions: The Shared Vocabulary
Semantic Conventions are a standardized set of names for attributes, metrics, and spans that ensure interoperability between different libraries, frameworks, and backends. Without semantic conventions, every team would use different names for the same concepts, making cross-service correlation and analysis impossible.
```python
# Examples of Semantic Conventions for HTTP
# All HTTP libraries use the same attribute names

# HTTP server span attributes
span.set_attribute("http.request.method", "POST")
span.set_attribute("url.path", "/api/v1/orders")
span.set_attribute("http.response.status_code", 201)
span.set_attribute("server.address", "api.example.com")
span.set_attribute("server.port", 443)
span.set_attribute("network.protocol.version", "1.1")
span.set_attribute("http.route", "/api/v1/orders")

# Database span attributes
span.set_attribute("db.system", "postgresql")
span.set_attribute("db.namespace", "orders_db")
span.set_attribute("db.operation.name", "SELECT")
span.set_attribute("db.query.text", "SELECT * FROM orders WHERE id = ?")
span.set_attribute("server.address", "db.internal")
span.set_attribute("server.port", 5432)

# Messaging span attributes
span.set_attribute("messaging.system", "kafka")
span.set_attribute("messaging.destination.name", "order-events")
span.set_attribute("messaging.operation.type", "publish")
span.set_attribute("messaging.message.id", "msg-12345")
```
Semantic Conventions cover domains such as HTTP, database, messaging, RPC, cloud providers, container runtime, and many others. Following them ensures that backends can correctly interpret telemetry and provide preconfigured dashboards and analytics.
Language SDKs: Multi-Language Ecosystem
OTel provides official SDKs for all major programming languages. Each SDK implements the same APIs with language-specific idioms, ensuring a native experience:
```java
// OTel SDK Setup in Java with Spring Boot
// build.gradle.kts
// implementation("io.opentelemetry:opentelemetry-api:1.36.0")
// implementation("io.opentelemetry:opentelemetry-sdk:1.36.0")
// implementation("io.opentelemetry:opentelemetry-exporter-otlp:1.36.0")

import io.opentelemetry.api.OpenTelemetry;
import io.opentelemetry.api.metrics.LongCounter;
import io.opentelemetry.api.metrics.Meter;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.StatusCode;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.context.Scope;

public class OrderService {

    private final Tracer tracer;
    private final LongCounter orderCounter;

    public OrderService(OpenTelemetry openTelemetry) {
        this.tracer = openTelemetry.getTracer("order-service", "1.0.0");
        Meter meter = openTelemetry.meterBuilder("order-service")
                .setInstrumentationVersion("1.0.0")
                .build();
        this.orderCounter = meter.counterBuilder("orders.created")
                .setDescription("Number of orders created")
                .setUnit("1")
                .build();
    }

    public Order createOrder(OrderRequest request) {
        Span span = tracer.spanBuilder("create-order")
                .setAttribute("order.type", request.getType())
                .setAttribute("order.items_count", request.getItems().size())
                .startSpan();
        try (Scope scope = span.makeCurrent()) {
            Order order = processOrder(request);
            orderCounter.add(1);
            span.setAttribute("order.id", order.getId());
            return order;
        } catch (Exception e) {
            span.recordException(e);
            span.setStatus(StatusCode.ERROR, e.getMessage());
            throw e;
        } finally {
            span.end();
        }
    }
}
```
Officially Supported SDKs
- Java: Mature SDK with auto-instrumentation agent (the most complete)
- Python: Stable SDK, auto-instrumentation for Django, Flask, FastAPI
- Go: Stable SDK, idiomatic design with native context propagation
- .NET: Stable SDK, deep integration with ASP.NET Core
- JavaScript/Node.js: Stable SDK, auto-instrumentation for Express, Fastify
- Rust: SDK in maturation phase, active community
- C++: Stable SDK, used in high-performance scenarios
- Swift: Emerging SDK, for iOS and macOS applications
Resource: The Service Identity
The Resource is a set of attributes that identify the entity producing telemetry. It is shared across all signals (traces, metrics, logs) and provides the environmental context needed to correlate data. The Resource is set once at SDK startup and remains constant for the entire process duration.
```yaml
# Typical Resource attributes in a Kubernetes environment
resource:
  attributes:
    # Service identity
    service.name: "order-service"
    service.version: "2.1.0"
    service.namespace: "ecommerce"
    service.instance.id: "order-service-7d8f9b-xk2mp"
    # Deployment environment
    deployment.environment: "production"
    deployment.region: "eu-west-1"
    # Kubernetes metadata
    k8s.namespace.name: "ecommerce-prod"
    k8s.pod.name: "order-service-7d8f9b-xk2mp"
    k8s.deployment.name: "order-service"
    k8s.node.name: "worker-node-03"
    # Cloud provider
    cloud.provider: "aws"
    cloud.region: "eu-west-1"
    cloud.availability_zone: "eu-west-1a"
```
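These attributes need not be hard-coded in the application. The specification defines the `OTEL_RESOURCE_ATTRIBUTES` environment variable, which is convenient to set from a Kubernetes manifest (the values below are illustrative):

```shell
# Resource attributes supplied via the environment, merged by the SDK
# with anything detected automatically (host, container, cloud)
export OTEL_RESOURCE_ATTRIBUTES="service.version=2.1.0,deployment.environment=production"
```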
Adoption Roadmap: From Zero to Production
Adopting OpenTelemetry in an organization is an incremental process. You do not need to instrument all code from day one. A realistic roadmap involves three phases:
OTel Adoption Roadmap
Phase 1 (Week 1-2): Deploy the Collector, auto-instrumentation on core services,
backend configuration (Jaeger + Prometheus). Result: basic visibility on traces and metrics.
Phase 2 (Week 3-6): Add custom attributes on critical spans, configure sampling,
integrate log-trace correlation. Result: effective debugging for main flows.
Phase 3 (Month 2-3): Manual instrumentation for business metrics, custom Grafana
dashboards, SLO-based alerting. Result: production-grade observability.
Conclusions and Next Steps
OpenTelemetry provides a modular, vendor-neutral architecture that clearly separates instrumentation (API), collection (SDK), routing (Collector), and export (OTLP). Semantic Conventions ensure interoperability, while multi-language support allows adopting OTel in any technology stack.
The API/SDK separation is the key architectural choice: libraries use the API (zero overhead without SDK), applications configure the SDK with appropriate exporters. This design allows instrumenting the entire ecosystem without creating forced dependencies.
In the next article, we will dive deep into distributed tracing, analyzing in detail spans, trace context, the W3C Trace Context protocol, and the parent-child relationships that form the graph of a distributed trace.