eBPF Instrumentation: Kernel-Level Observability
eBPF (extended Berkeley Packet Filter) is a Linux kernel technology that allows running sandboxed programs inside the kernel without modifying kernel source code or loading kernel modules. Applied to observability, eBPF enables distributed traces, performance metrics, and network visibility without modifying application code and without deploying per-language agents.
While OpenTelemetry auto-instrumentation operates at the application level (intercepting library calls), eBPF operates at the kernel level, intercepting syscalls, network packets, and file system operations. This approach offers lower overhead and broader coverage, including services that cannot be traditionally instrumented.
What You Will Learn in This Article
- eBPF fundamentals and how it works in the Linux kernel
- How eBPF enables zero-instrumentation observability
- Pixie: eBPF-based Kubernetes observability
- Cilium and Hubble: networking observability with eBPF
- Comparison between eBPF and traditional auto-instrumentation
- Current limitations and adoption scenarios
How eBPF Works
eBPF allows loading small programs into the Linux kernel that execute in response to specific events: syscalls, network packet arrivals, process context switches, file system operations. Before loading, the kernel's verifier checks each program for safety (no invalid memory accesses, no unbounded loops), and the bytecode is then JIT-compiled to native code for efficiency.
eBPF Architecture for Observability
Application executes a syscall (e.g., connect(), write()) →
Kernel triggers the associated eBPF probe →
eBPF program captures metadata (PID, timestamp, parameters) →
User-space collector reads data from the eBPF buffer →
OTel Collector receives and exports to the backend
eBPF Probe Types
eBPF programs can be attached to different kernel points, each useful for a different aspect of observability:
// Conceptual example: eBPF probe to capture TCP connections
// This program attaches a kprobe to the kernel's tcp_connect function
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct connection_event {
    __u32 pid;
    __u32 tid;
    __u64 timestamp_ns;
    __u32 src_addr;
    __u32 dst_addr;
    __u16 dst_port;
    char comm[16]; // process name
};

// Perf buffer to send events to user-space
struct {
    __uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
    __uint(key_size, sizeof(int));
    __uint(value_size, sizeof(int));
} events SEC(".maps");

// Probe attached to tcp_connect
SEC("kprobe/tcp_connect")
int trace_tcp_connect(struct pt_regs *ctx) {
    struct connection_event event = {};
    __u64 pid_tgid = bpf_get_current_pid_tgid();
    event.pid = pid_tgid >> 32;    // upper 32 bits: process ID (tgid)
    event.tid = (__u32)pid_tgid;   // lower 32 bits: thread ID
    event.timestamp_ns = bpf_ktime_get_ns();
    bpf_get_current_comm(&event.comm, sizeof(event.comm));

    // Extract destination address and port
    // ... (read kernel socket structures, e.g. via bpf_probe_read_kernel)

    // Send the event to user-space
    bpf_perf_event_output(ctx, &events, BPF_F_CURRENT_CPU,
                          &event, sizeof(event));
    return 0;
}

// GPL-compatible license is required to use these bpf helpers
char LICENSE[] SEC("license") = "GPL";
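On the user-space side, a collector reads these raw events from the perf buffer and decodes them before exporting. A minimal Python sketch of the decoding step, assuming the x86-64 layout of the struct above (4 + 4 + 8 + 4 + 4 + 2 + 16 bytes, padded to 48) and that the probe stores the port in host byte order:

```python
import socket
import struct

# Mirrors struct connection_event: __u32 pid, __u32 tid,
# __u64 timestamp_ns, __u32 src_addr, __u32 dst_addr,
# __u16 dst_port, char comm[16]; 6 trailing bytes pad to 48.
EVENT_FMT = "<IIQIIH16s6x"
EVENT_SIZE = struct.calcsize(EVENT_FMT)  # 48 bytes

def decode_event(raw: bytes) -> dict:
    pid, tid, ts_ns, src, dst, dport, comm = struct.unpack(EVENT_FMT, raw)
    return {
        "pid": pid,
        "tid": tid,
        "timestamp_ns": ts_ns,
        # Repack the 32-bit value to its original bytes,
        # then render as a dotted-quad IPv4 address
        "src_addr": socket.inet_ntoa(struct.pack("<I", src)),
        "dst_addr": socket.inet_ntoa(struct.pack("<I", dst)),
        "dst_port": dport,
        "comm": comm.rstrip(b"\x00").decode(),
    }
```

In a real collector this function would run inside the perf-buffer callback (e.g. libbpf's or BCC's polling loop); the decoded dict is then mapped onto OTel span or metric attributes.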
Probe Types for Observability
| Probe Type | Attachment Point | Observability Use |
|---|---|---|
| kprobe/kretprobe | Kernel functions | Tracing syscalls (connect, read, write) |
| uprobe/uretprobe | User-space functions | Tracing library calls (SSL, HTTP parser) |
| tracepoint | Stable kernel points | Scheduling, I/O, networking |
| XDP | Network driver (pre-stack) | High-performance traffic analysis |
| TC (Traffic Control) | Network stack | L3/L4 traffic monitoring |
Pixie: Kubernetes Observability with eBPF
Pixie (now a CNCF sandbox project) is a Kubernetes observability platform that uses eBPF to provide automatic visibility into applications, networking, and infrastructure without requiring code changes, application agents, or sidecars.
# Install Pixie on a Kubernetes cluster
# Prerequisites: Kubernetes 1.21+, Linux kernel 4.14+
# 1. Install Pixie CLI
bash -c "$(curl -fsSL https://withpixie.ai/install.sh)"
# 2. Deploy Pixie to the cluster
px deploy
# 3. Verify status
px get viziers
# 4. Run queries from terminal
# View HTTP connections between services
px run px/http_data
# Latency per endpoint
px run px/service_stats
# DNS traffic
px run px/dns_data
# CPU profiling (flame graph)
px run px/perf_flamegraph -- --pod=order-service
Pixie automatically captures HTTP, gRPC, MySQL, PostgreSQL, Redis, Kafka, and DNS traffic at the kernel level, reconstructing requests and calculating latencies without any application instrumentation.
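At its core, that latency reconstruction amounts to pairing request and response events captured at the kernel and diffing their timestamps. A simplified, illustrative sketch of the pairing logic (not Pixie's actual code; the stream IDs and event shape here are assumptions):

```python
from collections import defaultdict

def compute_latencies(events):
    """Pair request/response events per connection stream and
    return latencies in milliseconds.

    Each event is a dict: {"stream": id, "kind": "req" | "resp",
    "ts_ns": kernel timestamp}. Events arrive in timestamp order.
    """
    pending = {}                   # stream id -> request timestamp
    latencies = defaultdict(list)  # stream id -> [latency_ms, ...]
    for ev in events:
        if ev["kind"] == "req":
            pending[ev["stream"]] = ev["ts_ns"]
        elif ev["kind"] == "resp" and ev["stream"] in pending:
            delta_ns = ev["ts_ns"] - pending.pop(ev["stream"])
            latencies[ev["stream"]].append(delta_ns / 1e6)
    return dict(latencies)
```

Real protocol tracers also have to parse wire formats (HTTP/2 frames, MySQL packets, etc.) to decide which bytes constitute a request or response, which is where most of the engineering effort goes.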
Cilium and Hubble: Network Observability
Cilium is a CNCF project that uses eBPF for networking and security in Kubernetes. Hubble, Cilium's observability component, provides complete visibility into network traffic between pods, including L3/L4/L7 metrics, network policies, and automatic service maps.
# Install Cilium with Hubble enabled
# helm install cilium cilium/cilium --version 1.15.0

# Cilium configuration with Hubble
apiVersion: v1
kind: ConfigMap
metadata:
  name: cilium-config
  namespace: kube-system
data:
  # Enable Hubble for observability
  enable-hubble: "true"
  hubble-listen-address: ":4244"
  hubble-metrics-server: ":9965"
  # L7 HTTP metrics
  hubble-metrics: >-
    dns
    drop
    tcp
    flow
    icmp
    http
    httpV2:sourceContext=workload-name|reserved-identity;destinationContext=workload-name|reserved-identity
  # Flow export to rotating files (can be ingested by the OTel Collector)
  hubble-export-file-max-size-mb: "10"
  hubble-export-file-max-backups: "5"
  # Enable relay for the UI
  hubble-relay-enabled: "true"
  hubble-ui-enabled: "true"
# Hubble commands for network observability
# Observe traffic in real time
hubble observe --namespace ecommerce
# Filter by HTTP protocol
hubble observe --namespace ecommerce --protocol http
# View only errors (hubble matches status codes by prefix: "4+" = 4xx, "5+" = 5xx)
hubble observe --namespace ecommerce --http-status 4+ --http-status 5+
# Traffic between specific services
hubble observe --from-pod ecommerce/order-service \
--to-pod ecommerce/payment-service
# Metrics for Prometheus
# Hubble metrics are exposed on :9965/metrics
# and can be scraped by Prometheus
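Those Hubble metrics are exposed in the standard Prometheus text format, so any scraper or quick script can consume them. A small sketch of parsing one sample line (the metric name and labels below are illustrative, not guaranteed Hubble output):

```python
import re

# One Prometheus exposition-format sample line:
#   metric_name{label="value",...} 123
METRIC_RE = re.compile(r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
                       r'(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)$')

def parse_metric(line):
    """Parse one sample line into (name, labels dict, float value),
    or None if the line is not a sample (comment, blank, etc.)."""
    m = METRIC_RE.match(line.strip())
    if not m:
        return None
    labels = {}
    if m.group("labels"):
        for key, val in re.findall(r'(\w+)="([^"]*)"', m.group("labels")):
            labels[key] = val
    return m.group("name"), labels, float(m.group("value"))
```

In practice you would let Prometheus scrape :9965 directly; a parser like this is only useful for ad-hoc debugging of the endpoint.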
eBPF vs Auto-Instrumentation: Comparison
eBPF and auto-instrumentation are complementary approaches, not alternatives. Each has specific strengths and limitations that make them suitable for different scenarios.
Detailed Comparison
| Aspect | eBPF | OTel Auto-Instrumentation |
|---|---|---|
| Code changes | None | None (agent/JVM flag) |
| Overhead | Very low (kernel-level) | Low-medium (user-space) |
| Protocol coverage | HTTP, gRPC, DNS, SQL, Redis | 100+ libraries per language |
| Custom attributes | Limited (network data only) | Complete (spans, attributes, events) |
| Business context | Not available | Available with manual SDK |
| OS support | Linux only (kernel 4.14+) | Cross-platform |
| Application language | Agnostic (any language) | Language-specific |
When to Use eBPF for Observability
- Network visibility: monitor L3/L4/L7 traffic between all pods in a Kubernetes cluster
- Legacy services: get traces from applications that cannot be instrumented (binaries, C/C++)
- Security monitoring: detect anomalous connections, DNS tunneling, lateral movement
- Performance profiling: CPU flame graphs, I/O profiling without significant overhead
- Complement to OTel: use eBPF for networking and OTel for application context
Current eBPF Limitations
Despite its revolutionary potential, eBPF has important limitations that affect adoption:
- Linux only: eBPF is a Linux kernel technology; an eBPF for Windows port is under development but not production-ready, and macOS has no support
- Kernel version: requires kernel 4.14+ (5.8+ for advanced features), limiting legacy systems
- TLS traffic: eBPF sees encrypted bytes on the wire; recovering plaintext requires uprobes on TLS libraries (e.g. OpenSSL), which adds complexity
- No business context: eBPF sees packets and syscalls, not order IDs or customer tiers
- Development complexity: writing custom eBPF programs requires kernel knowledge
Conclusions and Next Steps
eBPF represents the future of low-overhead observability. The ability to obtain distributed traces, network metrics, and profiling without modifying application code is revolutionary, especially for Kubernetes clusters with hundreds of services.
The optimal approach combines eBPF (for network and infrastructure visibility) with OpenTelemetry (for application and business context). Tools like Pixie and Cilium/Hubble make eBPF accessible even to teams without kernel experience.
In the next article, we will explore sampling strategies, analyzing how to reduce telemetry volume and associated costs without losing visibility into critical events.