Tracing LLM Applications with OpenTelemetry

Balys Kriksciunas · 5 min read
#ai #infrastructure #observability #opentelemetry #tracing #llm #monitoring #langfuse

The observability story for LLMs was “here’s another proprietary SDK” for most of 2023. Every vendor — LangSmith, Langfuse, Helicone, Phoenix — shipped its own instrumentation library. You picked one and hoped the company would still exist in two years.

OpenTelemetry changes this. In 2024, OTel’s GenAI semantic conventions stabilized, the auto-instrumentation libraries matured, and every observability vendor that matters added OTel ingestion. You can now instrument your LLM app once with OTel and export to Langfuse, Datadog, Honeycomb, or a self-hosted Tempo + Grafana stack without touching your code.

This guide walks through how to do it.


The Shape of the Problem

A modern LLM app doesn’t look like an HTTP service. A single user request typically involves:

- prompt assembly from templates, conversation history, and retrieved context
- one or more retrieval calls to a vector database, often with a reranking pass
- one or several model calls, frequently streamed
- tool calls, and agent loops that interleave all of the above

Any of those can fail, be slow, or return a subtly wrong answer. You need distributed tracing — linked spans that show the full causal chain for a user request — just as much as you would for a microservices stack.

OTel was already the standard for microservice tracing. Extending it to LLM workloads gives you one observability stack, not two.


OTel GenAI Semantic Conventions

Semantic conventions are the contract: the span attributes everyone agrees to use. The GenAI conventions define attributes like:

- gen_ai.system: the provider ("openai", "anthropic", ...)
- gen_ai.request.model and gen_ai.response.model
- gen_ai.request.temperature, gen_ai.request.max_tokens
- gen_ai.usage.input_tokens and gen_ai.usage.output_tokens
- gen_ai.response.finish_reasons

Plus span-kind conventions for “chat”, “embeddings”, “image generation”, and so on. Full spec lives at opentelemetry.io/docs/specs/semconv/gen-ai/.

Following these conventions means any OTel-compatible backend can display your LLM traces correctly — with token counts, model identifiers, cost breakdowns — without vendor-specific parsing.


Instrumentation: Three Approaches

1. Auto-instrumentation

The easiest path. Install an auto-instrumentation library and it monkey-patches common clients.

Python:

from opentelemetry.instrumentation.openai import OpenAIInstrumentor
OpenAIInstrumentor().instrument()

from openai import OpenAI
client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}]
)
# ^ automatically creates a span with GenAI attributes

Libraries with auto-instrumentation available in late 2024:

- OpenAI (and Azure OpenAI)
- Anthropic
- AWS Bedrock and Google Vertex AI
- LangChain and LlamaIndex
- popular vector databases such as Pinecone, Qdrant, and Chroma

Traceloop and Arize Phoenix both maintain extensive OTel-compatible instrumentation bundles (OpenLLMetry and OpenInference, respectively); both are well worth using.

2. Manual instrumentation

When auto-instrumentation doesn’t cover what you need, add spans explicitly:

from opentelemetry import trace

tracer = trace.get_tracer(__name__)

def retrieve_and_rerank(query: str):
    with tracer.start_as_current_span("retrieval") as span:
        span.set_attribute("retrieval.query", query)
        span.set_attribute("retrieval.system", "qdrant")
        hits = vector_db.search(query, k=50)
        span.set_attribute("retrieval.hits_count", len(hits))

    with tracer.start_as_current_span("rerank") as span:
        span.set_attribute("rerank.model", "bge-reranker-v2-m3")
        reranked = reranker.rerank(query, hits)[:5]

    return reranked

Your agent’s think, act, observe phases should each be spans. Tool calls should be spans. Prompt assembly should be a span. Over-instrument early; you can lower sampling later.

3. Framework-native integration

LangChain, LlamaIndex, and DSPy all ship OTel exporters. Turn them on and you get traces for free:

# LangChain with OpenLLMetry
from traceloop.sdk import Traceloop
Traceloop.init(app_name="my-agent", api_endpoint="http://otel-collector:4318")

The Collector Pattern

Don’t send OTel data directly from your app to your backend. Use an OTel Collector as a gateway:

[App] → [OTel Collector] → [Backend(s)]

The Collector handles:

- batching and retries, so a slow backend never blocks the app
- fan-out to multiple backends from a single pipeline
- processing: PII scrubbing, attribute filtering, sampling
- credentials: backend API keys live in the collector, not in application config

A minimal collector config:

receivers:
  otlp:
    protocols: {grpc: {}, http: {}}

processors:
  batch: {timeout: 5s}
  attributes/pii_scrub:
    actions:
      - key: gen_ai.prompt
        action: hash  # don't log raw prompts in production

exporters:
  otlp/langfuse:
    endpoint: "cloud.langfuse.com:443"
  otlp/datadog:
    endpoint: "trace.agent.datadoghq.com:443"

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch, attributes/pii_scrub]
      exporters: [otlp/langfuse, otlp/datadog]

What to Capture on Each Span

Chat completion span:

- gen_ai.request.model and gen_ai.response.model
- gen_ai.usage.input_tokens / gen_ai.usage.output_tokens, plus derived cost if your backend doesn’t compute it
- sampling parameters (temperature, max tokens) and the finish reason
- whether the response was streamed, and time to first token

Retrieval span:

- the query (hashed or redacted in production), the retrieval system, and k
- hit count and top similarity scores

Tool call span:

- tool name, an argument summary (redacted as needed), success/error status, and duration

Agent / graph span (parent of the above):

- user and session identifiers, step count, total token usage and cost, and the final outcome


Sampling in Production

At scale, you cannot afford to store every trace forever. Typical sampling strategies:

- Head sampling: the SDK keeps a fixed percentage of traces. Cheap, but it decides before knowing whether a trace is interesting.
- Tail sampling: the collector buffers each trace and decides once it’s complete, so it can keep every error and every slow trace.
- Rate limiting per route or per tenant, so one noisy caller doesn’t eat the budget.

For LLM workloads, always keep traces that involved tool errors, user feedback (thumb-down), or unusually long/expensive calls. Those are the ones you’ll want to debug.
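
That policy translates into a tail sampling processor in the collector. A sketch, assuming the contrib collector distribution (which ships `tail_sampling`); the thresholds and percentages here are illustrative, not recommendations:

```yaml
processors:
  tail_sampling:
    decision_wait: 10s          # buffer each trace this long before deciding
    policies:
      - name: keep-errors
        type: status_code
        status_code: {status_codes: [ERROR]}
      - name: keep-slow
        type: latency
        latency: {threshold_ms: 5000}
      - name: sample-rest
        type: probabilistic
        probabilistic: {sampling_percentage: 5}
```

A trace is kept if any policy matches, so errors and slow traces survive at 100% while the rest is sampled down.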


Where to Send It

OTel-compatible backends for LLM workloads:

- Langfuse: OTLP ingestion plus an LLM-specific UI (sessions, prompts, costs)
- Arize Phoenix: open source, built on OpenInference
- LangSmith and Helicone: proprietary roots, now with OTel ingestion
- General-purpose APM: Datadog, Honeycomb, or self-hosted Grafana Tempo

Typical pattern: Langfuse for the LLM-specific view; a general-purpose APM for the everything-else view. Collector routes to both.


Instrumentation Anti-Patterns

Things we’ve seen go wrong:

1. Logging raw prompts at INFO. Prompts contain PII, internal docs, and user queries. Hash or redact them in production, and gate any raw-content logging behind an explicit opt-in rather than a default log level.

2. No trace context propagation across async boundaries. When a Celery worker or a queue consumer handles a request, pass the trace context explicitly. OTel has inject/extract utilities for this.

3. Over-cardinality attributes. Putting user_id as a span attribute is fine. Putting it in a metric label explodes your metrics backend.

4. Sampling too aggressively. 1% sampling on a 10 req/s service is 6 traces/minute. Bugs slip through. Use tail sampling to keep error traces at 100%.

5. Synchronous span export. Use a BatchSpanProcessor, which buffers and exports off the request path, never a SimpleSpanProcessor in production; otherwise a slow exporter means slow requests.


A Starting Template

Minimum viable OTel for a Python LLM app:

# otel_setup.py
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

from opentelemetry.instrumentation.openai import OpenAIInstrumentor
from opentelemetry.instrumentation.httpx import HTTPXClientInstrumentor

resource = Resource.create({
    "service.name": "my-agent",
    "service.version": "1.2.3",
    "deployment.environment": "production",
})

provider = TracerProvider(resource=resource)
provider.add_span_processor(BatchSpanProcessor(
    OTLPSpanExporter(endpoint="http://otel-collector:4318/v1/traces")
))
trace.set_tracer_provider(provider)

OpenAIInstrumentor().instrument()
HTTPXClientInstrumentor().instrument()

Import this once at app startup, and your OpenAI calls, HTTP calls, and any manual spans you add will all flow to the collector.


Further Reading

- OTel GenAI semantic conventions: opentelemetry.io/docs/specs/semconv/gen-ai/
- OpenTelemetry Collector documentation: opentelemetry.io/docs/collector/
- OpenLLMetry: github.com/traceloop/openllmetry

Standing up observability for your LLM stack? We can help — from collector topology to eval frameworks.
