# OpenTelemetry instrumentation This package produces OpenTelemetry traces for LiteLLM. It is enabled by the `LITELLM_OTEL_V2` environment variable (`is_otel_v2_enabled()` in [`config.py`](./model/config.py)); when unset, nothing in this package runs. ## What gets traced A traced proxy request produces one trace with two kinds of spans: ``` SERVER span "POST /v1/chat/completions" ← FastAPI instrumentation ├── INTERNAL span "auth /v1/chat/completions" ← auth phase ┐ │ ├── CLIENT span "postgres get_key_object" ← datastore call │ │ └── CLIENT span "postgres get_team_membership" │ ├── INTERNAL span "execute_guardrail …" ← guardrail │ this package ├── CLIENT span "chat gpt-4o" ← LLM call │ └── CLIENT span "batch_write_to_db …" ← spend write ┘ ``` The gen-ai spans are siblings under the server span. In particular the guardrail span is a sibling of the LLM call, not a child of it: pre/during/post-call guardrail hooks are part of the request lifecycle (a pre-call guardrail runs before the LLM call even starts), so they belong directly under the server span, alongside the LLM call. Request-level spans (LLM call, guardrail) parent to the server span via an **explicit anchor** — `context.set_request_root_span` captures the server span once at request entry, and `resolve_request_span_context` reads it — rather than to whatever span is momentarily active. Ambient-only parenting was wrong at two boundaries: inside the live `auth` phase span the active span is `auth` (so the span would nest under auth), and a pass-through request closes its span from a detached `asyncio.create_task` where the server span is no longer active (so the span orphaned into its own trace). The anchor — a contextvar inherited by those child tasks — gives a stable parent in both cases. DB/service spans keep ambient parenting so an auth DB lookup still nests under `auth`. **Which service calls become spans (`spans.span_role_for_service`).** LiteLLM's service-logging layer instruments many internal functions, but only some are traceable units of work: - **`DB_CALL` (CLIENT)** — outbound datastore calls (redis, postgres, `batch_write_to_db`), carrying `db.system.name` / `db.operation.name` semconv. - **`SERVICE` (INTERNAL)** — genuine internal work worth a span (background budget/reset jobs, pod-lock manager). - **metrics-only (no span)** — `self` (the `track_llm_api_timing` wrapper, which duplicates the LLM-call span), `router` (duplicates the request), and `proxy_pre_call` (a guardrail's real span is `execute_guardrail …`). These still feed Prometheus/Datadog through their own hooks; they just never enter the trace. `auth` is also excluded here because it gets a **live phase span** instead (see below). Spans are named `"{service} {call_type}"` (e.g. `"redis set"`) so repeated calls to one service stay distinguishable. Like every other span they parent to the **ambient** context, falling back to the threaded `litellm_parent_otel_span` only when ambient has no live span; a background job with neither starts its own root trace. Caller-supplied `event_metadata` is **sanitized** before it reaches a span (primitives only, no live objects, no secrets/headers, bounded) — see `payloads.sanitize_event_metadata`. **Live phase spans.** `auth` is wrapped in a real, active span (`logger.phase_span`) for the duration of authentication, so the DB lookups it triggers nest **under** it instead of flattening onto the server span. Identity Baggage (team/key/user) is seeded once the key resolves, so every post-auth span inherits it; auth-internal DB lookups that run before the key is known stay unlabeled, which is correct. **Status.** On success a span's status is left `UNSET` (the semconv default, matching the FastAPI server span); only a genuine error sets `ERROR`. - **Server spans** (one per HTTP route) are created by the `opentelemetry-instrumentation-fastapi` package. It stamps `http.*` attributes and extracts inbound `traceparent` headers. This package does **not** create or modify server spans — request routes never touch spans. - **Gen-AI spans** (LLM calls, guardrails, internal service calls) are created by this package from LiteLLM's logging callbacks. Request-level spans parent to the server span via the captured anchor; DB/service spans parent to the active span (ambient) so they nest under the request phase that triggered them. Both kinds share a single `TracerProvider`, so they belong to the same trace and export through the same configured exporters. FastAPI middleware can only be added before the app starts serving, so the app is instrumented at import time **without** a provider — it binds to the OTel global `ProxyTracerProvider`. Once config (and the callbacks) is loaded, the proxy publishes the chosen logger's `TracerProvider` as the global via `trace.set_tracer_provider(...)`, and the server spans delegate to it. When a preset callback (`arize`, `langfuse_otel`, …) is configured, its provider becomes the global, so server spans export to that backend too. ## How a request flows 1. **App creation** (`proxy_server` import): when the gate is on, `mount.instrument_fastapi_app(app)` calls `FastAPIInstrumentor.instrument_app` with no provider (the middleware stack is frozen once the app serves, so this can't wait for startup). It binds to the OTel global `ProxyTracerProvider`. Noisy non-LLM routes are excluded by default (`mount._DEFAULT_EXCLUDED_ROUTES`): health checks (`/health*`), the Prometheus scrape (`/metrics`), and static UI/docs assets (`/litellm-asset-prefix`, `/_next`, `/ui`, `/swagger`, `/docs`, `/redoc`, `/openapi.json`, favicons, `/.well-known`) — so load-balancer polling, metric scrapes, and asset fetches don't flood traces. Entries are substring-matched, so `/metrics` also drops the `/model/metrics` admin-analytics spans. Set `OTEL_PYTHON_FASTAPI_EXCLUDED_URLS` to override the whole set (e.g. `""` to trace everything, or your own comma-separated path list). 2. **Startup** (`proxy_server.proxy_startup_event`): after the config (and callbacks) is loaded, the already-registered preset `OpenTelemetryV2` logger is reused — or a generic one reading `OTEL_*` envs is built when no preset is configured — and its `TracerProvider` is published as the OTel global with `trace.set_tracer_provider(...)`. The proxy tracer then delegates to it, so server spans and gen-ai spans share one provider and the same trace. 3. **Request**: the FastAPI instrumentation starts the server span and makes it the active context for the request task. The proxy's first call into the V2 logger (`create_litellm_proxy_request_started_span`, at the auth boundary) **captures it as the request anchor** (`set_request_root_span`), so every later request-level span has a stable explicit parent regardless of what is active when it emits. 4. **LLM call span (born at the boundary)**: `OpenTelemetryV2.log_pre_api_call` runs synchronously in the request task, just before the upstream call, and **opens** the LLM-call span there, parented to the anchored server span (`resolve_request_span_context`). The open span is held in a bounded cache keyed by `litellm_call_id` (a primitive the callback kwargs carry at both `pre_call` and close), so no live `Span` ever travels through a `litellm_params` metadata dict. For the boundary hook to fire at all, the logger is registered into `litellm.input_callback` — the list `Logging.pre_call` iterates. The async success/failure callback later **closes** it: it builds an `LLMCallSpanData` from the typed `standard_logging_object` (token usage and cost are computed only by then), stamps the attributes, sets status, and ends the span. The sync callback is a no-op (closing is async-only). When `pre_call` runs off the request task — a sync-only provider driven through a thread pool, where contextvars (and so the anchor) don't follow — no parent is visible there, so creation is **deferred** to the async callback, whose worker context was copied from the request task at enqueue and so still carries the anchor. **Pass-through** endpoints call `logging_obj.pre_call` in the request task too, then close from a detached `asyncio.create_task`; the anchor (not the by-then-inactive server span) keeps their LLM-call span in the request's trace. `pre_call` is litellm's generic "log the attempt" hook, so it also fires for synthetic proxy-gate error logs (auth/rate-limit rejections); those carry `LITELLM_LOGGING_NO_UPSTREAM_LLM_CALL` and are skipped, so a request rejected before reaching a provider never produces a phantom CLIENT span. 5. **Guardrails / services**: the post-call and service hooks emit guardrail and service spans the same way — typed data → engine → span. Service spans (Redis/Postgres) are dispatched by `litellm/_service_logger.py`, which recognizes the V2 `OpenTelemetryV2` logger (a plain `CustomLogger`, not a subclass of the legacy `OpenTelemetry`). It hands every service call to the logger — including calls with no parent span — and the V2 adapter decides the role (`DB_CALL` vs `SERVICE`), the parent (ambient → threaded → root), and whether the call is a traceable operation or a metrics-only ping. Guardrail span data is built from the typed, provider-agnostic `StandardLoggingGuardrailInformation` — no single provider's field shape is assumed. 6. **Export**: each span ends and is handed to the provider's span processors, which export to the configured backends (OTLP, console, in-memory, …). ## Components ### Sources of truth (`model/`, no OpenTelemetry import) These define the shape of a span without depending on the OTel SDK, so they can be imported anywhere. They live in [`model/`](./model) and form a closed set — nothing here imports outside it: - [`semconv.py`](./model/semconv.py) — attribute-key constants (`gen_ai.*`, `http.*`, `litellm.*`), the GenAI operation/provider enums, and the functions that map LiteLLM provider/call-type strings onto convention values. - [`spans.py`](./model/spans.py) — the span registry: every span role, its OTel span kind, its place in the hierarchy, and its name builder. - [`payloads.py`](./model/payloads.py) — frozen dataclasses (`LLMCallSpanData`, `GuardrailSpanData`, `ServiceSpanData`, …) built from heterogeneous logging payloads via `from_*` classmethods. - [`config.py`](./model/config.py) — `OpenTelemetryV2Config`, a pydantic-settings model that reads `OTEL_*` / `LITELLM_OTEL_*` env vars, plus the feature gate. `capture_span_content` gates whether prompt/response bodies may be written as span attributes; it defaults **off** (`no_content`). The Baggage allowlists are configurable, not hard-coded: set `LITELLM_OTEL_BAGGAGE_PROMOTED_KEYS` / `LITELLM_OTEL_BAGGAGE_METADATA_KEYS` / `LITELLM_OTEL_BAGGAGE_TEAM_METADATA_KEYS` (comma-separated) as env vars, or `baggage_promoted_keys` / `baggage_metadata_keys` / `baggage_team_metadata_keys` (YAML lists) under `callback_settings.otel` in `config.yaml` — the latter reach the config through the logger's constructor kwargs. `baggage_team_metadata_keys` is empty by default, so none of a team's free-form metadata is promoted until each sub-key is explicitly allowlisted. - [`baggage.py`](./model/baggage.py) — the single definition of which request-identity values are promoted into Baggage (so child spans inherit them) and under which attribute keys. - [`utils.py`](./model/utils.py) — value coercion, JSON serialization, and extractor-table application, shared across the package. ### Engine - [`emitter.py`](./emitter.py) — `SpanEmitter.emit(role, data)`: dedupe → start the span → run the mapper chain to stamp attributes → set status → end. It owns no attribute keys. The dedupe set (which coalesces the sync+async firing of one request) is a bounded LRU so it can't grow without limit. - [`mappers/`](./mappers) — each mapper turns typed span data into a flat `{attribute key: value}` dict. They compose: listing several mapper names in the config layers multiple attribute vocabularies onto the same span. - `genai` — the canonical OpenTelemetry GenAI vocabulary, always present. - `legacy` — an additional vocabulary using the older semconv-ai / Traceloop attribute key names, for backends that read those. - `openinference`, `langfuse`, `weave`, `langtrace` — vendor vocabularies. - `resolve_mappers(names)` turns config names into mapper instances. ### Plumbing (`plumbing/`) The OTel-SDK wiring. Everything here imports only `model/` and each other; it lives in [`plumbing/`](./plumbing): - [`providers.py`](./plumbing/providers.py) — builds the `TracerProvider`, its exporters (from `ExporterSpec`s), and the span processor that copies allowlisted Baggage entries onto every span. `register_exporter_factory(kind, factory)` lets a preset contribute a custom exporter `kind` (e.g. one that fetches an auth token lazily) without coupling this module to any vendor. - [`context.py`](./plumbing/context.py) — trace-context and Baggage read/write helpers. - [`routing.py`](./plumbing/routing.py) — `TenantTracerCache`: when a request carries team/key-scoped vendor credentials, route its spans through a credential-keyed `TracerProvider` so one logger serves many tenants. The cache is a bounded LRU that flushes + shuts down evicted providers, since the key derives from request-supplied credentials and must not grow (or leak threads) without limit. - [`metrics.py`](./plumbing/metrics.py) — GenAI client metric instruments. The six `gen_ai.client.*` histograms are recorded through the meter resolved by `providers.resolve_meter_provider`: an injected provider wins (tests/DI), otherwise the operator's globally configured `MeterProvider` is reused so its readers/exporters receive them alongside the server metrics, and one is built and registered as the global only when none is set (mirroring how V2 owns trace export). ### Adapter - [`logger.py`](./logger.py) — `OpenTelemetryV2`, a `CustomLogger` that translates LiteLLM's logging callbacks into typed span data and hands them to the engine. The LLM-call span is opened at the `log_pre_api_call` boundary (parented to the live server span via ambient context) and closed at the async success/failure callback; the open span is held in a bounded cache keyed by `litellm_call_id`, never threaded through a metadata dict. The logger registers itself into `litellm.input_callback` so `Logging.pre_call` fires the boundary hook. - [`mount.py`](./mount.py) — `instrument_fastapi_app(app)`, the single call site that attaches `opentelemetry-instrumentation-fastapi` for SERVER spans. It owns the health-check exclusion default (`OTEL_PYTHON_FASTAPI_EXCLUDED_URLS`) and the passthrough span-naming hook (`PASSTHROUGH_PREFIXES`) so `proxy_server` carries no OTel detail. A safe no-op when the gate is off or the instrumentation package is absent; must be called at app-creation time (the middleware stack freezes once the app serves). ### Presets - [`presets/`](./presets) — each preset reads one integration's env vars and returns an `OpenTelemetryV2Config` (exporter destination + mapper vocabularies + resource attributes). `PRESET_BY_CALLBACK` maps a callback name (`"arize"`, `"langfuse_otel"`, …) to its preset. Integrations that support team/key-scoped credentials also provide a per-request OTLP header builder (`DYNAMIC_HEADERS_BY_CALLBACK`). Presets do **no** network I/O at build time: AgentOps, for example, mints its JWT lazily inside a custom exporter on the first export (in the `BatchSpanProcessor` worker thread), never on the event loop. ## Extending - **A new attribute vocabulary for a backend**: add a mapper in `mappers/` (a class with a `map(data) -> AttributeMap` method, typically built from `key -> extractor` tables) and register it in `mappers/__init__._MAPPER_BY_NAME`. - **A new integration**: add a preset in `presets/` that returns an `OpenTelemetryV2Config`, and register it in `presets/__init__.PRESET_BY_CALLBACK`. If it supports dynamic credentials, add a header builder to `DYNAMIC_HEADERS_BY_CALLBACK`. - **A new span kind**: add a role to `spans.py` (registry entry + name builder), a payload dataclass in `payloads.py`, and a branch in the relevant mapper(s).