- health_checklist.json: 192.168.1.122→node122
- ocr_client.py: docstring IP→node122
- docs/market-data-requirements.md: IP→node122
- 所有API调用通过ProxyHandler({})绕过系统代理
Privoxy对node122:18003返回500,直连正常
16 KiB
OpenTelemetry instrumentation
This package produces OpenTelemetry traces for LiteLLM. It is enabled by the
LITELLM_OTEL_V2 environment variable (is_otel_v2_enabled() in
config.py); when unset, nothing in this package runs.
What gets traced
A traced proxy request produces one trace with two kinds of spans:
SERVER span "POST /v1/chat/completions" ← FastAPI instrumentation
├── INTERNAL span "auth /v1/chat/completions" ← auth phase ┐
│ ├── CLIENT span "postgres get_key_object" ← datastore call │
│ └── CLIENT span "postgres get_team_membership" │
├── INTERNAL span "execute_guardrail …" ← guardrail │ this package
├── CLIENT span "chat gpt-4o" ← LLM call │
└── CLIENT span "batch_write_to_db …" ← spend write ┘
The gen-ai spans are siblings under the server span. In particular the guardrail span is a sibling of the LLM call, not a child of it: pre/during/post-call guardrail hooks are part of the request lifecycle (a pre-call guardrail runs before the LLM call even starts), so they belong directly under the server span, alongside the LLM call.
Request-level spans (LLM call, guardrail) parent to the server span via an
explicit anchor — context.set_request_root_span captures the server span
once at request entry, and resolve_request_span_context reads it — rather than
to whatever span is momentarily active. Ambient-only parenting was wrong at two
boundaries: inside the live auth phase span the active span is auth (so the
span would nest under auth), and a pass-through request closes its span from a
detached asyncio.create_task where the server span is no longer active (so the
span orphaned into its own trace). The anchor — a contextvar inherited by those
child tasks — gives a stable parent in both cases. DB/service spans keep ambient
parenting so an auth DB lookup still nests under auth.
Which service calls become spans (spans.span_role_for_service). LiteLLM's
service-logging layer instruments many internal functions, but only some are
traceable units of work:
DB_CALL(CLIENT) — outbound datastore calls (redis, postgres,batch_write_to_db), carryingdb.system.name/db.operation.namesemconv.SERVICE(INTERNAL) — genuine internal work worth a span (background budget/reset jobs, pod-lock manager).- metrics-only (no span) —
self(thetrack_llm_api_timingwrapper, which duplicates the LLM-call span),router(duplicates the request), andproxy_pre_call(a guardrail's real span isexecute_guardrail …). These still feed Prometheus/Datadog through their own hooks; they just never enter the trace.authis also excluded here because it gets a live phase span instead (see below).
Spans are named "{service} {call_type}" (e.g. "redis set") so repeated calls
to one service stay distinguishable. Like every other span they parent to the
ambient context, falling back to the threaded litellm_parent_otel_span only
when ambient has no live span; a background job with neither starts its own root
trace. Caller-supplied event_metadata is sanitized before it reaches a span
(primitives only, no live objects, no secrets/headers, bounded) — see
payloads.sanitize_event_metadata.
Live phase spans. auth is wrapped in a real, active span
(logger.phase_span) for the duration of authentication, so the DB lookups it
triggers nest under it instead of flattening onto the server span. Identity
Baggage (team/key/user) is seeded once the key resolves, so every post-auth span
inherits it; auth-internal DB lookups that run before the key is known stay
unlabeled, which is correct.
Status. On success a span's status is left UNSET (the semconv default,
matching the FastAPI server span); only a genuine error sets ERROR.
- Server spans (one per HTTP route) are created by the
opentelemetry-instrumentation-fastapipackage. It stampshttp.*attributes and extracts inboundtraceparentheaders. This package does not create or modify server spans — request routes never touch spans. - Gen-AI spans (LLM calls, guardrails, internal service calls) are created by this package from LiteLLM's logging callbacks. Request-level spans parent to the server span via the captured anchor; DB/service spans parent to the active span (ambient) so they nest under the request phase that triggered them.
Both kinds share a single TracerProvider, so they belong to the same trace
and export through the same configured exporters. FastAPI middleware can only be
added before the app starts serving, so the app is instrumented at
import time without a provider — it binds to the OTel global
ProxyTracerProvider. Once config (and the callbacks) is loaded, the proxy
publishes the chosen logger's TracerProvider as the global via
trace.set_tracer_provider(...), and the server spans delegate to it. When a
preset callback (arize, langfuse_otel, …) is configured, its provider
becomes the global, so server spans export to that backend too.
How a request flows
- App creation (
proxy_serverimport): when the gate is on,mount.instrument_fastapi_app(app)callsFastAPIInstrumentor.instrument_appwith no provider (the middleware stack is frozen once the app serves, so this can't wait for startup). It binds to the OTel globalProxyTracerProvider. Noisy non-LLM routes are excluded by default (mount._DEFAULT_EXCLUDED_ROUTES): health checks (/health*), the Prometheus scrape (/metrics), and static UI/docs assets (/litellm-asset-prefix,/_next,/ui,/swagger,/docs,/redoc,/openapi.json, favicons,/.well-known) — so load-balancer polling, metric scrapes, and asset fetches don't flood traces. Entries are substring-matched, so/metricsalso drops the/model/metricsadmin-analytics spans. SetOTEL_PYTHON_FASTAPI_EXCLUDED_URLSto override the whole set (e.g.""to trace everything, or your own comma-separated path list). - Startup (
proxy_server.proxy_startup_event): after the config (and callbacks) is loaded, the already-registered presetOpenTelemetryV2logger is reused — or a generic one readingOTEL_*envs is built when no preset is configured — and itsTracerProvideris published as the OTel global withtrace.set_tracer_provider(...). The proxy tracer then delegates to it, so server spans and gen-ai spans share one provider and the same trace. - Request: the FastAPI instrumentation starts the server span and makes it
the active context for the request task. The proxy's first call into the V2
logger (
create_litellm_proxy_request_started_span, at the auth boundary) captures it as the request anchor (set_request_root_span), so every later request-level span has a stable explicit parent regardless of what is active when it emits. - LLM call span (born at the boundary):
OpenTelemetryV2.log_pre_api_callruns synchronously in the request task, just before the upstream call, and opens the LLM-call span there, parented to the anchored server span (resolve_request_span_context). The open span is held in a bounded cache keyed bylitellm_call_id(a primitive the callback kwargs carry at bothpre_calland close), so no liveSpanever travels through alitellm_paramsmetadata dict. For the boundary hook to fire at all, the logger is registered intolitellm.input_callback— the listLogging.pre_calliterates. The async success/failure callback later closes it: it builds anLLMCallSpanDatafrom the typedstandard_logging_object(token usage and cost are computed only by then), stamps the attributes, sets status, and ends the span. The sync callback is a no-op (closing is async-only). Whenpre_callruns off the request task — a sync-only provider driven through a thread pool, where contextvars (and so the anchor) don't follow — no parent is visible there, so creation is deferred to the async callback, whose worker context was copied from the request task at enqueue and so still carries the anchor. Pass-through endpoints calllogging_obj.pre_callin the request task too, then close from a detachedasyncio.create_task; the anchor (not the by-then-inactive server span) keeps their LLM-call span in the request's trace.pre_callis litellm's generic "log the attempt" hook, so it also fires for synthetic proxy-gate error logs (auth/rate-limit rejections); those carryLITELLM_LOGGING_NO_UPSTREAM_LLM_CALLand are skipped, so a request rejected before reaching a provider never produces a phantom CLIENT span. - Guardrails / services: the post-call and service hooks emit guardrail and
service spans the same way — typed data → engine → span. Service spans
(Redis/Postgres) are dispatched by
litellm/_service_logger.py, which recognizes the V2OpenTelemetryV2logger (a plainCustomLogger, not a subclass of the legacyOpenTelemetry). It hands every service call to the logger — including calls with no parent span — and the V2 adapter decides the role (DB_CALLvsSERVICE), the parent (ambient → threaded → root), and whether the call is a traceable operation or a metrics-only ping. Guardrail span data is built from the typed, provider-agnosticStandardLoggingGuardrailInformation— no single provider's field shape is assumed. - Export: each span ends and is handed to the provider's span processors, which export to the configured backends (OTLP, console, in-memory, …).
Components
Sources of truth (model/, no OpenTelemetry import)
These define the shape of a span without depending on the OTel SDK, so they can
be imported anywhere. They live in model/ and form a closed set —
nothing here imports outside it:
semconv.py— attribute-key constants (gen_ai.*,http.*,litellm.*), the GenAI operation/provider enums, and the functions that map LiteLLM provider/call-type strings onto convention values.spans.py— the span registry: every span role, its OTel span kind, its place in the hierarchy, and its name builder.payloads.py— frozen dataclasses (LLMCallSpanData,GuardrailSpanData,ServiceSpanData, …) built from heterogeneous logging payloads viafrom_*classmethods.config.py—OpenTelemetryV2Config, a pydantic-settings model that readsOTEL_*/LITELLM_OTEL_*env vars, plus the feature gate.capture_span_contentgates whether prompt/response bodies may be written as span attributes; it defaults off (no_content). The Baggage allowlists are configurable, not hard-coded: setLITELLM_OTEL_BAGGAGE_PROMOTED_KEYS/LITELLM_OTEL_BAGGAGE_METADATA_KEYS/LITELLM_OTEL_BAGGAGE_TEAM_METADATA_KEYS(comma-separated) as env vars, orbaggage_promoted_keys/baggage_metadata_keys/baggage_team_metadata_keys(YAML lists) undercallback_settings.otelinconfig.yaml— the latter reach the config through the logger's constructor kwargs.baggage_team_metadata_keysis empty by default, so none of a team's free-form metadata is promoted until each sub-key is explicitly allowlisted.baggage.py— the single definition of which request-identity values are promoted into Baggage (so child spans inherit them) and under which attribute keys.utils.py— value coercion, JSON serialization, and extractor-table application, shared across the package.
Engine
emitter.py—SpanEmitter.emit(role, data): dedupe → start the span → run the mapper chain to stamp attributes → set status → end. It owns no attribute keys. The dedupe set (which coalesces the sync+async firing of one request) is a bounded LRU so it can't grow without limit.mappers/— each mapper turns typed span data into a flat{attribute key: value}dict. They compose: listing several mapper names in the config layers multiple attribute vocabularies onto the same span.genai— the canonical OpenTelemetry GenAI vocabulary, always present.legacy— an additional vocabulary using the older semconv-ai / Traceloop attribute key names, for backends that read those.openinference,langfuse,weave,langtrace— vendor vocabularies.resolve_mappers(names)turns config names into mapper instances.
Plumbing (plumbing/)
The OTel-SDK wiring. Everything here imports only model/ and each other; it
lives in plumbing/:
providers.py— builds theTracerProvider, its exporters (fromExporterSpecs), and the span processor that copies allowlisted Baggage entries onto every span.register_exporter_factory(kind, factory)lets a preset contribute a custom exporterkind(e.g. one that fetches an auth token lazily) without coupling this module to any vendor.context.py— trace-context and Baggage read/write helpers.routing.py—TenantTracerCache: when a request carries team/key-scoped vendor credentials, route its spans through a credential-keyedTracerProviderso one logger serves many tenants. The cache is a bounded LRU that flushes + shuts down evicted providers, since the key derives from request-supplied credentials and must not grow (or leak threads) without limit.metrics.py— GenAI client metric instruments. The sixgen_ai.client.*histograms are recorded through the meter resolved byproviders.resolve_meter_provider: an injected provider wins (tests/DI), otherwise the operator's globally configuredMeterProvideris reused so its readers/exporters receive them alongside the server metrics, and one is built and registered as the global only when none is set (mirroring how V2 owns trace export).
Adapter
logger.py—OpenTelemetryV2, aCustomLoggerthat translates LiteLLM's logging callbacks into typed span data and hands them to the engine. The LLM-call span is opened at thelog_pre_api_callboundary (parented to the live server span via ambient context) and closed at the async success/failure callback; the open span is held in a bounded cache keyed bylitellm_call_id, never threaded through a metadata dict. The logger registers itself intolitellm.input_callbacksoLogging.pre_callfires the boundary hook.mount.py—instrument_fastapi_app(app), the single call site that attachesopentelemetry-instrumentation-fastapifor SERVER spans. It owns the health-check exclusion default (OTEL_PYTHON_FASTAPI_EXCLUDED_URLS) and the passthrough span-naming hook (PASSTHROUGH_PREFIXES) soproxy_servercarries no OTel detail. A safe no-op when the gate is off or the instrumentation package is absent; must be called at app-creation time (the middleware stack freezes once the app serves).
Presets
presets/— each preset reads one integration's env vars and returns anOpenTelemetryV2Config(exporter destination + mapper vocabularies- resource attributes).
PRESET_BY_CALLBACKmaps a callback name ("arize","langfuse_otel", …) to its preset. Integrations that support team/key-scoped credentials also provide a per-request OTLP header builder (DYNAMIC_HEADERS_BY_CALLBACK). Presets do no network I/O at build time: AgentOps, for example, mints its JWT lazily inside a custom exporter on the first export (in theBatchSpanProcessorworker thread), never on the event loop.
- resource attributes).
Extending
- A new attribute vocabulary for a backend: add a mapper in
mappers/(a class with amap(data) -> AttributeMapmethod, typically built fromkey -> extractortables) and register it inmappers/__init__._MAPPER_BY_NAME. - A new integration: add a preset in
presets/that returns anOpenTelemetryV2Config, and register it inpresets/__init__.PRESET_BY_CALLBACK. If it supports dynamic credentials, add a header builder toDYNAMIC_HEADERS_BY_CALLBACK. - A new span kind: add a role to
spans.py(registry entry + name builder), a payload dataclass inpayloads.py, and a branch in the relevant mapper(s).