hmo/MoFin

Files

T

知微 fa45d8aa5f fix: 小果地址统一node122(兼容LAN+EasyTier)

- health_checklist.json: 192.168.1.122→node122
- ocr_client.py: docstring IP→node122
- docs/market-data-requirements.md: IP→node122
- 所有API调用通过ProxyHandler({})绕过系统代理
  Privoxy对node122:18003返回500，直连正常

2026-06-30 02:56:35 +08:00

__init__.py

fix: 小果地址统一node122(兼容LAN+EasyTier)

2026-06-30 02:56:35 +08:00

adaptive_router.py

fix: 小果地址统一node122(兼容LAN+EasyTier)

2026-06-30 02:56:35 +08:00

bandit.py

fix: 小果地址统一node122(兼容LAN+EasyTier)

2026-06-30 02:56:35 +08:00

classifier.py

fix: 小果地址统一node122(兼容LAN+EasyTier)

2026-06-30 02:56:35 +08:00

config.py

fix: 小果地址统一node122(兼容LAN+EasyTier)

2026-06-30 02:56:35 +08:00

hooks.py

fix: 小果地址统一node122(兼容LAN+EasyTier)

2026-06-30 02:56:35 +08:00

README.md

fix: 小果地址统一node122(兼容LAN+EasyTier)

2026-06-30 02:56:35 +08:00

signals.py

fix: 小果地址统一node122(兼容LAN+EasyTier)

2026-06-30 02:56:35 +08:00

update_queue.py

fix: 小果地址统一node122(兼容LAN+EasyTier)

2026-06-30 02:56:35 +08:00

README.md

Adaptive Router (v0)

A request-type-aware routing strategy. For each incoming request, classify the prompt into one of seven RequestType buckets (code generation, writing, analytical reasoning, …), then Thompson-sample a Beta(α, β) bandit posterior per (request_type, model) cell to pick the best model. Quality estimates are combined with a normalized cost score via a weighted linear sum.

A post-call hook reads the response and runs lightweight regex + tool-call detectors (see signals.py) to award per-turn credit/blame to the model that served the turn. Updates are batched in-memory and flushed to Postgres every ~10s by a background task in proxy_server.py.

Config example

model_list:
  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o
    model_info:
      input_cost_per_token: 0.0000025
      adaptive_router_preferences:
        quality_tier: 3
        strengths: ["code_generation", "analytical_reasoning"]

  - model_name: gpt-4o-mini
    litellm_params:
      model: openai/gpt-4o-mini
    model_info:
      input_cost_per_token: 0.00000015
      adaptive_router_preferences:
        quality_tier: 2
        strengths: ["general", "factual_lookup"]

  - model_name: smart-router
    litellm_params:
      model: auto_router/adaptive_router
      adaptive_router_default_model: gpt-4o-mini
      adaptive_router_config:
        available_models: ["gpt-4o", "gpt-4o-mini"]
        weights:
          quality: 0.7
          cost: 0.3

Callers may pass header x-litellm-min-quality-tier: 3 (or metadata key min_quality_tier: 3) to force selection from tier-3-or-higher models only.

Behavior summary

Cold start. Each (request_type, model) cell starts with a Beta prior whose mean = BASE_TIER_WEIGHT[tier] (+ STRENGTH_BONUS if declared) and total mass = COLD_START_MASS (10). About ten real observations move it meaningfully.
Per-request decision. Sample once per eligible model, score with quality_weight·sample + cost_weight·normalized_cost, pick the argmax. Routing is stateless per-turn — no sticky lookup. Each call resamples.
Owner-cache attribution. Post-call, the conversation's first picked model claims an "owner slot" for OWNER_CACHE_TTL_SECONDS (24h). Later turns of the same conversation only fire bandit/state updates if the same model handled them — mismatches are dropped (no attribution) and counted in skipped_updates_total. Conversation identity is the client-supplied litellm_session_id if present, otherwise a sha256 over caller identity (api key hash, team, user, end-user) + the first message.
Per-turn updates. satisfaction → +α. misalignment, stagnation, disengagement, failure → +β (each). loop → +0.5β. exhaustion → 0 (uptime, not quality). Skipped if conversation has fewer than SIGNAL_GATE_MIN_MESSAGES messages.
Persistence. Bandit cells: aggregated deltas, eventually consistent. Session rows: last-write-wins snapshots.

Known v0 limitations

Latency is not in the score. Quality + cost only. A pathologically slow model can still be picked.
Hard sample cap at 200. Once α + β > 200, deltas are silently dropped. No rescaling — drift is a v1 concern.
24h owner-cache TTL. No explicit eviction below TTL. The in-memory map can grow if traffic patterns produce many one-shot sessions.
Owner-recovery skew. If model A "owns" a conversation but is then dethroned in the bandit, later turns served by model B are dropped — so bandit updates for that conversation flatline until A's TTL expires. Tracked via skipped_updates_total.
Signals are regex + tool-call only. No LLM-judge, no embedding similarity, no exemplar storage. Signals are best-effort and biased toward English.
One AdaptiveRouter per Router. Multiple adaptive_router/* deployments on the same litellm.Router raise at init.
Bandit-delta mapping is unvalidated. _compute_bandit_delta is a v0 guess; expect to retune after the first ~1000 sessions of real traffic.
request_type is classified per turn from the latest user message. For non-GENERAL turns, the current-turn type is used for bandit attribution (so genuine mid-session topic shifts update the correct cell). For GENERAL turns ("thanks!", "ok", "sounds good"), attribution falls back to the session's original type to avoid misattributing closing pleasantries.

README.md Unescape Escape

Adaptive Router (v0)

Config example

Behavior summary

Known v0 limitations

README.md