hmo 1b2b935832 Initial: multi-agent XMPP communication system with dashboard
- Platform-based architecture (Windows/Linux/Mac)
- Agent instance registry (agents.yaml)
- Management dashboard with cross-platform monitoring
- xmpp_bot with HTTP bridge + health endpoints
- wechat_agent with WeChat-Hermes bridging
- Platform services: ProcessGuardian, HealthProbe, APIRouter, ChannelBridge
- Deployment: systemd (Linux) + PowerShell (Windows)
- Monitoring: SSH+ejabberdctl for cross-platform presence
2026-06-12 21:51:36 +08:00

# AgentsMeeting: Operations Manual
> Version: v2.0 | Date: 2026-06-12
---
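## Routine Checks
### Dashboard
Open `http://192.168.1.246:5803` to view the status of all agents and platform services.
- Green = online
- Yellow = degraded (the process is alive but its XMPP connection is unstable)
- Red = offline
- Gray = unknown (remote agent whose status cannot be detected)
Expand an agent card to view its live logs.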
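The color legend can be read as a small decision rule. The following is a minimal sketch in Python, assuming the Dashboard has two facts per agent: whether the process is alive and whether XMPP is connected. The function `classify_agent` and its parameters are hypothetical and are not taken from the Dashboard's code.

```python
from typing import Optional


def classify_agent(process_alive: Optional[bool], xmpp_connected: Optional[bool]) -> str:
    """Map probe results to the legend colors (hypothetical helper).

    None means the probe could not be run, e.g. for a remote agent.
    """
    if process_alive is None:
        return "gray"    # unknown: status cannot be detected
    if not process_alive:
        return "red"     # offline
    if xmpp_connected:
        return "green"   # online
    return "yellow"      # degraded: process alive, XMPP not confirmed
```

Under this rule a live process with no confirmed XMPP connection is reported as degraded rather than offline.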
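### Command-line checks
```powershell
# Windows: quick status
powershell -File deploy\windows\check.ps1
```
```bash
# Linux: all systemd services
# {profile} and {name} are placeholders for the actual instance names
systemctl status agentsmeeting-dashboard hermes-gateway@{profile} xmpp-bot-{name}
```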
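The `{profile}` and `{name}` placeholders have to be expanded before the command is run. A small Python sketch of that expansion follows; the helper names `systemd_units` and `status_command` are illustrative, and the instance names in the usage line are the ones listed in this manual's systemd services table.

```python
from typing import List, Sequence


def systemd_units(profiles: Sequence[str], bots: Sequence[str]) -> List[str]:
    """Expand the unit-name templates into concrete unit names."""
    units = ["agentsmeeting-dashboard"]
    units += [f"hermes-gateway@{profile}" for profile in profiles]
    units += [f"xmpp-bot-{name}" for name in bots]
    return units


def status_command(units: Sequence[str]) -> List[str]:
    """Build the argv for a single `systemctl status` call over all units."""
    return ["systemctl", "status", *units]


# Print the command line instead of executing it.
print(" ".join(status_command(systemd_units(["main", "zhiwei", "xiaoguo"], ["mohe", "zhiwei"]))))
```

Building the argument list separately keeps the sketch free of side effects; pass it to `subprocess.run` on the Linux host to actually query systemd.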
---
## Monitoring Architecture
```
Dashboard (:5803, Linux)
├── Docker exec ejabberdctl → list of online JIDs (authoritative across platforms)
├── GET 192.168.1.16:5802/health → xmpp_bot XMPP connection status
├── GET 192.168.1.16:5801/health → wechat_agent hermes connection status
└── TCP connect 192.168.1.16:8787 → api_proxy port reachability
```
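The two network probe types in the diagram (HTTP health check and bare TCP connect) can be sketched with the Python standard library. This is not the Dashboard's implementation: the function names are hypothetical, and the sketch assumes the health endpoints answer with a JSON body.

```python
import http.client
import json
import socket
import urllib.request


def http_health(url: str, timeout: float = 3.0):
    """Return the decoded JSON body of a health endpoint, or None on any failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except (OSError, ValueError, http.client.HTTPException):
        # URLError/HTTPError and timeouts are OSError subclasses;
        # bad JSON or bad encoding raises ValueError.
        return None


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port can be opened within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def run_probes() -> dict:
    """Probe the three targets from the diagram (addresses copied from it)."""
    return {
        "xmpp_bot": http_health("http://192.168.1.16:5802/health"),
        "wechat_agent": http_health("http://192.168.1.16:5801/health"),
        "api_proxy": tcp_reachable("192.168.1.16", 8787),
    }
```

A short timeout matters here: a probe that blocks on an unreachable host would stall the whole status refresh.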
---
## systemd Services (Linux)
| Service | Command |
|------|------|
| agentsmeeting-dashboard | `systemctl status/restart agentsmeeting-dashboard` |
| hermes-gateway@main | `systemctl status hermes-gateway@main` |
| hermes-gateway@zhiwei | `systemctl status hermes-gateway@zhiwei` |
| hermes-gateway@xiaoguo | `systemctl status hermes-gateway@xiaoguo` |
| xmpp-bot-mohe | `systemctl status xmpp-bot-mohe` |
| xmpp-bot-zhiwei | `systemctl status xmpp-bot-zhiwei` |
---
## Health Endpoints
| Service | URL | Meaning |
|------|-----|------|
| xmpp_bot | `GET :5802/health` | `xmpp_connected` = whether XMPP is online |
| wechat_agent | `GET :5801/health` | `hermes_connected` = whether the connection to the Mohe gateway is up |
| Dashboard | `GET :5803/api/health` | Whether the Dashboard itself is healthy |
| Dashboard | `GET :5803/api/ejabberd` | List of users online in ejabberd |
| Dashboard | `GET :5803/api/platform` | Platform service status |
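Reading the `xmpp_connected` and `hermes_connected` flags might look like the sketch below. It assumes the response body is a JSON object in which the flag is a boolean; the manual names the fields but does not specify the payload format, so treat that as an assumption. The helper `is_connected` is hypothetical.

```python
import json


def is_connected(body: str, flag: str) -> bool:
    """Return True only if `body` is a JSON object whose `flag` is exactly true."""
    try:
        data = json.loads(body)
    except ValueError:
        return False  # not JSON: treat as not connected
    return isinstance(data, dict) and data.get(flag) is True


print(is_connected('{"xmpp_connected": true}', "xmpp_connected"))
print(is_connected('{"hermes_connected": false}', "hermes_connected"))
```

Treating a missing or malformed flag as "not connected" keeps a broken endpoint from being shown as healthy.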
---
## Log Locations
| Log | Windows path | Purpose |
|------|-------------|------|
| xmpp_bot.log | `gateway\logs\` | Bot connections, messages, HTTP bridge |
| bridge.log | `gateway\logs\` | LLM API calls |
| watchdog.log | `gateway\logs\` | Watchdog starts and stops |
| health_check.log | `gateway\logs\` | Health check run every 5 minutes |
| dashboard.log | `gateway\logs\` | Dashboard runtime log |
| mohe_inbox.log | `gateway\logs\` | Mohe message records |
Linux Dashboard logs: `sudo journalctl -u agentsmeeting-dashboard -f`
---
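## Common Failures
### Bot disconnects frequently
**Symptom**: the log shows `disconnected, reconnecting...` roughly every 50 seconds
**Root cause**: with ejabberd `mod_ping` set to `timeout_action: kill`, pings time out under the latency of the frp tunnel and the server drops the connection
**Fixed**: set `timeout_action: none`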
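For orientation, the relevant fragment of `ejabberd.yml` might look like the sketch below. Only `timeout_action` comes from this manual; `send_pings: true` is an assumption about the deployment (server-initiated pings must be enabled for a ping timeout to occur at all), and the rest of the module list is omitted.

```yaml
modules:
  mod_ping:
    send_pings: true      # assumed: server-initiated pings are enabled
    timeout_action: none  # was `kill`; `none` keeps the session open on a missed ping
```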
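### MUC join fails
**Symptom**: `MUC join timeout (1/3) ... MUC setup failed`
**Root cause**: the ejabberd TLS certificate does not cover `conference.yoin.fun`
**Fixed**: generated a self-signed certificate `conference.pem` and added it to certfiles; cross-platform monitoring is worked around with SSH + ejabberdctl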
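### API key quota exceeded
**Symptom**: bridge.log shows `HTTP 429` and the bot does not reply
**Action**: wait for the quota to reset (Volcengine (火山) resets on the 15th of each month at 00:00 CST), or switch provider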
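Switching provider on a 429 can be automated. The sketch below illustrates the idea in Python; it is not the bridge's actual logic. The `Provider` callable type and `ask_with_fallback` are hypothetical names, and a provider is modeled as any function that returns an HTTP status together with the reply text.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical provider interface: takes the prompt, returns (http_status, reply_text).
Provider = Callable[[str], Tuple[int, str]]


def ask_with_fallback(prompt: str, providers: Sequence[Tuple[str, Provider]]) -> Tuple[str, str]:
    """Try providers in order and move on when one answers HTTP 429.

    Returns (provider_name, reply). Raises RuntimeError when a provider fails
    with another status or when every provider is out of quota.
    """
    for name, call in providers:
        status, reply = call(prompt)
        if status == 429:
            continue  # quota exhausted: try the next provider
        if status != 200:
            raise RuntimeError(f"{name} failed with HTTP {status}")
        return name, reply
    raise RuntimeError("all providers returned HTTP 429")
```

Only 429 triggers the fallback; other error statuses are raised immediately so that real failures are not masked by a silent provider switch.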
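### Two bots running at the same time
**Symptom**: messages receive duplicate replies
**Root cause**: the watchdog started a new process without killing the old one
**Fixed**: the watchdog's `start_bot()` now kills the old process first, and `proc_guard` adds a PID lock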
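A PID lock of the kind mentioned above can be sketched as follows. This is not the `proc_guard` implementation, which this manual does not show; the function names are hypothetical. The liveness check is POSIX-only (on Windows, `os.kill(pid, 0)` does not have this meaning), and stale-lock recovery here is not race-free.

```python
import os


def pid_alive(pid: int) -> bool:
    """POSIX-only: signal 0 checks that the process exists without signalling it."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


def acquire_pid_lock(path: str) -> bool:
    """Create the lock file atomically; reclaim it once if its owner is gone."""
    for _attempt in range(2):
        try:
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            try:
                with open(path) as f:
                    owner = int(f.read().strip())
            except (OSError, ValueError):
                return False  # unreadable lock: be conservative, treat as held
            if owner <= 0 or pid_alive(owner):
                return False  # another instance holds the lock
            try:
                os.remove(path)  # stale lock left by a dead process
            except FileNotFoundError:
                pass
            continue
        with os.fdopen(fd, "w") as f:
            f.write(str(os.getpid()))
        return True
    return False


def release_pid_lock(path: str) -> None:
    """Remove the lock file only if this process owns it."""
    try:
        with open(path) as f:
            if f.read().strip() == str(os.getpid()):
                os.remove(path)
    except FileNotFoundError:
        pass
```

`O_CREAT | O_EXCL` makes creation fail when the file already exists, so two processes starting at the same moment cannot both create the lock.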
---
## Data Management
```bash
# Linux side: archive Hermes sessions
cd ~/.hermes/profiles/main/
# Keep a dated backup copy of the state database before pruning
cp state.db "state.db.$(date +%Y%m%d)"
hermes session prune --older-than 30d
```