Initial: multi-agent XMPP communication system with dashboard

- Platform-based architecture (Windows/Linux/Mac)
- Agent instance registry (agents.yaml)
- Management dashboard with cross-platform monitoring
- xmpp_bot with HTTP bridge + health endpoints
- wechat_agent with WeChat-Hermes bridging
- Platform services: ProcessGuardian, HealthProbe, APIRouter, ChannelBridge
- Deployment: systemd (Linux) + PowerShell (Windows)
- Monitoring: SSH+ejabberdctl for cross-platform presence
hmo
2026-06-12 21:49:05 +08:00
commit 1b2b935832
76 changed files with 15943 additions and 0 deletions
@@ -0,0 +1,128 @@
# AgentsMeeting: Operations Manual
> Version: v2.0 | Date: 2026-06-12
---
## Routine Checks
### Dashboard
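Open `http://192.168.1.246:5803` to view the status of all agents and platform services.

- Green = online
- Yellow = degraded (the process is alive but the XMPP connection is unstable)
- Red = offline
- Gray = unknown (remote agent, status cannot be detected)

Expand an agent card to view its live logs.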
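
The color rules above can be sketched as a small decision function. This is only an illustration, not the Dashboard's actual code: the names `Status` and `classify` are hypothetical, and "XMPP unstable" is simplified to a single boolean probe result.

```python
from enum import Enum
from typing import Optional


class Status(Enum):
    ONLINE = "green"
    DEGRADED = "yellow"
    OFFLINE = "red"
    UNKNOWN = "gray"


def classify(process_alive: Optional[bool], xmpp_connected: Optional[bool]) -> Status:
    """Map two probe results to a dashboard color.

    None for process_alive means the probe could not run (e.g. a remote agent).
    """
    if process_alive is None:
        return Status.UNKNOWN      # nothing can be said about the agent
    if not process_alive:
        return Status.OFFLINE      # no process, so the agent is down
    if xmpp_connected:
        return Status.ONLINE       # process up and XMPP session up
    return Status.DEGRADED         # process up, XMPP missing or unknown
```

Checking `process_alive` first keeps "unknown" distinct from "offline", so a remote agent that simply cannot be probed is never shown in red.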
### Command-Line Checks
```powershell
# Windows: quick status check
powershell -File deploy\windows\check.ps1
```
```bash
# Linux: all systemd services
# {profile} and {name} are placeholders; substitute real instance names (e.g. main, mohe)
systemctl status agentsmeeting-dashboard hermes-gateway@{profile} xmpp-bot-{name}
```
---
## Monitoring Architecture
```
Dashboard (:5803, Linux)
├── Docker exec ejabberdctl → list of online JIDs (authoritative across platforms)
├── GET 192.168.1.16:5802/health → xmpp_bot XMPP connection status
├── GET 192.168.1.16:5801/health → wechat_agent hermes connection status
└── TCP connect 192.168.1.16:8787 → api_proxy port reachability
```
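
The HTTP and TCP probes in the diagram above could look roughly like the following Python sketch. It is not the Dashboard's implementation: the function names, the 3-second timeout, and the plain `http://` scheme are assumptions, and the ejabberdctl check is left out because its exact invocation is not given here. Calling `run_probes()` returns one boolean per service.

```python
import socket
import urllib.request


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def http_ok(url: str, timeout: float = 3.0) -> bool:
    """Return True if a GET to url answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError and HTTPError are OSError subclasses
        return False


# Targets copied from the diagram; the http:// scheme is an assumption.
HTTP_PROBES = {
    "xmpp_bot": "http://192.168.1.16:5802/health",
    "wechat_agent": "http://192.168.1.16:5801/health",
}
TCP_PROBES = {"api_proxy": ("192.168.1.16", 8787)}


def run_probes() -> dict:
    """Run every probe once and return {service name: reachable}."""
    results = {name: http_ok(url) for name, url in HTTP_PROBES.items()}
    results.update({name: tcp_reachable(*addr) for name, addr in TCP_PROBES.items()})
    return results
```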
---
## systemd Services (Linux)
| Service | Command |
|------|------|
| agentsmeeting-dashboard | `systemctl status/restart agentsmeeting-dashboard` |
| hermes-gateway@main | `systemctl status hermes-gateway@main` |
| hermes-gateway@zhiwei | `systemctl status hermes-gateway@zhiwei` |
| hermes-gateway@xiaoguo | `systemctl status hermes-gateway@xiaoguo` |
| xmpp-bot-mohe | `systemctl status xmpp-bot-mohe` |
| xmpp-bot-zhiwei | `systemctl status xmpp-bot-zhiwei` |
---
## Health Endpoints
| Service | URL | Meaning |
|------|-----|------|
| xmpp_bot | `GET :5802/health` | `xmpp_connected` = whether XMPP is online |
| wechat_agent | `GET :5801/health` | `hermes_connected` = whether the connection to the Mohe (莫荷) gateway is up |
| Dashboard | `GET :5803/api/health` | Whether the Dashboard itself is healthy |
| Dashboard | `GET :5803/api/ejabberd` | List of users online in ejabberd |
| Dashboard | `GET :5803/api/platform` | Platform service status |
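
A client reading these endpoints has to decide what counts as "connected". The sketch below assumes the `/health` endpoints return a JSON object in which `xmpp_connected` or `hermes_connected` is a boolean; the response format is not specified in this manual, and `is_connected` is a hypothetical helper. Anything that is not a well-formed object with the field set to `true` is treated as not connected, so a half-broken endpoint never reads as healthy.

```python
import json


def is_connected(body: bytes, field: str) -> bool:
    """Return True only if body is a JSON object with `field` set to true."""
    try:
        payload = json.loads(body)
    except ValueError:  # covers invalid JSON and undecodable bytes
        return False
    return isinstance(payload, dict) and payload.get(field) is True
```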
---
## Log Locations
| Log | Windows path | Purpose |
|------|-------------|------|
| xmpp_bot.log | `gateway\logs\` | Bot connections, messages, HTTP bridge |
| bridge.log | `gateway\logs\` | LLM API calls |
| watchdog.log | `gateway\logs\` | Watchdog starts and stops |
| health_check.log | `gateway\logs\` | 5-minute health check |
| dashboard.log | `gateway\logs\` | Dashboard runtime log |
| mohe_inbox.log | `gateway\logs\` | Mohe (莫荷) message records |

Linux Dashboard logs: `sudo journalctl -u agentsmeeting-dashboard -f`

---
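## Common Failures
### Bot disconnects frequently
**Symptom**: `disconnected, reconnecting...` appears in the log about every 50 seconds

**Root cause**: with ejabberd `mod_ping` set to `timeout_action: kill`, pings time out under frp tunnel latency and the connection is killed

**Fixed**: set `timeout_action: none`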
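### MUC join fails
**Symptom**: `MUC join timeout (1/3) ... MUC setup failed`

**Root cause**: the ejabberd TLS certificate does not cover `conference.yoin.fun`

**Fixed**: generated a self-signed certificate `conference.pem` and added it to `certfiles`; cross-platform monitoring uses SSH + ejabberdctl as a workaround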
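### API key quota exceeded
**Symptom**: `bridge.log` shows `HTTP 429` and the bot does not reply

**Action**: wait for the quota to reset (Volcengine (火山) resets on the 15th of each month at 00:00 CST, UTC+8), or switch providers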
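
Switching providers when the quota runs out can be automated in principle. The sketch below is hypothetical and not taken from the bridge code: it assumes each provider is wrapped in a callable that raises `QuotaExceeded` when the API answers HTTP 429, and tries them in order.

```python
from typing import Callable, Sequence


class QuotaExceeded(Exception):
    """Raised by a provider callable when the API answers HTTP 429."""


def ask_with_fallback(prompt: str, providers: Sequence[Callable[[str], str]]) -> str:
    """Try each provider in order, moving on only when one is out of quota."""
    last_error = None
    for call in providers:
        try:
            return call(prompt)
        except QuotaExceeded as exc:
            last_error = exc  # remember why we fell through, then try the next
    raise RuntimeError("all providers are out of quota") from last_error
```

Only the quota error triggers the fallback; any other exception propagates unchanged, so real bugs are not hidden behind a silent provider switch.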
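### Two bots running at the same time
**Symptom**: messages get duplicate replies

**Root cause**: the watchdog started a new process without killing the old one

**Fixed**: the watchdog's `start_bot()` now kills the old process first, and `proc_guard` adds a PID lock
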
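
A PID lock of the kind mentioned above can be sketched in a few lines. This is not the project's `proc_guard`; the function names and the lock-file approach are assumptions for illustration. The lock file is created atomically, so a second bot started against the same path fails to acquire it. A crashed process leaves the file behind, which is why the watchdog still has to kill the old process and clear the lock before starting a new one.

```python
import os
from pathlib import Path


def acquire_pid_lock(lock_path: Path) -> bool:
    """Create lock_path atomically and record our PID; False if it already exists."""
    try:
        # O_EXCL makes creation fail if another process already holds the lock.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as fh:
        fh.write(str(os.getpid()))
    return True


def release_pid_lock(lock_path: Path) -> None:
    """Remove the lock, but only if it still records our own PID."""
    try:
        if lock_path.read_text().strip() == str(os.getpid()):
            lock_path.unlink()
    except FileNotFoundError:
        pass  # already released
```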
---
## Data Management
```bash
# Linux side: archive Hermes sessions
cd ~/.hermes/profiles/main/
cp state.db state.db.$(date +%Y%m%d)
hermes session prune --older-than 30d
```