# AgentsMeeting Operations Manual

> Version: v2.0 | Date: 2026-06-12

---

## Daily Checks

### Dashboard

Open `http://192.168.1.246:5803` to view the status of all Agents and platform services.

- Green = online
- Yellow = degraded (the process is alive but its XMPP connection is unstable)
- Red = offline
- Gray = unknown (remote Agent that cannot be probed)

Expand an Agent card to view its live log.

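The color legend above amounts to a small decision rule. The following is a minimal sketch of that rule, assuming the color is derived from two inputs: whether the process is alive, and whether XMPP is connected. The function name `agent_color` and its argument values are hypothetical and are not taken from the Dashboard's code.

```bash
# agent_color PROCESS_STATE XMPP_CONNECTED
#   PROCESS_STATE:  yes | no | unknown   (hypothetical values)
#   XMPP_CONNECTED: true | false
agent_color() {
  case "$1" in
    yes)
      # Process alive: green only if XMPP is also connected, else degraded.
      if [ "$2" = true ]; then echo green; else echo yellow; fi
      ;;
    no)
      echo red
      ;;
    *)
      # Remote Agents that cannot be probed fall through to "unknown".
      echo gray
      ;;
  esac
}
```

For example, `agent_color yes false` prints `yellow`, matching the degraded case in the legend.
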
### Command-Line Checks

```powershell
# Quick status on Windows
powershell -File deploy\windows\check.ps1
```

```bash
# All systemd services on Linux
# Replace {profile} and {name} with real instance names (e.g. main, mohe)
systemctl status agentsmeeting-dashboard hermes-gateway@{profile} xmpp-bot-{name}
```


---

## Monitoring Architecture

```
Dashboard (:5803, Linux)
│
├── Docker exec ejabberdctl → online JID list (authoritative across platforms)
├── GET 192.168.1.16:5802/health → xmpp_bot XMPP connection status
├── GET 192.168.1.16:5801/health → wechat_agent hermes connection status
└── TCP connect 192.168.1.16:8787 → api_proxy port reachability
```


---

## systemd Services (Linux)

| Service | Command |
|---------|---------|
| agentsmeeting-dashboard | `systemctl status/restart agentsmeeting-dashboard` |
| hermes-gateway@main | `systemctl status hermes-gateway@main` |
| hermes-gateway@zhiwei | `systemctl status hermes-gateway@zhiwei` |
| hermes-gateway@xiaoguo | `systemctl status hermes-gateway@xiaoguo` |
| xmpp-bot-mohe | `systemctl status xmpp-bot-mohe` |
| xmpp-bot-zhiwei | `systemctl status xmpp-bot-zhiwei` |

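The units in the table above can be checked in one pass instead of one `systemctl status` call each. The following is a rough sketch using `systemctl is-active`. The function name `report_units` and the `SYSTEMCTL` override variable are assumptions added for illustration (the override only exists so the loop can be exercised without systemd).

```bash
# Unit names copied from the table above; adjust to the instances on your host.
UNITS="agentsmeeting-dashboard hermes-gateway@main hermes-gateway@zhiwei hermes-gateway@xiaoguo xmpp-bot-mohe xmpp-bot-zhiwei"

# Prints one "unit state" line per unit; returns 1 if any unit is not active.
report_units() {
  failed=0
  for unit in $UNITS; do
    # is-active prints the state (active, inactive, failed, ...) on stdout.
    state=$("${SYSTEMCTL:-systemctl}" is-active "$unit" 2>/dev/null)
    [ -n "$state" ] || state=unknown
    printf '%-28s %s\n' "$unit" "$state"
    [ "$state" = active ] || failed=1
  done
  return "$failed"
}

# Usage: report_units || echo "at least one service needs attention"
```
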

---

## Health Endpoints

| Service | URL | Meaning |
|---------|-----|---------|
| xmpp_bot | `GET :5802/health` | `xmpp_connected` = whether the XMPP connection is online |
| wechat_agent | `GET :5801/health` | `hermes_connected` = whether the connection to the Mohe (莫荷) gateway is up |
| Dashboard | `GET :5803/api/health` | Whether the Dashboard itself is healthy |
| Dashboard | `GET :5803/api/ejabberd` | List of users online in ejabberd |
| Dashboard | `GET :5803/api/platform` | Platform service status |

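The `xmpp_connected` and `hermes_connected` fields can be pulled out of a health response from the command line. The following is a minimal sketch. It assumes the endpoints return JSON with those fields as bare `true` or `false` values, which this manual does not spell out, and the helper name `health_flag` is hypothetical. It uses `grep` rather than a JSON parser to avoid extra dependencies, so it is only suitable for flat responses.

```bash
# health_flag FIELD  (reads a JSON body on stdin)
# Prints true, false, or unknown if the field is missing or not a boolean.
health_flag() {
  value=$(grep -Eo "\"$1\"[[:space:]]*:[[:space:]]*(true|false)" \
            | head -n 1 \
            | grep -Eo '(true|false)$')
  echo "${value:-unknown}"
}

# Usage, with the host taken from the monitoring architecture diagram:
#   curl -fsS --max-time 3 http://192.168.1.16:5802/health | health_flag xmpp_connected
#   curl -fsS --max-time 3 http://192.168.1.16:5801/health | health_flag hermes_connected
```
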

---

## Log Locations

| Log | Windows path | Purpose |
|-----|--------------|---------|
| xmpp_bot.log | `gateway\logs\` | Bot connections, messages, HTTP bridge |
| bridge.log | `gateway\logs\` | LLM API calls |
| watchdog.log | `gateway\logs\` | Watchdog start/stop events |
| health_check.log | `gateway\logs\` | Health check run every 5 minutes |
| dashboard.log | `gateway\logs\` | Dashboard runtime log |
| mohe_inbox.log | `gateway\logs\` | Mohe (莫荷) message records |

Dashboard log on Linux: `sudo journalctl -u agentsmeeting-dashboard -f`


---

## Common Failures

### Bot disconnects frequently

**Symptom**: `disconnected, reconnecting...` appears in the log roughly every 50 seconds.

**Root cause**: ejabberd was configured with `mod_ping: timeout_action: kill`, so pings that timed out under frp tunnel latency caused the session to be killed.

**Fixed**: set `timeout_action: none`.

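To confirm whether this symptom is present (or has gone away after the fix), the reconnect messages can be counted. This sketch assumes the log lines contain the literal text quoted in the symptom above. The function name `count_reconnects` is hypothetical, and the log path is passed in because the full path depends on your installation.

```bash
# count_reconnects LOGFILE
# Prints the number of lines containing the reconnect message.
count_reconnects() {
  grep -c 'disconnected, reconnecting' "$1"
}
```

A count that keeps growing by about one per minute points at the ping-timeout problem described above; a flat count suggests the connection is stable.
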
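### MUC join fails

**Symptom**: `MUC join timeout (1/3) ... MUC setup failed`

**Root cause**: the ejabberd TLS certificate did not cover `conference.yoin.fun`.

**Fixed**: generated a self-signed certificate `conference.pem` and added it to certfiles; cross-platform monitoring now goes through SSH + ejabberdctl as a workaround.

### API key quota exceeded

**Symptom**: bridge.log shows `HTTP 429` and the bot stops replying.

**Resolution**: wait for the quota to reset (Volcengine (火山) resets on the 15th of each month at 00:00 CST), or switch provider.
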
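### Two bots running at the same time

**Symptom**: messages receive duplicate replies.

**Root cause**: the watchdog started a new process without killing the old one.

**Fixed**: the watchdog's `start_bot()` now kills the old process first, and `proc_guard` adds a PID lock.
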
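The idea behind a PID lock can be sketched in a few lines. This is an illustration of the general technique, not the actual `proc_guard` implementation: the function name `acquire_pid_lock` and the pidfile argument are hypothetical. Unlike the watchdog fix described above, this variant refuses to start when a live holder exists instead of killing it. The check-then-write sequence is also not atomic, so it guards against accidental double starts rather than tight races.

```bash
# acquire_pid_lock PIDFILE
# Returns 0 and records this shell's PID if no live process holds the lock.
# Returns 1 if the PID stored in PIDFILE still belongs to a running process.
acquire_pid_lock() {
  pidfile=$1
  if [ -f "$pidfile" ]; then
    old_pid=$(cat "$pidfile")
    # kill -0 sends no signal; it only tests whether the process exists.
    if [ -n "$old_pid" ] && kill -0 "$old_pid" 2>/dev/null; then
      return 1
    fi
  fi
  # No holder, or a stale pidfile left by a dead process: take the lock.
  echo "$$" > "$pidfile"
}

# Usage: acquire_pid_lock /tmp/xmpp_bot.pid || { echo "bot already running"; exit 1; }
```
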

---

## Data Management

```bash
# Linux side: archive Hermes sessions
cd ~/.hermes/profiles/main/
# Snapshot the state database with a date suffix before pruning
cp state.db state.db.$(date +%Y%m%d)
hermes session prune --older-than 30d
```