hmo/skills

Files

T

hmo 04db423416 Initial commit: skills library

- 70 skills with code and documentation
- Add .gitignore (ignore __pycache__, output/, temp/, venv/)
- Clean up test intermediates and caches

2026-04-26 19:27:40 +08:00

3.2 KiB

Raw Blame History

Agent Vision Awareness - Usage Guide

Quick Setup

API Key 配置:
- 火山方舟 API Key 已在 OpenCode 配置中
- 或设置 VOLCENGINE_API_KEY 环境变量
Add skill to OMO configuration:
```
skills:
  - agent-vision-awareness
```
Use naturally - just mention images in your requests:
- "分析这个截图 error.png"
- "描述 temp/image.jpg 的内容"
- "根据架构图 design/architecture.png 生成部署方案"

Integration Examples

Basic Usage (Automatic)

# No special code needed - automatic detection and processing
user_input = "帮我分析这个错误日志截图：./logs/error.png"
# The skill will automatically detect and process the image

Manual Integration (When Needed)

from .scripts.integrate_vision import process_user_input

# Process user input with visual content
result = process_user_input(
    user_input="分析图表 sales_chart.png",
    user_request="提取销售数据趋势",
    config={
        "api_key": os.environ.get("VOLCENGINE_API_KEY"),
        "base_url": "https://ark.cn-beijing.volces.com/api/coding/v3",
        "model": "doubao-seed-code"
    }
)

if result["confidence"] != "none":
    analysis = result["analysis_results"][0]["result"]
    # Use analysis in your response
    response = f"根据图片分析：{analysis}"
else:
    # Handle as normal text-only request
    response = "处理文本请求..."

Configuration

Copy config/settings.json.example to config/settings.json and update with your API key:

{
  "vision_api": {
    "key": "b0359bed-09f2-49e2-a53c-32ba057412e3",
    "base_url": "https://ark.cn-beijing.volces.com/api/coding/v3",
    "model": "doubao-seed-code"
  }
}

Supported Features

✅ Automatic Detection: File extensions, keywords, URLs, markdown syntax
✅ Multiple Analysis Modes: OCR, chart analysis, product analysis, scene description
✅ Error Handling: Graceful degradation with clear error messages
✅ File Support: Local files, relative paths, absolute paths, URLs
✅ Format Support: PNG, JPG, JPEG, WebP, GIF, BMP

Limitations

⚠️ No Custom Agent Delegation: The @multimodal-looker approach doesn't work with current OMO
⚠️ API Key Required: Must have valid 火山方舟 API key
⚠️ File Size: Images should be < 4MB for optimal performance
⚠️ Network: Requires internet access to https://ark.cn-beijing.volces.com

Troubleshooting

Vision processing not working?

Check VOLCENGINE_API_KEY configuration
Verify image file exists and is accessible
Test with simple request: "描述 image.png"

Detection not triggering?

Ensure input contains detectable patterns (file extensions like .png, keywords like "图片")
Use explicit file paths instead of vague references

API errors?

Check network connectivity to 火山方舟 API
Verify API key is valid
Check rate limits on your account

image-service: For image generation and editing
file-reader: For reading document contents (complementary)

This skill provides fully automatic visual content processing without requiring manual intervention or custom agent commands.

3.2 KiB Raw Blame History