Files
skills/agent-vision-awareness/USAGE.md
T
hmo 04db423416 Initial commit: skills library
- 70 skills with code and documentation
- Add .gitignore (ignore __pycache__, output/, temp/, venv/)
- Clean up test intermediates and caches
2026-04-26 19:27:40 +08:00

3.2 KiB

Agent Vision Awareness - Usage Guide

Quick Setup

  1. API Key 配置:

    • 火山方舟 API Key 已在 OpenCode 配置中
    • 或设置 VOLCENGINE_API_KEY 环境变量
  2. Add skill to OMO configuration:

    skills:
      - agent-vision-awareness
    
  3. Use naturally - just mention images in your requests:

    • "分析这个截图 error.png"
    • "描述 temp/image.jpg 的内容"
    • "根据架构图 design/architecture.png 生成部署方案"

Integration Examples

Basic Usage (Automatic)

# No special code needed - automatic detection and processing
user_input = "帮我分析这个错误日志截图:./logs/error.png"
# The skill will automatically detect and process the image

Manual Integration (When Needed)

from .scripts.integrate_vision import process_user_input

# Process user input with visual content
result = process_user_input(
    user_input="分析图表 sales_chart.png",
    user_request="提取销售数据趋势",
    config={
        "api_key": os.environ.get("VOLCENGINE_API_KEY"),
        "base_url": "https://ark.cn-beijing.volces.com/api/coding/v3",
        "model": "doubao-seed-code"
    }
)

if result["confidence"] != "none":
    analysis = result["analysis_results"][0]["result"]
    # Use analysis in your response
    response = f"根据图片分析:{analysis}"
else:
    # Handle as normal text-only request
    response = "处理文本请求..."

Configuration

Copy config/settings.json.example to config/settings.json and update with your API key:

{
  "vision_api": {
    "key": "b0359bed-09f2-49e2-a53c-32ba057412e3",
    "base_url": "https://ark.cn-beijing.volces.com/api/coding/v3",
    "model": "doubao-seed-code"
  }
}

Supported Features

Automatic Detection: File extensions, keywords, URLs, markdown syntax
Multiple Analysis Modes: OCR, chart analysis, product analysis, scene description
Error Handling: Graceful degradation with clear error messages
File Support: Local files, relative paths, absolute paths, URLs
Format Support: PNG, JPG, JPEG, WebP, GIF, BMP

Limitations

⚠️ No Custom Agent Delegation: The @multimodal-looker approach doesn't work with current OMO
⚠️ API Key Required: Must have valid 火山方舟 API key
⚠️ File Size: Images should be < 4MB for optimal performance
⚠️ Network: Requires internet access to https://ark.cn-beijing.volces.com

Troubleshooting

Vision processing not working?

  • Check VOLCENGINE_API_KEY configuration
  • Verify image file exists and is accessible
  • Test with simple request: "描述 image.png"

Detection not triggering?

  • Ensure input contains detectable patterns (file extensions like .png, keywords like "图片")
  • Use explicit file paths instead of vague references

API errors?

  • Check network connectivity to 火山方舟 API
  • Verify API key is valid
  • Check rate limits on your account
  • image-service: For image generation and editing
  • file-reader: For reading document contents (complementary)

This skill provides fully automatic visual content processing without requiring manual intervention or custom agent commands.