- 70 skills with code and documentation - Add .gitignore (ignore __pycache__, output/, temp/, venv/) - Clean up test intermediates and caches
5.0 KiB
name, description
| name | description |
|---|---|
| agent-vision-awareness | Enables automatic visual content detection and processing for OMO agents. Uses direct API calls to vision models instead of custom agent delegation. Triggers on image files, diagrams, charts, screenshots, or any visual media references. |
Agent Vision Awareness
Overview
This skill provides automatic visual content detection for OMO agents working with text-only models.
Key Change: Instead of problematic custom agent delegation, this skill uses direct API integration with 火山方舟 (VolcEngine) vision models, which works reliably with current OMO versions.
Core Capabilities
- ✅ Automatic detection of images, diagrams, charts, screenshots in user input
- ✅ Direct API integration with 火山方舟 Vision API (doubao-1.5-vision-pro)
- ✅ Context-aware analysis with appropriate modes (OCR, chart analysis, etc.)
- ✅ Graceful degradation when vision processing fails
- ✅ No custom agent dependency - works with standard OMO configuration
Detection Logic
Trigger Patterns
Detects visual content when user input contains:
1. Image File Extensions:
.png,.jpg,.jpeg,.gif,.bmp,.webp- Case-insensitive matching
2. Visual Content Keywords:
- Chinese: "图片", "图像", "照片", "截图", "图表", "图示"
- English: "diagram", "chart", "graph", "screenshot", "image", "photo"
3. File Path Patterns:
- Absolute paths:
C:/path/to/image.png - Relative paths:
./assets/diagram.png - URLs:
https://example.com/image.png
Integration Workflow
Step 1: Detect Visual Content
When receiving user input, scan for visual content signals using the detection logic above.
Step 2: Direct API Processing
When visual content is detected, make direct API calls to VolcEngine:
- Uses
volcengineAPI Key from OpenCode config - Supports all common image formats
- Handles local files and URLs
Step 3: Result Integration
- Seamlessly integrates visual analysis results into responses
- Maintains conversation context
- Provides natural language descriptions
Usage Examples
Example 1: User requests image analysis
User Input: "描述 temp/稿定设计-1.png 这张图片的内容"
Agent Response: Automatically detects the PNG file, processes it via API, and returns the detailed description (as demonstrated in our testing).
Example 2: User mentions screenshot
User Input: "帮我分析这个错误截图 error.png"
Agent Response: Detects "截图" keyword + .png extension, processes the image, and provides error analysis.
Example 3: No visual content
User Input: "写一个 Python 脚本"
Agent Response: No detection triggered, processes as normal text-only request.
Configuration
Required Setup
- Vision Model:
doubao-seed-2.0-pro(火山方舟直接调用) - API Endpoint:
https://ark.cn-beijing.volces.com/api/coding/v3 - API Key: Uses existing volcengine API Key from OpenCode config
Known Limitations
- 响应时间较长 (20-60秒)
- 不够稳定,偶尔超时
- 推荐: 压缩图片到1024px可提升响应速度
Loading the Skill
Add to OMO configuration:
skills:
- agent-vision-awareness
Script Integration
The skill includes executable scripts in scripts/ directory:
vision_processor.py- Main vision processing script- Handles both detection and API integration
- Can be used standalone or integrated into agent workflows
API Integration Details
The skill uses 火山方舟 (VolcEngine) Ark API for vision understanding:
from openai import OpenAI
client = OpenAI(
base_url='https://ark.cn-beijing.volces.com/api/v3',
api_key='YOUR_API_KEY' # 从config.json的volcengine配置获取
)
# Vision model name
model = 'doubao-seed-2.0-pro'
# 支持的图片格式: base64, URL
Graceful Degradation
If vision processing fails:
- Provides clear error messages
- Suggests alternatives (describe content in text)
- Continues with text-only processing when possible
Best Practices
Do's
- Trust automatic detection - the system will handle visual content seamlessly
- Provide clear context - mention what you want analyzed from the image
- Use natural language - just ask normally, no special commands needed
Don'ts
- Don't specify agents - no need for
@multimodal-lookercommands - Don't worry about file paths - the system handles relative/absolute paths
- Don't repeat requests - automatic processing happens on first mention
Troubleshooting
Issue: Vision processing not working
- Check: Ensure volcengine API Key is valid in OpenCode config
- Check: Verify image file exists and is accessible
- Fix: Test with simple image description request
Issue: Detection not triggering
- Check: Ensure input contains detectable patterns (file extensions, keywords)
- Fix: Use explicit file paths or visual keywords
This skill enables fully automatic visual content processing without requiring manual intervention or custom agent commands.