skills/agent-vision-awareness/USAGE.md

# Agent Vision Awareness - Usage Guide

## Quick Setup

1. **Configure the API key**:
   - The Volcengine Ark (火山方舟) API key is already present in the OpenCode configuration
   - Or set the `VOLCENGINE_API_KEY` environment variable

2. **Add skill to OMO configuration**:
   ```yaml
   skills:
     - agent-vision-awareness
   ```

3. **Use naturally** by mentioning images in your requests:
   - "Analyze this screenshot error.png"
   - "Describe the contents of temp/image.jpg"
   - "Generate a deployment plan from the architecture diagram design/architecture.png"

## Integration Examples

### Basic Usage (Automatic)
```python
# No special code needed - automatic detection and processing
user_input = "Help me analyze this error log screenshot: ./logs/error.png"
# The skill will automatically detect and process the image
```

### Manual Integration (When Needed)
```python
import os

# Relative import: this only resolves when run from inside the skill's package
from .scripts.integrate_vision import process_user_input

# Process user input with visual content
result = process_user_input(
    user_input="Analyze the chart sales_chart.png",
    user_request="Extract the sales data trends",
    config={
        "api_key": os.environ.get("VOLCENGINE_API_KEY"),
        "base_url": "https://ark.cn-beijing.volces.com/api/coding/v3",
        "model": "doubao-seed-code"
    }
)

if result["confidence"] != "none":
    analysis = result["analysis_results"][0]["result"]
    # Use analysis in your response
    response = f"Based on the image analysis: {analysis}"
else:
    # Handle as normal text-only request
    response = "Processing text request..."
```
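
The example above reads only the first entry of `analysis_results`. A request that mentions several images, or one where the list comes back empty, may need slightly more defensive handling. The following is a minimal sketch, assuming the result dictionary has the shape used above (a `"confidence"` string and an `"analysis_results"` list of dictionaries with a `"result"` key). `summarize_result` is a hypothetical helper and not part of the skill.

```python
def summarize_result(result):
    """Build a reply from a result shaped like the one in the example above.

    Assumption: `result` has a "confidence" string and an "analysis_results"
    list of dicts, each with a "result" key. Returns None when there is
    nothing to report, so the caller can fall back to text-only handling.
    """
    analyses = [
        item["result"]
        for item in result.get("analysis_results", [])
        if item.get("result")
    ]
    if result.get("confidence", "none") == "none" or not analyses:
        return None
    # Number the analyses so each one can be traced back to its image
    return "\n\n".join(
        f"Image {i}: {text}" for i, text in enumerate(analyses, start=1)
    )
```

A caller could then write `response = summarize_result(result) or "Processing text request..."`.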

## Configuration

Copy `config/settings.json.example` to `config/settings.json` and update with your API key:

```json
{
  "vision_api": {
    "key": "YOUR_VOLCENGINE_API_KEY",
    "base_url": "https://ark.cn-beijing.volces.com/api/coding/v3",
    "model": "doubao-seed-code"
  }
}
```
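
To illustrate how the settings file and the `VOLCENGINE_API_KEY` environment variable could work together, here is a minimal sketch of a loader. `load_vision_config` is a hypothetical helper, and the precedence it applies (environment variable over file) is an assumption, so the skill's actual loader may behave differently.

```python
import json
import os
from pathlib import Path


def load_vision_config(path="config/settings.json", env=None):
    """Return the `vision_api` settings, filling in the key from the environment.

    Hypothetical helper: not part of the skill's documented API.
    """
    env = os.environ if env is None else env
    settings = {}
    config_file = Path(path)
    if config_file.is_file():
        with config_file.open(encoding="utf-8") as fh:
            settings = json.load(fh).get("vision_api", {})
    # Assumption: the environment variable takes precedence over the file
    key = env.get("VOLCENGINE_API_KEY") or settings.get("key")
    if not key:
        raise RuntimeError(
            "No API key found in settings.json or VOLCENGINE_API_KEY"
        )
    return {**settings, "key": key}
```

Keeping the key in the environment rather than in `settings.json` makes it less likely to end up in version control.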

## Supported Features

- ✅ **Automatic Detection**: File extensions, keywords, URLs, markdown syntax
- ✅ **Multiple Analysis Modes**: OCR, chart analysis, product analysis, scene description
- ✅ **Error Handling**: Graceful degradation with clear error messages
- ✅ **File Support**: Local files, relative paths, absolute paths, URLs
- ✅ **Format Support**: PNG, JPG, JPEG, WebP, GIF, BMP
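
As a rough illustration of what detection by file extension, URL, and markdown syntax might look like, the sketch below scans a request for image references. It is not the skill's actual detector: `find_image_references` and both patterns are assumptions, keyword detection is left out, and the path pattern only handles ASCII file names.

```python
import re

# Extensions taken from the "Format Support" entry above
IMAGE_EXTENSIONS = ("png", "jpg", "jpeg", "webp", "gif", "bmp")
_EXT = "|".join(IMAGE_EXTENSIONS)

# Markdown image syntax: ![alt](target)
MARKDOWN_IMAGE = re.compile(r"!\[[^\]]*\]\(([^)\s]+)\)")

# A URL or file path ending in a known image extension. re.ASCII keeps
# \w and \b ASCII-only, so adjacent CJK text is not absorbed into the path.
IMAGE_PATH = re.compile(
    rf"[\w./\\~:%-]+\.(?:{_EXT})\b", re.IGNORECASE | re.ASCII
)


def find_image_references(text):
    """Return image references in `text`: markdown targets first, then paths/URLs."""
    found = [m.group(1) for m in MARKDOWN_IMAGE.finditer(text)]
    found += [m.group(0) for m in IMAGE_PATH.finditer(text)]
    # dict.fromkeys drops duplicates while preserving first-seen order
    return list(dict.fromkeys(found))
```

An empty list would correspond to a text-only request.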

## Limitations

- ⚠️ **No Custom Agent Delegation**: The `@multimodal-looker` approach doesn't work with current OMO
- ⚠️ **API Key Required**: You must have a valid Volcengine Ark (火山方舟) API key
- ⚠️ **File Size**: Images should be smaller than 4 MB for best performance
- ⚠️ **Network**: Requires internet access to `https://ark.cn-beijing.volces.com`
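
The file-size and format constraints can be checked locally before an image is sent for analysis. The following is a minimal sketch under stated assumptions: `check_image` is a hypothetical helper, and "4 MB" is interpreted here as 4 × 1024 × 1024 bytes, which may not match the service's own definition.

```python
from pathlib import Path

# Assumption: 4 MB means 4 * 1024 * 1024 bytes
MAX_IMAGE_BYTES = 4 * 1024 * 1024
SUPPORTED_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp", ".gif", ".bmp"}


def check_image(path, max_bytes=MAX_IMAGE_BYTES):
    """Return a list of problems with a local image; empty means it looks fine."""
    p = Path(path)
    if not p.is_file():
        return [f"{p} does not exist or is not a regular file"]
    problems = []
    if p.suffix.lower() not in SUPPORTED_SUFFIXES:
        problems.append(f"unsupported format: {p.suffix or '(none)'}")
    size = p.stat().st_size
    if size >= max_bytes:
        problems.append(
            f"file is {size} bytes; recommended limit is {max_bytes}"
        )
    return problems
```

Returning a list rather than raising lets the caller report every problem at once.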

## Troubleshooting

**Vision processing not working?**
- Check `VOLCENGINE_API_KEY` configuration
- Verify image file exists and is accessible
- Test with a simple request: "Describe image.png"

**Detection not triggering?**
- Ensure the input contains detectable patterns (file extensions like `.png`, keywords like "图片", meaning "image")
- Use explicit file paths instead of vague references

**API errors?**
- Check network connectivity to the Volcengine Ark (火山方舟) API
- Verify API key is valid
- Check rate limits on your account

## Related Skills

- `image-service`: For image generation and editing
- `file-reader`: For reading document contents (complementary)

This skill provides **fully automatic visual content processing** without requiring manual intervention or custom agent commands.