
# Visual Content Detection Patterns
Complete reference for detecting visual content in user inputs.
## File Extension Patterns
### Image Files (High Priority)
```
.png, .jpg, .jpeg, .gif, .bmp, .webp, .svg, .ico, .tiff, .tif, .heic, .heif, .raw, .psd, .ai, .eps
```
**Detection Rule**: Case-insensitive match anywhere in input
### Document Files with Visual Content (Medium Priority)
```
.pdf (may contain diagrams), .ppt, .pptx (slides with visuals), .vsdx (Visio), .drawio
```
**Detection Rule**: File extension + visual keywords
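
As a rough illustration of the two priority tiers above, the following sketch maps a single file name to a priority. The helper name `extension_priority` and the return labels are assumptions made for this example, not part of the skill itself.

```python
from pathlib import PurePosixPath

# Extension sets copied from the two lists above.
IMAGE_EXTS = {
    ".png", ".jpg", ".jpeg", ".gif", ".bmp", ".webp", ".svg", ".ico",
    ".tiff", ".tif", ".heic", ".heif", ".raw", ".psd", ".ai", ".eps",
}
DOCUMENT_EXTS = {".pdf", ".ppt", ".pptx", ".vsdx", ".drawio"}


def extension_priority(filename: str) -> str:
    """Return 'high', 'medium', or 'none' for one file name (hypothetical helper)."""
    # Lower-casing the suffix gives the case-insensitive match the rule asks for.
    suffix = PurePosixPath(filename).suffix.lower()
    if suffix in IMAGE_EXTS:
        return "high"
    if suffix in DOCUMENT_EXTS:
        # Medium priority: a visual keyword is still needed before treating it as visual.
        return "medium"
    return "none"
```

A caller would typically combine a `"medium"` result with a keyword check before deciding that the document is visual.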
## Keyword Patterns
### Chinese Visual Keywords
```
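Tier 1 keywords (high priority):
图片 (picture), 图像 (image), 照片 (photo), 截图 (screenshot), 图表 (chart), 图示 (illustration), 图形 (graphic), 影像 (imagery), 画面 (frame)
Tier 2 keywords (medium priority):
流程图 (flowchart), 架构图 (architecture diagram), 时序图 (sequence diagram), ER 图 (ER diagram), 思维导图 (mind map), 柱状图 (bar chart), 饼图 (pie chart), 折线图 (line chart)
设计图 (design drawing), 原型图 (prototype), 线框图 (wireframe), 界面 (interface), UI, UX
表格 (table), 表单 (form), 清单 (checklist), 列表 (list)
Tier 3 keywords (low priority):
显示 (display), 展示 (show), 呈现 (present), 可视化 (visualize), 看图 (look at a picture), 读图 (read a picture)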
```
### English Visual Keywords
```
High Priority:
image, photo, picture, screenshot, snapshot, capture, diagram, chart, graph, plot, figure
Medium Priority:
flowchart, architecture, sequence diagram, ER diagram, mind map, bar chart, pie chart, line graph
design, mockup, wireframe, interface, UI, UX, layout
table, form, list, grid
Low Priority:
show, display, visualize, view, look at, see
```
### Technical Visual Keywords
```
Schema, model, blueprint, spec, technical drawing
Dashboard, widget, panel, visualization
Map, heatmap, scatter plot, histogram
Infographic, poster, banner, thumbnail
```
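
One possible way to look up the priority tier of an input is sketched below. It uses only a small subset of the keywords listed above; the names `KEYWORD_TIERS` and `keyword_tier` are hypothetical, and the choice of whole-word matching for ASCII keywords versus substring matching for Chinese keywords is an assumption of this sketch.

```python
import re
from typing import Optional

# A small subset of the keyword lists above; extend as needed.
KEYWORD_TIERS = {
    "high": ["图片", "截图", "图表", "image", "screenshot", "diagram", "chart"],
    "medium": ["流程图", "架构图", "flowchart", "wireframe", "mockup"],
    "low": ["显示", "可视化", "show", "display", "visualize"],
}


def _contains(text: str, keyword: str) -> bool:
    if keyword.isascii():
        # Whole-word, case-insensitive match so "chart" does not fire inside "flowchart".
        return re.search(rf"\b{re.escape(keyword)}\b", text, re.IGNORECASE) is not None
    # Chinese text has no word separators, so fall back to substring search.
    return keyword in text


def keyword_tier(text: str) -> Optional[str]:
    """Return the highest-priority tier with a matching keyword, or None."""
    for tier in ("high", "medium", "low"):
        if any(_contains(text, kw) for kw in KEYWORD_TIERS[tier]):
            return tier
    return None
```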
## Pattern Matching Rules
### Rule 1: File Path + Extension
```regex
[\w\-./]+?\.(png|jpe?g|gif|bmp|webp|svg|ico|tiff?|heic|heif)\b
```
**Action**: Immediate delegation to multimodal-looker
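
A minimal sketch of applying a pattern in the spirit of Rule 1 with Python's `re` module follows. The `re.IGNORECASE` flag, the trailing `\b` (so that `report.pngx` is not treated as an image), and the name `find_image_paths` are assumptions of this example.

```python
import re

# Non-capturing group so each match is the whole path, not just the extension.
IMAGE_PATH_RE = re.compile(
    r"[\w\-./]+?\.(?:png|jpe?g|gif|bmp|webp|svg|ico|tiff?|heic)\b",
    re.IGNORECASE,
)


def find_image_paths(text: str) -> list:
    """Return every image-like path found in the input, in order of appearance."""
    return [match.group(0) for match in IMAGE_PATH_RE.finditer(text)]
```

Note that `\w` matches Unicode letters in Python 3, so a path written directly after Chinese text with no separator would absorb the preceding characters.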
### Rule 2: Markdown Image Syntax
```regex
!\[([^\]]*)\]\(([^\)]+)\)
```
**Action**: Extract alt text and URL, delegate to multimodal-looker
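
The extraction step can be sketched as below, using the Rule 2 pattern unchanged apart from dropping a redundant escape. The function name `extract_markdown_images` is hypothetical.

```python
import re

# Group 1 is the alt text, group 2 is the URL or path inside the parentheses.
MD_IMAGE_RE = re.compile(r"!\[([^\]]*)\]\(([^)]+)\)")


def extract_markdown_images(text: str) -> list:
    """Return (alt_text, target) pairs for each Markdown image in the input."""
    return [(m.group(1), m.group(2)) for m in MD_IMAGE_RE.finditer(text)]
```

If the image carries an optional title, as in `![a](b.png "title")`, the second group will include the title, so a caller may want to split on the first space.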
### Rule 3: Base64 Image Data
```regex
data:image\/(png|jpeg|gif|webp);base64,[A-Za-z0-9+/=]+
```
**Action**: Extract base64 data, save to temp file, delegate
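
The "extract and save to temp file" action might look like the sketch below. The helper name `save_data_uri_image`, the decision to return `None` on invalid payloads, and the use of `delete=False` (leaving cleanup to the caller) are all assumptions of this example.

```python
import base64
import binascii
import re
import tempfile
from typing import Optional

# Rule 3 pattern with a second group added to capture the payload.
DATA_URI_RE = re.compile(r"data:image/(png|jpeg|gif|webp);base64,([A-Za-z0-9+/=]+)")


def save_data_uri_image(text: str) -> Optional[str]:
    """Decode the first image data URI in the text and return a temp file path."""
    match = DATA_URI_RE.search(text)
    if match is None:
        return None
    subtype, payload = match.groups()
    try:
        raw = base64.b64decode(payload, validate=True)
    except (binascii.Error, ValueError):
        return None  # not valid base64, so nothing to delegate
    # delete=False keeps the file on disk after the handle closes.
    with tempfile.NamedTemporaryFile(suffix=f".{subtype}", delete=False) as handle:
        handle.write(raw)
        return handle.name
```

The returned path can then be passed to the delegate; the caller is responsible for removing the file afterwards.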
### Rule 4: Keyword + File Reference
```regex
(图片|图像|diagram|chart|screenshot).*?[\w\-\.\/]+\.(png|jpg|jpeg|gif|bmp|webp)
```
**Action**: Confirm intent, then delegate
### Rule 5: Keyword Only (Ambiguous)
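```regex
(帮我看看这个图|分析这张图片|这个图表显示)
```
The alternatives mean "help me look at this picture", "analyze this image", and "this chart shows".
**Action**: Ask for clarification: "请问是哪张图片?" ("Which image do you mean?")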
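
Rules 4 and 5 can be checked in sequence, as in this sketch. The alternatives are written without surrounding spaces so that each keyword matches exactly; the function name `rule_4_or_5` and the return labels `"confirm"` and `"ask"` are assumptions of this example.

```python
import re
from typing import Optional

# Rule 4: a visual keyword followed, anywhere later, by an image file reference.
KEYWORD_WITH_FILE_RE = re.compile(
    r"(图片|图像|diagram|chart|screenshot).*?[\w\-./]+\.(png|jpg|jpeg|gif|bmp|webp)",
    re.IGNORECASE,
)
# Rule 5: phrases that refer to an image without naming one.
KEYWORD_ONLY_RE = re.compile(r"(帮我看看这个图|分析这张图片|这个图表显示)")


def rule_4_or_5(text: str) -> Optional[str]:
    """Return 'confirm' for Rule 4, 'ask' for Rule 5, or None if neither applies."""
    if KEYWORD_WITH_FILE_RE.search(text):
        return "confirm"  # confirm intent, then delegate
    if KEYWORD_ONLY_RE.search(text):
        return "ask"  # ask which image is meant
    return None
```

Rule 4 is tested first so that an input containing both a phrase and a file reference is not sent back for clarification.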
## Context-Aware Detection
### Code Development Context
When the user is working on code:
- `architecture.png` → Architecture diagram
- `screenshot.png` → Error or UI screenshot
- `mockup.jpg` → Design reference
**Action**: Assume technical visual, delegate with context
### Data Analysis Context
When the user mentions data:
- `chart`, `graph`, `plot`, `visualization`
- `sales_chart.png`, `trend_graph.jpg`
**Action**: Assume data visualization, request data extraction
### Design Context
When the user discusses design:
- `mockup`, `wireframe`, `prototype`, `design`
- `ui_design.png`, `wireframe.jpg`
**Action**: Assume design visual, request UI/UX analysis
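
A simple way to pick one of the three contexts above is sketched here. The helper `infer_context`, the `working_on_code` flag, and the order in which the term lists are checked (data before design) are assumptions; the document does not say how overlapping signals should be resolved.

```python
from typing import Optional

# Terms taken from the Data Analysis and Design sections above.
DATA_TERMS = ("chart", "graph", "plot", "visualization")
DESIGN_TERMS = ("mockup", "wireframe", "prototype", "design")


def infer_context(text: str, working_on_code: bool = False) -> Optional[str]:
    """Return 'data', 'design', 'code', or None for the given input."""
    lowered = text.lower()
    # Naive substring search: it also matches terms inside file names.
    if any(term in lowered for term in DATA_TERMS):
        return "data"
    if any(term in lowered for term in DESIGN_TERMS):
        return "design"
    # The code context depends on session state rather than keywords.
    return "code" if working_on_code else None
```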
## Detection Confidence Levels
| Level | Confidence | Triggers | Action |
|-------|------------|----------|--------|
| HIGH | 90-100% | Image file + visual keyword | Auto-delegate |
| MEDIUM | 60-89% | Image file OR strong keyword | Confirm then delegate |
| LOW | 30-59% | Weak keyword only | Ask for clarification |
| NONE | 0-29% | No visual signals | Process as text |
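
The table can be read as a small decision function, sketched below. The function name and the `'strong'`/`'weak'` labels are hypothetical, and treating any visual keyword (strong or weak) as sufficient for HIGH when an image file is present is one interpretation of the first row.

```python
from typing import Optional, Tuple


def confidence_level(has_image_file: bool, keyword_strength: Optional[str]) -> Tuple[str, str]:
    """Map detection signals to (level, action).

    keyword_strength is 'strong', 'weak', or None (labels assumed for this sketch).
    """
    if has_image_file and keyword_strength is not None:
        return ("HIGH", "auto-delegate")
    if has_image_file or keyword_strength == "strong":
        return ("MEDIUM", "confirm then delegate")
    if keyword_strength == "weak":
        return ("LOW", "ask for clarification")
    return ("NONE", "process as text")
```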
## Edge Cases
### Ambiguous References
```
"看这个" ("look at this", without specifying what)
"这个文件" ("this file", could be text or image)
```
**Handling**: Ask "请问是哪个文件?是图片吗?" ("Which file do you mean? Is it an image?")
### Multiple Images
```
"比较这两张图:img1.png 和 img2.png" ("Compare these two images: img1.png and img2.png")
```
**Handling**: Delegate both, request comparison
### Image in Code Block
````
```
![image](path.png)
```
````
**Handling**: Still detect as visual content (user may be documenting)
### URL Images
```
https://example.com/image.png
http://cdn.site.com/chart.jpg
```
**Handling**: Detect as visual content; the image may need to be downloaded before delegation
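
Checking whether a URL points at an image can be done on the parsed path, which ignores any query string. This is a sketch only: the name `is_image_url` and the reduced suffix set are assumptions, and the download step is deliberately left out.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# A reduced set of image suffixes for illustration.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".gif", ".bmp", ".webp", ".svg"}


def is_image_url(url: str) -> bool:
    """Return True for http(s) URLs whose path ends in a known image suffix."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False
    # parsed.path excludes the query string, so "?w=200" does not hide the suffix.
    return PurePosixPath(parsed.path).suffix.lower() in IMAGE_SUFFIXES
```

URLs that serve images without a file suffix would be missed by this check; inspecting the response's content type after download is one way to cover them.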
## Implementation Checklist
- [ ] Scan input for file extensions
- [ ] Check for markdown image syntax
- [ ] Search for visual keywords
- [ ] Evaluate context (code, data, design)
- [ ] Assign confidence level
- [ ] Execute appropriate action (delegate/confirm/ask)
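
The checklist steps could be strung together roughly as follows. This is an illustrative sketch, not the skill's implementation: the keyword tuples are small subsets chosen for the example, keyword matching is a naive substring search, the context step is omitted for brevity, and the name `detect` and its return labels are hypothetical.

```python
import re

IMAGE_FILE_RE = re.compile(r"[\w\-./]+?\.(?:png|jpe?g|gif|bmp|webp|svg)\b", re.IGNORECASE)
MD_IMAGE_RE = re.compile(r"!\[[^\]]*\]\([^)]+\)")
# Illustrative subsets; which keywords count as strong or weak is an assumption.
STRONG_KEYWORDS = ("图片", "截图", "图表", "架构图", "image", "screenshot", "diagram", "chart")
WEAK_KEYWORDS = ("这个图", "设计", "显示", "show", "display", "view")


def detect(text: str) -> str:
    """Return 'delegate', 'confirm', 'ask', or 'text' for one user input."""
    lowered = text.lower()
    # Steps 1-2: file extensions, Markdown image syntax, and inline data URIs.
    has_file = bool(
        IMAGE_FILE_RE.search(text)
        or MD_IMAGE_RE.search(text)
        or "data:image/" in lowered
    )
    # Step 3: visual keywords.
    strong = any(keyword in lowered for keyword in STRONG_KEYWORDS)
    weak = any(keyword in lowered for keyword in WEAK_KEYWORDS)
    # Steps 5-6: assign a confidence level and pick the action.
    if has_file and (strong or weak):
        return "delegate"  # HIGH
    if has_file or strong:
        return "confirm"  # MEDIUM
    if weak:
        return "ask"  # LOW
    return "text"  # NONE
```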
## Testing Examples
### Should Trigger (High Confidence)
```
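分析这个截图:error.png ("Analyze this screenshot: error.png")
看这张架构图 design/architecture.png ("Look at this architecture diagram design/architecture.png")
![流程图](flow.png) 显示什么? ("What does the flowchart flow.png show?")
帮我看看 data:image/png;base64,... ("Help me look at data:image/png;base64,...")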
```
### Should Trigger (Medium Confidence)
```
这个图片怎么优化?screenshot.png ("How can this image be optimized? screenshot.png")
diagram.jpg 有什么改进建议 ("Any suggestions for improving diagram.jpg")
```
### Should Ask (Low Confidence)
```
帮我看看这个图 ("Help me look at this picture"; no file specified)
这个设计怎么样? ("How is this design?"; unclear if a visual is attached)
```
### Should Not Trigger
```
帮我写代码 ("Help me write code")
这个文本怎么格式化 ("How should this text be formatted")
纯文字内容 ("Plain text content")
```