skills/agent-vision-awareness/references/detection-patterns.md

# Visual Content Detection Patterns

Complete reference for detecting visual content in user inputs.

## File Extension Patterns

### Image Files (High Priority)
```
.png, .jpg, .jpeg, .gif, .bmp, .webp, .svg, .ico, .tiff, .tif, .heic, .heif, .raw, .psd, .ai, .eps
```

**Detection Rule**: Match the extension case-insensitively anywhere in the input (`photo.PNG` and `photo.png` both qualify)

### Document Files with Visual Content (Medium Priority)
```
.pdf (may contain diagrams), .ppt, .pptx (slides with visuals), .vsdx (Visio), .drawio
```

**Detection Rule**: File extension plus at least one visual keyword in the same input
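
As a rough illustration of the document-file rule, the sketch below flags an input only when a document extension and a visual keyword both appear. The names `DOC_EXT_RE`, `VISUAL_KEYWORDS`, and `is_visual_document` are hypothetical, and the keyword tuple is a small sample rather than the full lists in this reference.

```python
import re

# Document extensions from the list above, matched case-insensitively.
# The lookahead rejects longer suffixes such as ".pdfx" and, unlike \b,
# still works when CJK text follows the extension directly.
DOC_EXT_RE = re.compile(r"\.(?:pdf|pptx?|vsdx|drawio)(?![A-Za-z0-9])", re.IGNORECASE)

# Assumption: a small sample of visual keywords, not the complete lists.
VISUAL_KEYWORDS = ("diagram", "chart", "figure", "图表", "流程图")


def is_visual_document(text: str) -> bool:
    """Return True when a document extension and a visual keyword both appear."""
    has_document = DOC_EXT_RE.search(text) is not None
    lowered = text.lower()
    has_keyword = any(keyword in lowered for keyword in VISUAL_KEYWORDS)
    return has_document and has_keyword
```

A bare `report.pdf` with no visual keyword is left to normal text processing under this sketch.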

## Keyword Patterns

### Chinese Visual Keywords
```
High Priority:
图片, 图像, 照片, 截图, 图表, 图示, 图形, 影像, 画面

Medium Priority:
流程图, 架构图, 时序图, ER 图, 思维导图, 柱状图, 饼图, 折线图
设计图, 原型图, 线框图, 界面, UI, UX
表格, 表单, 清单, 列表

Low Priority:
显示, 展示, 呈现, 可视化, 看图, 读图
```

### English Visual Keywords
```
High Priority:
image, photo, picture, screenshot, snapshot, capture, diagram, chart, graph, plot, figure

Medium Priority:
flowchart, architecture, sequence diagram, ER diagram, mind map, bar chart, pie chart, line graph
design, mockup, wireframe, interface, UI, UX, layout
table, form, list, grid

Low Priority:
show, display, visualize, view, look at, see
```

### Technical Visual Keywords
```
Schema, model, blueprint, spec, technical drawing
Dashboard, widget, panel, visualization
Map, heatmap, scatter plot, histogram
Infographic, poster, banner, thumbnail
```
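
One possible way to apply the priority tiers is to check them from highest to lowest and report the first tier that matches. The sketch below assumes a hypothetical `KEYWORD_TIERS` table holding only a sample of the keywords above, and it uses plain substring matching, so "flowchart" also hits the high-priority keyword "chart".

```python
from typing import Optional

# Assumption: a sample of the keyword lists above, grouped by priority tier.
KEYWORD_TIERS = {
    "high": ["screenshot", "image", "diagram", "chart", "图片", "截图", "图表"],
    "medium": ["flowchart", "wireframe", "mockup", "流程图", "架构图", "线框图"],
    "low": ["show", "display", "visualize", "显示", "展示", "可视化"],
}


def highest_keyword_tier(text: str) -> Optional[str]:
    """Return the highest-priority tier with a keyword in text, or None."""
    lowered = text.lower()
    for tier in ("high", "medium", "low"):
        if any(keyword in lowered for keyword in KEYWORD_TIERS[tier]):
            return tier
    return None
```

Word-boundary matching for the English keywords would reduce false positives, at the cost of a slightly more involved pattern.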

## Pattern Matching Rules

### Rule 1: File Path + Extension
```regex
[\w\-\.\/]+?\.(png|jpg|jpeg|gif|bmp|webp|svg|ico|tiff?|heic|heif)
```

**Action**: Immediate delegation to multimodal-looker
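
A minimal sketch of how Rule 1 might be applied in Python follows. It covers the Rule 1 extensions along with the `.tif` and `.heif` variants listed under Image Files, compiles the pattern case-insensitively, and adds a trailing lookahead that is an assumption of this sketch, not part of the rule. `IMAGE_PATH_RE` and `find_image_paths` are hypothetical names.

```python
import re

# Path characters followed by an image extension. The final lookahead is an
# addition in this sketch: it rejects longer suffixes such as ".pngx".
IMAGE_PATH_RE = re.compile(
    r"[\w\-./]+?\.(?:png|jpe?g|gif|bmp|webp|svg|ico|tiff?|hei[cf])(?![A-Za-z0-9])",
    re.IGNORECASE,
)


def find_image_paths(text: str) -> list[str]:
    """Return every image file path found in text, in order of appearance."""
    return [match.group(0) for match in IMAGE_PATH_RE.finditer(text)]
```

Because `\w` is Unicode-aware in Python, CJK characters written directly before a file name with no separator become part of the matched path.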

### Rule 2: Markdown Image Syntax
```regex
!\[([^\]]*)\]\(([^\)]+)\)
```

**Action**: Extract alt text and URL, delegate to multimodal-looker
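
The extraction step for Rule 2 could look like the sketch below, which reuses the pattern above and returns alt text and URL pairs. `MarkdownImage` and `extract_markdown_images` are hypothetical names introduced for illustration.

```python
import re
from typing import NamedTuple

# Same pattern as Rule 2: group 1 is the alt text, group 2 is the URL.
MD_IMAGE_RE = re.compile(r"!\[([^\]]*)\]\(([^\)]+)\)")


class MarkdownImage(NamedTuple):
    alt: str
    url: str


def extract_markdown_images(text: str) -> list[MarkdownImage]:
    """Return (alt, url) pairs for every Markdown image in text."""
    return [
        MarkdownImage(match.group(1), match.group(2))
        for match in MD_IMAGE_RE.finditer(text)
    ]
```

An image written with a title, such as `![a](x.png "title")`, would carry the title inside the captured URL group, so a caller may want to split on whitespace before delegating.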

### Rule 3: Base64 Image Data
```regex
data:image\/(png|jpeg|gif|webp);base64,[A-Za-z0-9+/=]+
```

**Action**: Extract base64 data, save to temp file, delegate
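
The "save to temp file" step might be sketched as follows, using only the Python standard library. The function name `save_data_uri_images` is hypothetical, and payloads that fail strict base64 validation are skipped rather than delegated, which is a choice made for this sketch.

```python
import base64
import re
import tempfile
from pathlib import Path

# Rule 3 pattern with the image subtype and the payload captured separately.
DATA_URI_RE = re.compile(r"data:image/(png|jpeg|gif|webp);base64,([A-Za-z0-9+/=]+)")


def save_data_uri_images(text: str) -> list[Path]:
    """Decode each data URI in text into a temp file and return the paths."""
    paths = []
    for match in DATA_URI_RE.finditer(text):
        subtype, payload = match.groups()
        try:
            raw = base64.b64decode(payload, validate=True)
        except ValueError:  # binascii.Error is a subclass of ValueError
            continue
        # delete=False keeps the file on disk after the handle is closed.
        with tempfile.NamedTemporaryFile(suffix=f".{subtype}", delete=False) as handle:
            handle.write(raw)
        paths.append(Path(handle.name))
    return paths
```

The caller is responsible for removing the temp files once delegation has finished.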

### Rule 4: Keyword + File Reference
```regex
(图片|图像|diagram|chart|screenshot).*?[\w\-\.\/]+\.(png|jpg|jpeg|gif|bmp|webp)
```

**Action**: Confirm intent, then delegate
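
To make the behavior of Rule 4 concrete, the sketch below wraps the pattern in a helper that returns the matched keyword and file. A capture group around the file path is added for convenience, and `match_keyword_and_file` is a hypothetical name. Note that the pattern requires the keyword to appear before the file reference.

```python
import re
from typing import Optional

# Rule 4 pattern; the second group (the file path) is added for this sketch.
KEYWORD_FILE_RE = re.compile(
    r"(图片|图像|diagram|chart|screenshot).*?([\w\-./]+\.(?:png|jpg|jpeg|gif|bmp|webp))",
    re.IGNORECASE,
)


def match_keyword_and_file(text: str) -> Optional[tuple[str, str]]:
    """Return (keyword, file) when a keyword precedes a file reference."""
    match = KEYWORD_FILE_RE.search(text)
    if match is None:
        return None
    return match.group(1), match.group(2)
```

An input such as `diagram.jpg 有什么改进建议` does not match, because the only keyword is part of the file name and nothing that follows it is a file reference.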

### Rule 5: Keyword Only (Ambiguous)
```regex
(帮我看看这个图|分析这张图片|这个图表显示)
```

**Action**: Ask for clarification: "请问是哪张图片？" ("Which image do you mean?")
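
A possible shape for the Rule 5 check is shown below: it returns the clarification prompt only when one of the ambiguous phrases appears and no image file is referenced. The helper name `clarification_for` and the reduced extension list are assumptions of this sketch.

```python
import re
from typing import Optional

# Ambiguous phrases from Rule 5.
AMBIGUOUS_RE = re.compile(r"帮我看看这个图|分析这张图片|这个图表显示")
# Assumption: a short extension list is enough to detect a file reference here.
IMAGE_FILE_RE = re.compile(r"\.(?:png|jpg|jpeg|gif|bmp|webp)", re.IGNORECASE)
CLARIFY_PROMPT = "请问是哪张图片？"


def clarification_for(text: str) -> Optional[str]:
    """Return the clarification prompt for keyword-only inputs, else None."""
    if AMBIGUOUS_RE.search(text) and not IMAGE_FILE_RE.search(text):
        return CLARIFY_PROMPT
    return None
```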

## Context-Aware Detection

### Code Development Context
When user is working on code:
- `architecture.png` → Architecture diagram
- `screenshot.png` → Error or UI screenshot
- `mockup.jpg` → Design reference

**Action**: Assume a technical visual and delegate with the surrounding code context

### Data Analysis Context
When user mentions data:
- `chart`, `graph`, `plot`, `visualization`
- `sales_chart.png`, `trend_graph.jpg`

**Action**: Assume a data visualization and request data extraction

### Design Context
When user discusses design:
- `mockup`, `wireframe`, `prototype`, `design`
- `ui_design.png`, `wireframe.jpg`

**Action**: Assume a design visual and request UI/UX analysis
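
The three contexts above can be thought of as a lookup from context to request wording. The sketch below assumes the caller already knows the active context (`"code"`, `"data"`, or `"design"`); the request strings and the `build_delegation_request` helper are hypothetical and only illustrate the idea.

```python
# Hypothetical request wording, one entry per context described above.
CONTEXT_REQUESTS = {
    "code": "Technical visual: describe it in relation to the code being discussed.",
    "data": "Data visualization: extract the underlying data points and trends.",
    "design": "Design visual: analyze the UI/UX.",
}
DEFAULT_REQUEST = "Describe the visual content."


def build_delegation_request(context: str, filename: str) -> str:
    """Combine a file name with the request wording for the given context."""
    instruction = CONTEXT_REQUESTS.get(context, DEFAULT_REQUEST)
    return f"[{filename}] {instruction}"
```

Unknown contexts fall back to a generic description request rather than raising an error.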

## Detection Confidence Levels

| Level | Confidence | Triggers | Action |
|-------|------------|----------|--------|
| HIGH | 90-100% | Image file + visual keyword | Auto-delegate |
| MEDIUM | 60-89% | Image file OR strong keyword | Confirm then delegate |
| LOW | 30-59% | Weak keyword only | Ask for clarification |
| NONE | 0-29% | No visual signals | Process as text |
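
The table can be read as a small decision function. In the sketch below, "strong keyword" is assumed to mean a high- or medium-priority keyword and "weak keyword" a low-priority one; that mapping, along with the `Confidence` enum and `classify` function, is an assumption and not something this reference defines.

```python
from enum import Enum
from typing import Optional


class Confidence(Enum):
    HIGH = "auto-delegate"
    MEDIUM = "confirm then delegate"
    LOW = "ask for clarification"
    NONE = "process as text"


def classify(has_image_file: bool, keyword_tier: Optional[str]) -> Confidence:
    """Map detection signals to a confidence level, following the table rows."""
    if has_image_file and keyword_tier is not None:
        return Confidence.HIGH  # image file + visual keyword
    if has_image_file or keyword_tier in ("high", "medium"):
        return Confidence.MEDIUM  # image file OR strong keyword
    if keyword_tier == "low":
        return Confidence.LOW  # weak keyword only
    return Confidence.NONE  # no visual signals
```

Each enum value doubles as the action column, so `classify(True, "high").value` yields `"auto-delegate"`.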

## Edge Cases

### Ambiguous References
```
"看这个" ("look at this", without specifying what)
"这个文件" ("this file", could be text or image)
```
**Handling**: Ask "请问是哪个文件？是图片吗？" ("Which file do you mean? Is it an image?")

### Multiple Images
```
"比较这两张图：img1.png 和 img2.png"
```
**Handling**: Delegate both, request comparison
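
For the multiple-image case, one approach is to collect every distinct image path and switch the task to a comparison when more than one is found. The `plan_delegation` helper and its return shape are hypothetical.

```python
import re

# Image path pattern in the spirit of Rule 1, with a trailing lookahead
# (an addition in this sketch) so that longer suffixes are not matched.
IMAGE_PATH_RE = re.compile(
    r"[\w\-./]+?\.(?:png|jpe?g|gif|bmp|webp|svg)(?![A-Za-z0-9])",
    re.IGNORECASE,
)


def plan_delegation(text: str) -> dict:
    """List the distinct images in text and pick a task for them."""
    # dict.fromkeys removes duplicates while preserving first-seen order.
    paths = list(dict.fromkeys(m.group(0) for m in IMAGE_PATH_RE.finditer(text)))
    if not paths:
        return {"images": [], "task": None}
    task = "compare" if len(paths) > 1 else "describe"
    return {"images": paths, "task": task}
```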

### Image in Code Block
````
```
![image](path.png)
```
````
**Handling**: Still detect as visual content (user may be documenting)

### URL Images
```
https://example.com/image.png
http://cdn.site.com/chart.jpg
```
**Handling**: Detect as visual content; the image may need to be downloaded before delegation
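
A sketch of URL detection is shown below. It checks the extension on the URL path only, so query strings such as `?w=200` do not hide the image type. The names `URL_RE`, `IMAGE_EXTS`, and `find_image_urls` are hypothetical, and no download is attempted here.

```python
import re
from urllib.parse import urlparse

# Rough URL matcher: stops at whitespace, angle brackets, and parentheses.
URL_RE = re.compile(r"https?://[^\s<>()]+", re.IGNORECASE)
# Assumption: a subset of the image extensions listed in this reference.
IMAGE_EXTS = (".png", ".jpg", ".jpeg", ".gif", ".bmp", ".webp", ".svg")


def find_image_urls(text: str) -> list[str]:
    """Return URLs in text whose path ends with an image extension."""
    urls = []
    for match in URL_RE.finditer(text):
        # Drop sentence punctuation that trails the URL.
        url = match.group(0).rstrip(".,;:!?")
        if urlparse(url).path.lower().endswith(IMAGE_EXTS):
            urls.append(url)
    return urls
```

URLs that serve images without a file extension are not caught by this check and would need a content-type lookup instead.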

## Implementation Checklist

- [ ] Scan input for file extensions
- [ ] Check for markdown image syntax
- [ ] Search for visual keywords
- [ ] Evaluate context (code, data, design)
- [ ] Assign confidence level
- [ ] Execute appropriate action (delegate/confirm/ask)
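
The checklist could be wired together roughly as follows. This is a compact sketch under several assumptions: the keyword tuples are samples, the context evaluation step is omitted, and the returned action strings (`"delegate"`, `"confirm"`, `"ask"`, `"text"`) are hypothetical labels for the actions in the confidence table.

```python
import re

IMAGE_FILE_RE = re.compile(
    r"[\w\-./]+\.(?:png|jpe?g|gif|bmp|webp|svg)(?![A-Za-z0-9])", re.IGNORECASE
)
MD_IMAGE_RE = re.compile(r"!\[[^\]]*\]\([^)]+\)")
# Assumption: small samples of the strong and weak keyword lists.
STRONG_KEYWORDS = ("screenshot", "diagram", "chart", "截图", "图片", "架构图", "流程图")
WEAK_KEYWORDS = ("show", "display", "显示", "展示", "看图")


def detect(text: str) -> str:
    """Run the checklist steps and return the action to take."""
    lowered = text.lower()
    # Steps 1-2: file extensions and Markdown image syntax.
    has_file = bool(IMAGE_FILE_RE.search(text) or MD_IMAGE_RE.search(text))
    # Step 3: visual keywords (plain substring matching).
    has_strong = any(keyword in lowered for keyword in STRONG_KEYWORDS)
    has_weak = any(keyword in lowered for keyword in WEAK_KEYWORDS)
    # Steps 5-6: assign a confidence level and map it to an action.
    if has_file and (has_strong or has_weak):
        return "delegate"
    if has_file or has_strong:
        return "confirm"
    if has_weak:
        return "ask"
    return "text"
```

Because keywords are matched as substrings, a keyword that appears inside a file name (for example `diagram.jpg`) counts as both a file and a keyword in this sketch.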

## Testing Examples

### Should Trigger (High Confidence)
```
分析这个截图：error.png
看这张架构图 design/architecture.png
![流程图](flow.png) 显示什么？
帮我看看 data:image/png;base64,...
```

### Should Trigger (Medium Confidence)
```
这个图片怎么优化？screenshot.png
diagram.jpg 有什么改进建议
```

### Should Ask (Low Confidence)
```
帮我看看这个图 (no file specified)
这个设计怎么样？(unclear if visual attached)
```

### Should Not Trigger
```
帮我写代码
这个文本怎么格式化
纯文字内容
```