# Visual Content Detection Patterns

Complete reference for detecting visual content in user inputs.

## File Extension Patterns

### Image Files (High Priority)

```
.png, .jpg, .jpeg, .gif, .bmp, .webp, .svg, .ico, .tiff, .tif, .heic, .heif, .raw, .psd, .ai, .eps
```

**Detection Rule**: Case-insensitive match on the extension, anywhere in the input

### Document Files with Visual Content (Medium Priority)

```
.pdf (may contain diagrams), .ppt, .pptx (slides with visuals), .vsdx (Visio), .drawio
```

**Detection Rule**: File extension + visual keywords

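The two extension lists can be checked with one case-insensitive scan. The following is a minimal sketch, not the skill's actual implementation: the names `IMAGE_EXTS`, `DOCUMENT_EXTS`, and `find_extensions` are hypothetical, and the lookahead that rejects `.pngx` is an added assumption. It only reports which extensions appear; the "extension + visual keywords" condition for document files still has to be applied by the caller.

```python
import re

# Extension lists copied from the two blocks above.
IMAGE_EXTS = ("png", "jpg", "jpeg", "gif", "bmp", "webp", "svg", "ico",
              "tiff", "tif", "heic", "heif", "raw", "psd", "ai", "eps")
DOCUMENT_EXTS = ("pdf", "ppt", "pptx", "vsdx", "drawio")


def _compile(exts):
    # Longest alternatives first so "tiff" is tried before "tif".
    alternatives = "|".join(sorted(exts, key=len, reverse=True))
    # Require one filename character before the dot, and reject a match
    # that is followed by more ASCII letters or digits (e.g. ".pngx").
    return re.compile(r"[\w\-]\.(" + alternatives + r")(?![A-Za-z0-9])",
                      re.IGNORECASE)


_IMAGE_RE = _compile(IMAGE_EXTS)
_DOCUMENT_RE = _compile(DOCUMENT_EXTS)


def find_extensions(text):
    """Return the lowercased image and document extensions found in text."""
    return {
        "image": sorted({m.group(1).lower() for m in _IMAGE_RE.finditer(text)}),
        "document": sorted({m.group(1).lower() for m in _DOCUMENT_RE.finditer(text)}),
    }


print(find_extensions("See Report.PDF and shot.PNG"))
```

Short extensions such as `.ai` are likely to produce false positives on domain names, so a real detector may want to treat them more conservatively.
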
## Keyword Patterns

### Chinese Visual Keywords

```
Tier 1 keywords (high priority):
图片,图像,照片,截图,图表,图示,图形,影像,画面

Tier 2 keywords (medium priority):
流程图,架构图,时序图,ER 图,思维导图,柱状图,饼图,折线图
设计图,原型图,线框图,界面,UI,UX
表格,表单,清单,列表

Tier 3 keywords (low priority):
显示,展示,呈现,可视化,看图,读图
```

### English Visual Keywords

```
High Priority:
image, photo, picture, screenshot, snapshot, capture, diagram, chart, graph, plot, figure

Medium Priority:
flowchart, architecture, sequence diagram, ER diagram, mind map, bar chart, pie chart, line graph
design, mockup, wireframe, interface, UI, UX, layout
table, form, list, grid

Low Priority:
show, display, visualize, view, look at, see
```

### Technical Visual Keywords

```
Schema, model, blueprint, spec, technical drawing
Dashboard, widget, panel, visualization
Map, heatmap, scatter plot, histogram
Infographic, poster, banner, thumbnail
```

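One way to apply the priority tiers is to return the highest tier whose keywords appear in the input. The sketch below uses abbreviated keyword lists and hypothetical names (`HIGH`, `MEDIUM`, `LOW`, `keyword_tier`). It also assumes that English keywords should match as whole words while Chinese keywords match as substrings, which the lists above do not specify.

```python
import re

# Abbreviated samples; the full lists are in the keyword sections above.
HIGH = ("图片", "图像", "截图", "图表", "image", "photo", "screenshot", "diagram", "chart")
MEDIUM = ("流程图", "架构图", "线框图", "flowchart", "wireframe", "mockup", "table")
LOW = ("显示", "展示", "看图", "show", "display", "see")


def _contains(text, keyword):
    if keyword.isascii():
        # Letter-based guards instead of \b: Python's \w also matches CJK
        # characters, so \b would miss "看screenshot吧".
        pattern = r"(?<![A-Za-z])" + re.escape(keyword) + r"(?![A-Za-z])"
        return re.search(pattern, text, re.IGNORECASE) is not None
    return keyword in text


def keyword_tier(text):
    """Return 'high', 'medium', 'low', or None for the strongest keyword found."""
    for tier, keywords in (("high", HIGH), ("medium", MEDIUM), ("low", LOW)):
        if any(_contains(text, kw) for kw in keywords):
            return tier
    return None


print(keyword_tier("Please show me the flowchart"))
```

With whole-word matching, "flowchart" counts as a medium-priority keyword and does not trigger the high-priority "chart".
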
## Pattern Matching Rules

### Rule 1: File Path + Extension

```regex
[\w\-./]+?\.(png|jpg|jpeg|gif|bmp|webp|svg|ico|tiff|tif|heic|heif)(?![A-Za-z0-9])
```

Apply the pattern case-insensitively. The trailing lookahead keeps longer extensions such as `.pngx` from matching.

**Action**: Immediate delegation to multimodal-looker

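As an illustration of Rule 1, the sketch below compiles a file-path pattern with `re.IGNORECASE` and returns every matching path. The function name `find_image_paths` is hypothetical, and the negative lookahead that rejects `.pngx` is an assumption about the intended behaviour.

```python
import re

# File path followed by an image extension, matched case-insensitively.
RULE1 = re.compile(
    r"[\w\-./]+?\.(?:png|jpg|jpeg|gif|bmp|webp|svg|ico|tiff|tif|heic|heif)"
    r"(?![A-Za-z0-9])",
    re.IGNORECASE,
)


def find_image_paths(text):
    """Return every image file path found in text, in order of appearance."""
    return [m.group(0) for m in RULE1.finditer(text)]


print(find_image_paths("看这张架构图 design/architecture.png"))
print(find_image_paths("compare a.png and B.JPG"))
```

Because the function returns a list, an input that names several images yields one entry per file, and each can be delegated separately.
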
### Rule 2: Markdown Image Syntax

```regex
!\[([^\]]*)\]\(([^\)]+)\)
```

**Action**: Extract alt text and URL, delegate to multimodal-looker

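The two capture groups in the Rule 2 pattern map directly onto the alt text and the URL. A minimal sketch, with the hypothetical helper name `extract_markdown_images`:

```python
import re

# Group 1 is the alt text (may be empty), group 2 is the link target.
MD_IMAGE = re.compile(r"!\[([^\]]*)\]\(([^)]+)\)")


def extract_markdown_images(text):
    """Return (alt_text, url) pairs for every Markdown image in text."""
    return [(m.group(1), m.group(2).strip()) for m in MD_IMAGE.finditer(text)]


sample = "See ![login error](shots/error.png) and ![](a.jpg)"
print(extract_markdown_images(sample))
```

This pattern does not separate an optional link title, so for `![a](b.png "title")` the second element would include the title text.
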
### Rule 3: Base64 Image Data

```regex
data:image\/(png|jpeg|gif|webp);base64,[A-Za-z0-9+/=]+
```

**Action**: Extract base64 data, save to temp file, delegate

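The "extract, save to temp file" step could look roughly like the following, using only the standard library. The helper name `save_data_uri` is hypothetical, and returning `None` for a missing or malformed payload is an assumed policy, not something the rule specifies.

```python
import base64
import binascii
import re
import tempfile

# Same shape as the Rule 3 pattern, with the payload captured in group 2.
DATA_URI = re.compile(r"data:image/(png|jpeg|gif|webp);base64,([A-Za-z0-9+/=]+)")


def save_data_uri(text):
    """Decode the first base64 image in text to a temp file and return its path."""
    match = DATA_URI.search(text)
    if match is None:
        return None
    try:
        raw = base64.b64decode(match.group(2), validate=True)
    except binascii.Error:
        # Malformed payload (bad padding or characters): treat as not an image.
        return None
    suffix = "." + match.group(1)
    # delete=False keeps the file on disk so the path can be delegated.
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as handle:
        handle.write(raw)
        return handle.name
```

The caller is responsible for removing the temp file once the delegated analysis has finished.
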
### Rule 4: Keyword + File Reference

```regex
(图片|图像|diagram|chart|screenshot).*?[\w\-\.\/]+\.(png|jpg|jpeg|gif|bmp|webp)
```

**Action**: Confirm intent, then delegate

### Rule 5: Keyword Only (Ambiguous)

```regex
(帮我看看这个图|分析这张图片|这个图表显示)
```

**Action**: Ask for clarification: "请问是哪张图片?" ("Which image do you mean?")

## Context-Aware Detection

### Code Development Context

When the user is working on code:

- `architecture.png` → Architecture diagram
- `screenshot.png` → Error or UI screenshot
- `mockup.jpg` → Design reference

**Action**: Assume technical visual, delegate with context

### Data Analysis Context

When the user mentions data:

- `chart`, `graph`, `plot`, `visualization`
- `sales_chart.png`, `trend_graph.jpg`

**Action**: Assume data visualization, request data extraction

### Design Context

When the user discusses design:

- `mockup`, `wireframe`, `prototype`, `design`
- `ui_design.png`, `wireframe.jpg`

**Action**: Assume design visual, request UI/UX analysis

## Detection Confidence Levels

| Level | Confidence | Triggers | Action |
|-------|------------|----------|--------|
| HIGH | 90-100% | Image file + visual keyword | Auto-delegate |
| MEDIUM | 60-89% | Image file OR strong keyword | Confirm then delegate |
| LOW | 30-59% | Weak keyword only | Ask for clarification |
| NONE | 0-29% | No visual signals | Process as text |

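The table can be read as a small decision function over three boolean signals. The sketch below is one interpretation, not a definitive mapping: it assumes "visual keyword" and "strong keyword" both mean a high- or medium-priority keyword and "weak keyword" means a low-priority one. The names `Confidence` and `classify` are hypothetical.

```python
from enum import Enum


class Confidence(Enum):
    # The value is the action from the last column of the table.
    HIGH = "auto-delegate"
    MEDIUM = "confirm then delegate"
    LOW = "ask for clarification"
    NONE = "process as text"


def classify(has_image_file, has_strong_keyword, has_weak_keyword):
    """Map detection signals to a confidence level, top row of the table first."""
    if has_image_file and has_strong_keyword:
        return Confidence.HIGH
    if has_image_file or has_strong_keyword:
        return Confidence.MEDIUM
    if has_weak_keyword:
        return Confidence.LOW
    return Confidence.NONE


level = classify(has_image_file=True, has_strong_keyword=False, has_weak_keyword=True)
print(level.name, "->", level.value)
```

Under this reading, an image file accompanied only by a weak keyword stays at MEDIUM, because the weak keyword adds nothing beyond the file itself.
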
## Edge Cases

### Ambiguous References

```
"看这个" ("look at this", without specifying what)
"这个文件" ("this file", could be text or image)
```

**Handling**: Ask "请问是哪个文件?是图片吗?" ("Which file do you mean? Is it an image?")

### Multiple Images

```
"比较这两张图:img1.png 和 img2.png"
```

**Handling**: Delegate both, request comparison

### Image in Code Block

````
```
![…](…)
```
````

**Handling**: Still detect as visual content (user may be documenting)

### URL Images

```
https://example.com/image.png
http://cdn.site.com/chart.jpg
```

**Handling**: Detect as visual, may need download first

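URL detection can reuse the extension check, applied to the URL's path so that query strings do not hide the extension. This is a sketch with hypothetical names (`find_image_urls`, `IMAGE_SUFFIXES`). Limiting URL characters to ASCII is an assumption, made so that adjacent Chinese punctuation ends the URL. The download step is left to the caller.

```python
import re
from urllib.parse import urlparse

# ASCII-only URL characters, so CJK text or punctuation terminates the match.
URL_RE = re.compile(r"https?://[A-Za-z0-9\-._~:/?#\[\]@!$&*+,;=%]+", re.IGNORECASE)
IMAGE_SUFFIXES = (".png", ".jpg", ".jpeg", ".gif", ".bmp", ".webp", ".svg")


def find_image_urls(text):
    """Return URLs in text whose path ends with an image extension."""
    urls = []
    for match in URL_RE.finditer(text):
        # Drop sentence punctuation that was swept up at the end of the URL.
        url = match.group(0).rstrip(".,;:!?")
        if urlparse(url).path.lower().endswith(IMAGE_SUFFIXES):
            urls.append(url)
    return urls


text = ("see https://example.com/image.png and "
        "http://cdn.site.com/chart.jpg?w=200, not https://example.com/page")
print(find_image_urls(text))
```
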
## Implementation Checklist

- [ ] Scan input for file extensions
- [ ] Check for markdown image syntax
- [ ] Search for visual keywords
- [ ] Evaluate context (code, data, design)
- [ ] Assign confidence level
- [ ] Execute appropriate action (delegate/confirm/ask)

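The checklist can be strung together into a single function, sketched below under several assumptions: the keyword lists are abbreviated, keywords are matched as plain substrings, the context step is omitted, and the confidence mapping treats high- and medium-priority keywords as "strong". All names (`detect`, `STRONG`, `WEAK`) are hypothetical.

```python
import re

IMAGE_FILE = re.compile(
    r"[\w\-./]+?\.(?:png|jpg|jpeg|gif|bmp|webp|svg)(?![A-Za-z0-9])", re.IGNORECASE)
MD_IMAGE = re.compile(r"!\[[^\]]*\]\([^)]+\)")
DATA_URI = re.compile(r"data:image/(?:png|jpeg|gif|webp);base64,")

# Abbreviated keyword samples taken from the lists in this document.
STRONG = ("图片", "图像", "截图", "图表", "架构图", "流程图",
          "image", "screenshot", "diagram", "chart")
# The last entry is one of the Rule 5 phrases.
WEAK = ("显示", "展示", "看图", "show", "display", "帮我看看这个图")


def _has_keyword(text, keywords):
    # Plain substring check; a real detector may want word boundaries.
    lowered = text.lower()
    return any(kw in lowered for kw in keywords)


def detect(text):
    """Return (confidence_level, action) for a user input."""
    # Steps 1-2: file extensions, Markdown images, base64 data URIs.
    has_file = bool(IMAGE_FILE.search(text) or MD_IMAGE.search(text)
                    or DATA_URI.search(text))
    # Step 3: visual keywords.
    strong = _has_keyword(text, STRONG)
    weak = _has_keyword(text, WEAK)
    # Steps 5-6: assign a level and pick the action.
    if has_file and strong:
        return ("HIGH", "delegate")
    if has_file or strong:
        return ("MEDIUM", "confirm")
    if weak:
        return ("LOW", "ask")
    return ("NONE", "text")


for sample in ("分析这个截图:error.png", "帮我看看这个图", "帮我写代码"):
    print(sample, "->", detect(sample))
```

Because keywords are matched as substrings, a filename such as `diagram.jpg` also counts as a strong keyword here. Matching keywords only outside file paths would be one way to avoid that.
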
## Testing Examples

### Should Trigger (High Confidence)

```
分析这个截图:error.png
看这张架构图 design/architecture.png
![…](…) 显示什么?
帮我看看 data:image/png;base64,...
```

### Should Trigger (Medium Confidence)

```
这个图片怎么优化?screenshot.png
diagram.jpg 有什么改进建议
```

### Should Ask (Low Confidence)

```
帮我看看这个图 (no file specified)
这个设计怎么样?(unclear if visual attached)
```

### Should Not Trigger

```
帮我写代码
这个文本怎么格式化
纯文字内容
```