Visual Content Detection Patterns
Complete reference for detecting visual content in user inputs.
File Extension Patterns
Image Files (High Priority)
.png, .jpg, .jpeg, .gif, .bmp, .webp, .svg, .ico, .tiff, .tif, .heic, .heif, .raw, .psd, .ai, .eps
Detection Rule: Case-insensitive match anywhere in input
Document Files with Visual Content (Medium Priority)
.pdf (may contain diagrams), .ppt, .pptx (slides with visuals), .vsdx (Visio), .drawio
Detection Rule: File extension + visual keywords
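The two extension tiers can be expressed as a small lookup. The following is a minimal sketch, assuming the caller has already isolated a candidate path; the helper name `extension_priority` and the `"high"`/`"medium"` labels are hypothetical and not part of the original reference.

```python
import os
from typing import Optional

# Extensions copied from the two lists above.
HIGH_PRIORITY_EXTS = {
    ".png", ".jpg", ".jpeg", ".gif", ".bmp", ".webp", ".svg", ".ico",
    ".tiff", ".tif", ".heic", ".heif", ".raw", ".psd", ".ai", ".eps",
}
MEDIUM_PRIORITY_EXTS = {".pdf", ".ppt", ".pptx", ".vsdx", ".drawio"}


def extension_priority(path: str) -> Optional[str]:
    """Return "high", "medium", or None for one file path (hypothetical helper)."""
    # lower() makes the comparison case-insensitive, as the detection rule asks.
    ext = os.path.splitext(path)[1].lower()
    if ext in HIGH_PRIORITY_EXTS:
        return "high"
    if ext in MEDIUM_PRIORITY_EXTS:
        return "medium"
    return None
```

A `"medium"` result on its own is not a trigger: per the rule above, document files also need a visual keyword in the input.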
Keyword Patterns
Chinese Visual Keywords
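Tier 1 keywords (high priority):
图片 (picture), 图像 (image), 照片 (photo), 截图 (screenshot), 图表 (chart), 图示 (illustration), 图形 (graphic), 影像 (imagery), 画面 (frame)
Tier 2 keywords (medium priority):
流程图 (flowchart), 架构图 (architecture diagram), 时序图 (sequence diagram), ER 图 (ER diagram), 思维导图 (mind map), 柱状图 (bar chart), 饼图 (pie chart), 折线图 (line chart)
设计图 (design drawing), 原型图 (prototype), 线框图 (wireframe), 界面 (interface), UI, UX
表格 (table), 表单 (form), 清单 (checklist), 列表 (list)
Tier 3 keywords (low priority):
显示 (show), 展示 (display), 呈现 (present), 可视化 (visualize), 看图 (look at a picture), 读图 (read a picture)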
English Visual Keywords
High Priority:
image, photo, picture, screenshot, snapshot, capture, diagram, chart, graph, plot, figure
Medium Priority:
flowchart, architecture, sequence diagram, ER diagram, mind map, bar chart, pie chart, line graph
design, mockup, wireframe, interface, UI, UX, layout
table, form, list, grid
Low Priority:
show, display, visualize, view, look at, see
Technical Visual Keywords
Schema, model, blueprint, spec, technical drawing
Dashboard, widget, panel, visualization
Map, heatmap, scatter plot, histogram
Infographic, poster, banner, thumbnail
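One way to turn the keyword tiers into a lookup is sketched below. It is only an illustration: it carries a small subset of the keywords listed above, and the names `KEYWORD_TIERS` and `keyword_tier` are hypothetical. Chinese keywords are matched as substrings; English keywords are matched as whole words so that `chart` does not fire inside `flowchart`.

```python
import re
from typing import Optional

# A subset of the keywords listed above, ordered from highest to lowest priority.
KEYWORD_TIERS = [
    ("high", ["图片", "图像", "照片", "截图", "图表",
              "image", "photo", "picture", "screenshot", "diagram", "chart"]),
    ("medium", ["流程图", "架构图", "线框图", "flowchart", "mockup", "wireframe"]),
    ("low", ["显示", "展示", "可视化", "show", "display", "visualize"]),
]


def _contains(text: str, keyword: str) -> bool:
    if keyword.isascii():
        # Whole-word match for English keywords, case-insensitive.
        pattern = rf"(?<![A-Za-z0-9]){re.escape(keyword)}(?![A-Za-z0-9])"
        return re.search(pattern, text, re.IGNORECASE) is not None
    # Chinese has no word delimiters, so a substring test is used.
    return keyword in text


def keyword_tier(text: str) -> Optional[str]:
    """Return the highest-priority tier with a keyword in text, else None."""
    for tier, keywords in KEYWORD_TIERS:
        if any(_contains(text, kw) for kw in keywords):
            return tier
    return None
```

Short keywords such as `UI` are left out of this sketch on purpose; they would need the same whole-word treatment to avoid matching inside unrelated words.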
Pattern Matching Rules
Rule 1: File Path + Extension
[\w\-\.\/]+?\.(png|jpg|jpeg|gif|bmp|webp|svg|ico|tiff|heic)
Action: Immediate delegation to multimodal-looker
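As a rough illustration, Rule 1 can be applied in Python by compiling the pattern with `re.IGNORECASE`, which covers the case-insensitive requirement stated for image files. The function name `find_image_paths` is an assumption made for this sketch.

```python
import re
from typing import List

# Rule 1 pattern, compiled case-insensitively.
IMAGE_PATH_RE = re.compile(
    r"[\w\-\.\/]+?\.(png|jpg|jpeg|gif|bmp|webp|svg|ico|tiff|heic)",
    re.IGNORECASE,
)


def find_image_paths(text: str) -> List[str]:
    """Return every substring of text that Rule 1 matches (hypothetical helper)."""
    return [match.group(0) for match in IMAGE_PATH_RE.finditer(text)]
```

In Python 3, `\w` also matches Chinese characters, so a path written directly after Chinese text with no separator would be captured together with that text. Passing `re.ASCII` avoids this at the cost of missing non-ASCII file names.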
Rule 2: Markdown Image Syntax
!\[([^\]]*)\]\(([^\)]+)\)
Action: Extract alt text and URL, delegate to multimodal-looker
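The two capture groups in Rule 2 map directly to the alt text and the URL. A minimal sketch, with `extract_markdown_images` as a hypothetical helper name:

```python
import re
from typing import List, Tuple

# Rule 2 pattern: group 1 is the alt text, group 2 is the URL or path.
MARKDOWN_IMAGE_RE = re.compile(r"!\[([^\]]*)\]\(([^\)]+)\)")


def extract_markdown_images(text: str) -> List[Tuple[str, str]]:
    """Return (alt_text, url) pairs for each markdown image in text."""
    return [(m.group(1), m.group(2)) for m in MARKDOWN_IMAGE_RE.finditer(text)]
```

If the markdown carries an optional title, as in `![alt](path "title")`, the second group will include the title text, so a caller may want to split on whitespace before using the path.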
Rule 3: Base64 Image Data
data:image\/(png|jpeg|gif|webp);base64,[A-Za-z0-9+/=]+
Action: Extract base64 data, save to temp file, delegate
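The extract-and-save step could look roughly like the sketch below. It assumes a second capture group around the payload (not present in the Rule 3 pattern as written) and uses a hypothetical helper name, `save_data_uri_images`. Malformed payloads are skipped rather than raised.

```python
import base64
import binascii
import re
import tempfile
from typing import List

# Rule 3 pattern with an added capture group around the base64 payload (assumption).
DATA_URI_RE = re.compile(
    r"data:image\/(png|jpeg|gif|webp);base64,([A-Za-z0-9+/=]+)"
)


def save_data_uri_images(text: str) -> List[str]:
    """Decode each data URI in text to a temp file and return the file paths."""
    paths = []
    for match in DATA_URI_RE.finditer(text):
        subtype, payload = match.groups()
        try:
            raw = base64.b64decode(payload, validate=True)
        except binascii.Error:
            continue  # skip payloads with bad padding or stray characters
        # delete=False keeps the file on disk so it can be handed to the delegate.
        with tempfile.NamedTemporaryFile(suffix="." + subtype, delete=False) as tmp:
            tmp.write(raw)
        paths.append(tmp.name)
    return paths
```

Because the files are created with `delete=False`, the caller is responsible for removing them once delegation has finished.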
Rule 4: Keyword + File Reference
(图片|图像|diagram|chart|screenshot).*?[\w\-\.\/]+\.(png|jpg|jpeg|gif|bmp|webp)
Action: Confirm intent, then delegate
Rule 5: Keyword Only (Ambiguous)
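(帮我看看这个图|分析这张图片|这个图表显示)
The phrases mean "help me look at this picture", "analyze this image", and "this chart shows".
Action: Ask for clarification: "请问是哪张图片?" ("Which image do you mean?")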
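Rules 4 and 5 can be checked in order, with the more specific rule first. In the sketch below the alternations are written without whitespace around `|`, since a space inside a regex group is matched literally. The function name and the returned action labels are hypothetical.

```python
import re
from typing import Optional

# Rule 4: a visual keyword followed, anywhere later, by an image file reference.
KEYWORD_WITH_FILE_RE = re.compile(
    r"(图片|图像|diagram|chart|screenshot).*?[\w\-\.\/]+\.(png|jpg|jpeg|gif|bmp|webp)",
    re.IGNORECASE,
)
# Rule 5: a visual phrase with no file reference.
KEYWORD_ONLY_RE = re.compile(r"(帮我看看这个图|分析这张图片|这个图表显示)")


def keyword_rule_action(text: str) -> Optional[str]:
    """Return the action for Rule 4 or Rule 5, or None if neither matches."""
    if KEYWORD_WITH_FILE_RE.search(text):
        return "confirm_then_delegate"
    if KEYWORD_ONLY_RE.search(text):
        return "ask_for_clarification"
    return None
```

Checking Rule 4 first matters for inputs such as "分析这张图片 report.png", which satisfy both patterns and should not trigger a clarifying question.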
Context-Aware Detection
Code Development Context
When user is working on code:
- architecture.png → Architecture diagram
- screenshot.png → Error or UI screenshot
- mockup.jpg → Design reference
Action: Assume technical visual, delegate with context
Data Analysis Context
When user mentions data:
- chart, graph, plot, visualization
- sales_chart.png, trend_graph.jpg
Action: Assume data visualization, request data extraction
Design Context
When user discusses design:
- mockup, wireframe, prototype, design
- ui_design.png, wireframe.jpg
Action: Assume design visual, request UI/UX analysis
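The three contexts above amount to a mapping from context to an assumption and a request. The sketch below shows one possible shape, assuming the context label has already been determined elsewhere; the dictionary keys, the function name `plan_for_context`, and the fallback entry for unknown contexts are all hypothetical.

```python
from typing import Dict

# Assumption and request per context, taken from the three sections above.
CONTEXT_ACTIONS: Dict[str, Dict[str, str]] = {
    "code": {"assume": "technical visual", "request": "delegate with context"},
    "data": {"assume": "data visualization", "request": "data extraction"},
    "design": {"assume": "design visual", "request": "UI/UX analysis"},
}

# Fallback for contexts the reference does not cover (an assumption of this sketch).
DEFAULT_ACTION = {"assume": "unknown visual", "request": "general description"}


def plan_for_context(context: str, path: str) -> Dict[str, str]:
    """Return the assumption, request, and file path to pass along when delegating."""
    plan = dict(CONTEXT_ACTIONS.get(context, DEFAULT_ACTION))  # copy, do not mutate
    plan["path"] = path
    return plan
```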
Detection Confidence Levels
| Level | Confidence | Triggers | Action |
|---|---|---|---|
| HIGH | 90-100% | Image file + visual keyword | Auto-delegate |
| MEDIUM | 60-89% | Image file OR strong keyword | Confirm then delegate |
| LOW | 30-59% | Weak keyword only | Ask for clarification |
| NONE | 0-29% | No visual signals | Process as text |
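Read top to bottom, the table is a short decision ladder. The sketch below is one interpretation: it treats any keyword as sufficient for HIGH when an image file is present, and it takes a precomputed keyword strength of `"strong"`, `"weak"`, or `None`. The function name and action labels are hypothetical.

```python
from typing import Optional, Tuple


def confidence_level(has_image_file: bool, keyword_strength: Optional[str]) -> Tuple[str, str]:
    """Map detection signals to (level, action) following the table rows in order.

    keyword_strength is "strong", "weak", or None (an assumption of this sketch).
    """
    if has_image_file and keyword_strength is not None:
        return ("HIGH", "auto_delegate")
    if has_image_file or keyword_strength == "strong":
        return ("MEDIUM", "confirm_then_delegate")
    if keyword_strength == "weak":
        return ("LOW", "ask_for_clarification")
    return ("NONE", "process_as_text")
```

For example, `confidence_level(True, None)` falls through to the MEDIUM row, because an image file without any keyword satisfies only the "image file OR strong keyword" trigger.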
Edge Cases
Ambiguous References
"看这个" ("look at this", without specifying what)
"这个文件" ("this file", could be text or image)
Handling: Ask "请问是哪个文件?是图片吗?" ("Which file do you mean? Is it an image?")
Multiple Images
"比较这两张图:img1.png 和 img2.png" ("Compare these two images: img1.png and img2.png")
Handling: Delegate both, request comparison
Image in Code Block
```

```
Handling: Still detect as visual content (user may be documenting)
URL Images
https://example.com/image.png
http://cdn.site.com/chart.jpg
Handling: Detect as visual, may need download first
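URL detection can reuse the extension check by looking only at the path component of each URL, which keeps query strings from hiding the extension. This is a sketch under stated assumptions: the URL pattern, the extension subset, and the name `find_image_urls` are illustrative, and no download is attempted.

```python
import posixpath
import re
from typing import List
from urllib.parse import urlparse

# A subset of the image extensions listed earlier (assumption for brevity).
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".gif", ".bmp", ".webp", ".svg"}
# Loose URL pattern: stops at whitespace, brackets, and parentheses.
URL_RE = re.compile(r"https?://[^\s<>()\[\]]+")


def find_image_urls(text: str) -> List[str]:
    """Return the URLs in text whose path ends in a known image extension."""
    urls = []
    for match in URL_RE.finditer(text):
        url = match.group(0)
        # urlparse separates the path from any query string or fragment.
        ext = posixpath.splitext(urlparse(url).path)[1].lower()
        if ext in IMAGE_EXTS:
            urls.append(url)
    return urls
```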
Implementation Checklist
- Scan input for file extensions
- Check for markdown image syntax
- Search for visual keywords
- Evaluate context (code, data, design)
- Assign confidence level
- Execute appropriate action (delegate/confirm/ask)
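The checklist can be strung together as a single function. The sketch below is a simplified illustration, not a complete implementation: it uses a small keyword subset, skips the context step, and follows the confidence table literally. The names `detect`, `STRONG_KEYWORDS`, and `WEAK_KEYWORDS` are hypothetical.

```python
import re
from typing import Tuple

IMAGE_FILE_RE = re.compile(
    r"[\w\-./]+?\.(png|jpg|jpeg|gif|bmp|webp|svg|ico|tiff|heic)", re.IGNORECASE
)
MARKDOWN_IMAGE_RE = re.compile(r"!\[[^\]]*\]\([^)]+\)")
# Small keyword subsets for illustration only.
STRONG_KEYWORDS = ("图片", "图像", "截图", "图表", "screenshot", "diagram", "chart")
WEAK_KEYWORDS = ("显示", "展示", "show", "display")


def detect(text: str) -> Tuple[str, str]:
    """Return (confidence level, action) for one user input."""
    # Scan for file extensions and markdown image syntax.
    has_image = bool(IMAGE_FILE_RE.search(text) or MARKDOWN_IMAGE_RE.search(text))
    # Search for keywords with file references removed, so that a file name
    # such as "diagram.jpg" does not count as a keyword hit.
    remainder = IMAGE_FILE_RE.sub(" ", text).lower()
    strong = any(keyword in remainder for keyword in STRONG_KEYWORDS)
    weak = any(keyword in remainder for keyword in WEAK_KEYWORDS)
    # Assign a confidence level and pick the action.
    if has_image and (strong or weak):
        return ("HIGH", "delegate")
    if has_image or strong:
        return ("MEDIUM", "confirm")
    if weak:
        return ("LOW", "ask")
    return ("NONE", "text")
```

Calling `detect("diagram.jpg 有什么改进建议")` yields a MEDIUM result under these assumptions, because the only keyword-like text sits inside the file name and is stripped before the keyword search.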
Testing Examples
Should Trigger (High Confidence)
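分析这个截图:error.png ("Analyze this screenshot: error.png")
看这张架构图 design/architecture.png ("Look at this architecture diagram design/architecture.png")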
 显示什么?
帮我看看 data:image/png;base64,... ("Help me look at data:image/png;base64,...")
Should Trigger (Medium Confidence)
这个图片怎么优化?screenshot.png ("How can this image be optimized? screenshot.png")
diagram.jpg 有什么改进建议 ("Any suggestions for improving diagram.jpg")
Should Ask (Low Confidence)
帮我看看这个图 ("Help me look at this picture"; no file specified)
这个设计怎么样? ("How is this design?"; unclear if visual attached)
Should Not Trigger
帮我写代码 ("Help me write code")
这个文本怎么格式化 ("How should this text be formatted")
纯文字内容 ("Plain text content")