Initial commit: skills library
- 70 skills with code and documentation
- Add .gitignore (ignore __pycache__, output/, temp/, venv/)
- Clean up test intermediates and caches
@@ -0,0 +1,195 @@
# Visual Content Detection Patterns

Complete reference for detecting visual content in user inputs.

## File Extension Patterns

### Image Files (High Priority)

```
.png, .jpg, .jpeg, .gif, .bmp, .webp, .svg, .ico, .tiff, .tif, .heic, .heif, .raw, .psd, .ai, .eps
```

**Detection Rule**: Case-insensitive match anywhere in the input

### Document Files with Visual Content (Medium Priority)

```
.pdf (may contain diagrams), .ppt, .pptx (slides with visuals), .vsdx (Visio), .drawio
```

**Detection Rule**: File extension + visual keywords

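As a rough illustration of the two extension tiers above, the sketch below scans an input for known extensions and reports the highest tier found. The names `IMAGE_EXTS`, `DOCUMENT_EXTS`, and `extension_priority` are hypothetical, and the matching heuristic is an assumption rather than the skill's actual implementation.

```python
import re

# Extension lists copied from the two tiers above.
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".gif", ".bmp", ".webp", ".svg", ".ico",
              ".tiff", ".tif", ".heic", ".heif", ".raw", ".psd", ".ai", ".eps"}
DOCUMENT_EXTS = {".pdf", ".ppt", ".pptx", ".vsdx", ".drawio"}

# Assumed heuristic: a dot followed by ASCII letters, not continued by another
# ASCII letter or digit. A lookahead is used instead of \b so that an extension
# directly followed by CJK text (e.g. "error.png显示") is still recognized.
_EXT_RE = re.compile(r"\.[A-Za-z]+(?![A-Za-z0-9])")

def extension_priority(text: str) -> str:
    """Return "high", "medium", or "none" for the extensions found in text."""
    found = {m.group(0).lower() for m in _EXT_RE.finditer(text)}
    if found & IMAGE_EXTS:
        return "high"
    if found & DOCUMENT_EXTS:
        return "medium"
    return "none"
```

Note that this only reports the extension tier. Per the detection rule above, a medium-tier document extension would still need an accompanying visual keyword before it counts as visual content, and short extensions such as `.ai` can produce false positives on ordinary text.
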
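## Keyword Patterns

### Chinese Visual Keywords

```
Tier 1 keywords (high priority):
图片,图像,照片,截图,图表,图示,图形,影像,画面
(picture, image, photo, screenshot, chart, illustration, graphic, imagery, frame)

Tier 2 keywords (medium priority):
流程图,架构图,时序图,ER 图,思维导图,柱状图,饼图,折线图
(flowchart, architecture diagram, sequence diagram, ER diagram, mind map, bar chart, pie chart, line chart)
设计图,原型图,线框图,界面,UI,UX
(design drawing, prototype, wireframe, interface, UI, UX)
表格,表单,清单,列表
(table, form, checklist, list)

Tier 3 keywords (low priority):
显示,展示,呈现,可视化,看图,读图
(display, show, present, visualize, look at the picture, read the picture)
```
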
### English Visual Keywords

```
High Priority:
image, photo, picture, screenshot, snapshot, capture, diagram, chart, graph, plot, figure

Medium Priority:
flowchart, architecture, sequence diagram, ER diagram, mind map, bar chart, pie chart, line graph
design, mockup, wireframe, interface, UI, UX, layout
table, form, list, grid

Low Priority:
show, display, visualize, view, look at, see
```

### Technical Visual Keywords

```
Schema, model, blueprint, spec, technical drawing
Dashboard, widget, panel, visualization
Map, heatmap, scatter plot, histogram
Infographic, poster, banner, thumbnail
```

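One way to turn the keyword tiers above into a lookup is sketched below. `KEYWORD_TIERS` holds only a small illustrative subset of the lists, and `keyword_priority` is a hypothetical helper. The lookarounds are an assumption added so that a shorter English keyword (such as `chart`) does not fire inside a longer one (such as `flowchart`).

```python
import re

# Illustrative subset of the keyword tiers listed above (not the full lists).
KEYWORD_TIERS = {
    "high": ["图片", "截图", "image", "screenshot", "diagram", "chart"],
    "medium": ["流程图", "架构图", "flowchart", "wireframe", "mockup"],
    "low": ["显示", "展示", "show", "display", "look at"],
}

def _contains(keyword: str, text: str) -> bool:
    # Reject matches glued to ASCII letters; Chinese keywords are unaffected
    # because their neighbours are normally not ASCII letters.
    pattern = r"(?<![A-Za-z])" + re.escape(keyword) + r"(?![A-Za-z])"
    return re.search(pattern, text) is not None

def keyword_priority(text: str) -> str:
    """Return the highest tier with a matching keyword, or "none"."""
    lowered = text.lower()
    for tier in ("high", "medium", "low"):
        if any(_contains(kw, lowered) for kw in KEYWORD_TIERS[tier]):
            return tier
    return "none"
```

Checking tiers from high to low means an input that mentions both a screenshot and the word "show" is reported at the higher tier.
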
## Pattern Matching Rules

### Rule 1: File Path + Extension

```regex
[\w\-\.\/]+?\.(png|jpg|jpeg|gif|bmp|webp|svg|ico|tiff|heic)
```

**Action**: Immediate delegation to multimodal-looker

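A minimal sketch of applying the Rule 1 pattern in Python follows. The pattern is copied as written; compiling it with `re.IGNORECASE` is an assumption based on the case-insensitive detection rule for image files, and `find_image_paths` is a hypothetical helper name.

```python
import re

# Rule 1 pattern, compiled case-insensitively (assumption, see lead-in).
IMAGE_PATH_RE = re.compile(
    r"[\w\-\.\/]+?\.(png|jpg|jpeg|gif|bmp|webp|svg|ico|tiff|heic)",
    re.IGNORECASE,
)

def find_image_paths(text: str) -> list[str]:
    """Return every path-like token in text that ends in a Rule 1 extension."""
    return [m.group(0) for m in IMAGE_PATH_RE.finditer(text)]
```

Two caveats when using it this way: in Python 3, `\w` also matches CJK characters, so Chinese text written directly against a filename with no separator becomes part of the match, and the pattern as written does not cover `.tif` or `.heif` even though both appear in the image extension list.
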
### Rule 2: Markdown Image Syntax

```regex
!\[([^\]]*)\]\(([^\)]+)\)
```

**Action**: Extract alt text and URL, delegate to multimodal-looker

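The extraction step for Rule 2 could look roughly like the sketch below, which uses the two capture groups of the pattern above (alt text, then URL). `extract_markdown_images` is a hypothetical name.

```python
import re

# Rule 2 pattern: group 1 is the alt text, group 2 is the URL or path.
MD_IMAGE_RE = re.compile(r"!\[([^\]]*)\]\(([^\)]+)\)")

def extract_markdown_images(text: str) -> list[tuple[str, str]]:
    """Return (alt_text, url) pairs for each Markdown image in text."""
    return [(m.group(1), m.group(2)) for m in MD_IMAGE_RE.finditer(text)]
```

Because group 2 captures everything up to the closing parenthesis, an image written with a title, such as `![a](b.png "title")`, would come back with the title still attached to the URL and would need further splitting.
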
### Rule 3: Base64 Image Data

```regex
data:image\/(png|jpeg|gif|webp);base64,[A-Za-z0-9+/=]+
```

**Action**: Extract base64 data, save to temp file, delegate

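The "extract, save to temp file" action for Rule 3 might be sketched as follows. This is an illustration under assumptions: a second capture group has been added around the payload so it can be decoded, malformed payloads are simply skipped, and `save_data_uri_images` is a hypothetical name. The delegation step itself is not shown.

```python
import base64
import re
import tempfile

# Rule 3 pattern with an extra capture group around the base64 payload
# (the extra group is an addition for this sketch).
DATA_URI_RE = re.compile(r"data:image\/(png|jpeg|gif|webp);base64,([A-Za-z0-9+/=]+)")

def save_data_uri_images(text: str) -> list[str]:
    """Decode each data URI in text to a temp file and return the file paths."""
    paths = []
    for m in DATA_URI_RE.finditer(text):
        subtype, payload = m.group(1), m.group(2)
        try:
            raw = base64.b64decode(payload, validate=True)
        except ValueError:  # binascii.Error is a ValueError subclass
            continue  # skip malformed or truncated payloads
        # delete=False keeps the file on disk so a later step can read it.
        with tempfile.NamedTemporaryFile(suffix=f".{subtype}", delete=False) as fh:
            fh.write(raw)
            paths.append(fh.name)
    return paths
```

The caller is responsible for removing the temp files once the delegated analysis has finished.
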
### Rule 4: Keyword + File Reference

```regex
(图片|图像|diagram|chart|screenshot).*?[\w\-\.\/]+\.(png|jpg|jpeg|gif|bmp|webp)
```

**Action**: Confirm intent, then delegate

### Rule 5: Keyword Only (Ambiguous)

```regex
(帮我看看这个图|分析这张图片|这个图表显示)
```

**Action**: Ask for clarification: "请问是哪张图片?" ("Which image do you mean?")

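Rules 4 and 5 can be checked in sequence, as in the sketch below. The alternations are written without spaces around `|`, since any such space would become part of the alternative being matched. The action labels and the name `keyword_rule_action` are hypothetical, and the case-insensitive flag on the Rule 4 pattern is an assumption.

```python
import re
from typing import Optional

# Rule 4: a visual keyword followed, anywhere later, by an image file reference.
KEYWORD_FILE_RE = re.compile(
    r"(图片|图像|diagram|chart|screenshot).*?[\w\-\.\/]+\.(png|jpg|jpeg|gif|bmp|webp)",
    re.IGNORECASE,
)
# Rule 5: a phrase that refers to an image without naming a file.
KEYWORD_ONLY_RE = re.compile(r"(帮我看看这个图|分析这张图片|这个图表显示)")

def keyword_rule_action(text: str) -> Optional[str]:
    """Return the action suggested by Rule 4 or Rule 5, or None if neither fires."""
    if KEYWORD_FILE_RE.search(text):
        return "confirm_then_delegate"
    if KEYWORD_ONLY_RE.search(text):
        return "ask_for_clarification"
    return None
```

Rule 4 is tested first so that an input containing both an ambiguous phrase and a concrete file reference is not sent back to the user for clarification.
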
## Context-Aware Detection

### Code Development Context

When the user is working on code:
- `architecture.png` → Architecture diagram
- `screenshot.png` → Error or UI screenshot
- `mockup.jpg` → Design reference

**Action**: Assume a technical visual, delegate with context

### Data Analysis Context

When the user mentions data:
- `chart`, `graph`, `plot`, `visualization`
- `sales_chart.png`, `trend_graph.jpg`

**Action**: Assume a data visualization, request data extraction

### Design Context

When the user discusses design:
- `mockup`, `wireframe`, `prototype`, `design`
- `ui_design.png`, `wireframe.jpg`

**Action**: Assume a design visual, request UI/UX analysis

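The three contexts above could be mapped to their actions along the lines of the following sketch. How the context is actually determined is not specified here, so the term lists, the precedence (design before data before code), the `working_on_code` flag, and the names `infer_context` and `CONTEXT_ACTIONS` are all assumptions made for illustration.

```python
from typing import Optional

# Terms taken from the data and design context lists above.
DATA_TERMS = ("chart", "graph", "plot", "visualization")
DESIGN_TERMS = ("mockup", "wireframe", "prototype", "design")

# Assumed visual type and follow-up action for each context, as stated above.
CONTEXT_ACTIONS = {
    "code": ("technical visual", "delegate with context"),
    "data": ("data visualization", "request data extraction"),
    "design": ("design visual", "request UI/UX analysis"),
}

def infer_context(text: str, working_on_code: bool = False) -> Optional[str]:
    """Guess the context from the text, falling back to the session state."""
    lowered = text.lower()
    if any(term in lowered for term in DESIGN_TERMS):
        return "design"
    if any(term in lowered for term in DATA_TERMS):
        return "data"
    if working_on_code:
        return "code"
    return None
```

A caller would then look up `CONTEXT_ACTIONS[infer_context(...)]` when the result is not `None`. Note that `mockup` appears in both the code and design examples above, which is why some precedence has to be chosen.
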
## Detection Confidence Levels

| Level | Confidence | Triggers | Action |
|-------|------------|----------|--------|
| HIGH | 90-100% | Image file + visual keyword | Auto-delegate |
| MEDIUM | 60-89% | Image file OR strong keyword | Confirm then delegate |
| LOW | 30-59% | Weak keyword only | Ask for clarification |
| NONE | 0-29% | No visual signals | Process as text |

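The table can be read as a small decision function, sketched below. The table does not define "strong" and "weak" keywords, so this sketch assumes that only high-priority keywords are strong and that medium- and low-priority keywords are weak. `confidence_level` and its argument names are hypothetical.

```python
def confidence_level(has_image_file: bool, keyword_tier: str) -> tuple[str, str]:
    """Map detection signals to (level, action) following the table above.

    keyword_tier is one of "high", "medium", "low", or "none".
    Assumption: "high" counts as a strong keyword, "medium"/"low" as weak.
    """
    has_keyword = keyword_tier != "none"
    if has_image_file and has_keyword:
        return "HIGH", "auto-delegate"
    if has_image_file or keyword_tier == "high":
        return "MEDIUM", "confirm then delegate"
    if has_keyword:
        return "LOW", "ask for clarification"
    return "NONE", "process as text"
```

For example, `confidence_level(True, "none")` falls into the MEDIUM row (an image file with no keyword), while `confidence_level(False, "low")` falls into the LOW row.
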
## Edge Cases

### Ambiguous References
```
"看这个" ("look at this", without specifying what)
"这个文件" ("this file", could be text or image)
```
**Handling**: Ask "请问是哪个文件?是图片吗?" ("Which file do you mean? Is it an image?")

### Multiple Images
```
"比较这两张图:img1.png 和 img2.png" ("Compare these two images: img1.png and img2.png")
```
**Handling**: Delegate both, request comparison

### Image in Code Block
````
```
![图片](path/to/image.png)
```
````
**Handling**: Still detect as visual content (user may be documenting)

### URL Images
```
https://example.com/image.png
http://cdn.site.com/chart.jpg
```
**Handling**: Detect as visual content; the image may need to be downloaded first

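Detecting image URLs could be sketched as below: find `http`/`https` URLs, then test only the path component so that query strings do not hide the extension. The URL pattern, the reduced extension set, and the name `find_image_urls` are assumptions for illustration, and no download is attempted.

```python
import posixpath
import re
from urllib.parse import urlparse

# Simplified URL matcher (assumption): stops at whitespace, quotes, and brackets.
URL_RE = re.compile(r"https?://[^\s<>\"')]+")

# Reduced, illustrative set of image extensions.
URL_IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".gif", ".bmp", ".webp", ".svg"}

def find_image_urls(text: str) -> list[str]:
    """Return URLs in text whose path ends in a known image extension."""
    urls = []
    for m in URL_RE.finditer(text):
        url = m.group(0)
        # Inspect only the path so "?w=200" style queries are ignored.
        ext = posixpath.splitext(urlparse(url).path)[1].lower()
        if ext in URL_IMAGE_EXTS:
            urls.append(url)
    return urls
```

URLs that serve images without a file extension would be missed by this approach; checking the response `Content-Type` after download would be one way to cover them.
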
## Implementation Checklist

- [ ] Scan input for file extensions
- [ ] Check for markdown image syntax
- [ ] Search for visual keywords
- [ ] Evaluate context (code, data, design)
- [ ] Assign confidence level
- [ ] Execute appropriate action (delegate/confirm/ask)

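## Testing Examples

### Should Trigger (High Confidence)
```
分析这个截图:error.png ("Analyze this screenshot: error.png")
看这张架构图 design/architecture.png ("Look at this architecture diagram design/architecture.png")
![错误截图](screenshot.png) 显示什么? ("What does ![error screenshot](screenshot.png) show?")
帮我看看 data:image/png;base64,... ("Take a look at data:image/png;base64,... for me")
```
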
### Should Trigger (Medium Confidence)
```
这个图片怎么优化?screenshot.png ("How can this image be optimized? screenshot.png")
diagram.jpg 有什么改进建议 ("Any suggestions for improving diagram.jpg")
```

### Should Ask (Low Confidence)
```
帮我看看这个图 ("Take a look at this picture for me"; no file specified)
这个设计怎么样? ("How is this design?"; unclear if a visual is attached)
```

### Should Not Trigger
```
帮我写代码 ("Help me write code")
这个文本怎么格式化 ("How should this text be formatted")
纯文字内容 ("Plain text content")
```