skills/agent-vision-awareness/references/detection-patterns.md

Visual Content Detection Patterns

Complete reference for detecting visual content in user inputs.

File Extension Patterns

Image Files (High Priority)

.png, .jpg, .jpeg, .gif, .bmp, .webp, .svg, .ico, .tiff, .tif, .heic, .heif, .raw, .psd, .ai, .eps

Detection Rule: Case-insensitive match anywhere in input

Document Files with Visual Content (Medium Priority)

.pdf (may contain diagrams), .ppt, .pptx (slides with visuals), .vsdx (Visio), .drawio

Detection Rule: File extension + visual keywords
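As a rough illustration of the two extension lists above, the sketch below scans an input string and sorts the extensions it finds into image and document groups. The names `IMAGE_EXTS`, `DOCUMENT_EXTS`, and `classify_extensions` are hypothetical and are not part of the skill itself; the sets are copied from the lists above.

```python
import re

# Extension sets copied from the two lists above (lowercase, no dot).
IMAGE_EXTS = {"png", "jpg", "jpeg", "gif", "bmp", "webp", "svg", "ico",
              "tiff", "tif", "heic", "heif", "raw", "psd", "ai", "eps"}
DOCUMENT_EXTS = {"pdf", "ppt", "pptx", "vsdx", "drawio"}

# A dot followed by letters, not followed by another letter or digit,
# so ".png" is found but ".mp3" and ".pngx" are not misread.
_EXT_RE = re.compile(r"\.([A-Za-z]+)(?![A-Za-z0-9])")

def classify_extensions(text):
    """Return the image and document extensions mentioned in text, lowercased."""
    found = {"image": [], "document": []}
    for match in _EXT_RE.finditer(text):
        ext = match.group(1).lower()  # case-insensitive comparison
        if ext in IMAGE_EXTS:
            found["image"].append(ext)
        elif ext in DOCUMENT_EXTS:
            found["document"].append(ext)
    return found
```

Under these assumptions, a document extension on its own would still need a visual keyword before it counts as a visual signal, as the detection rule above states.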

Keyword Patterns

Chinese Visual Keywords

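Tier 1 keywords (high priority):
图片 (picture), 图像 (image), 照片 (photo), 截图 (screenshot), 图表 (chart), 图示 (illustration), 图形 (graphic), 影像 (imagery), 画面 (frame)

Tier 2 keywords (medium priority):
流程图 (flowchart), 架构图 (architecture diagram), 时序图 (sequence diagram), ER 图 (ER diagram), 思维导图 (mind map), 柱状图 (bar chart), 饼图 (pie chart), 折线图 (line chart)
设计图 (design drawing), 原型图 (prototype), 线框图 (wireframe), 界面 (interface), UI, UX
表格 (table), 表单 (form), 清单 (checklist), 列表 (list)

Tier 3 keywords (low priority):
显示 (display), 展示 (show), 呈现 (present), 可视化 (visualize), 看图 (look at the picture), 读图 (read the picture)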

English Visual Keywords

High Priority:
image, photo, picture, screenshot, snapshot, capture, diagram, chart, graph, plot, figure

Medium Priority:
flowchart, architecture, sequence diagram, ER diagram, mind map, bar chart, pie chart, line graph
design, mockup, wireframe, interface, UI, UX, layout
table, form, list, grid

Low Priority:
show, display, visualize, view, look at, see

Technical Visual Keywords

Schema, model, blueprint, spec, technical drawing
Dashboard, widget, panel, visualization
Map, heatmap, scatter plot, histogram
Infographic, poster, banner, thumbnail
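One possible way to apply the keyword tiers is sketched below. It uses an abbreviated subset of the keywords listed above, and the names `KEYWORD_TIERS` and `strongest_keyword_tier` are hypothetical. English keywords are matched as whole words so that "show" does not fire inside "showcase". Chinese keywords are matched as plain substrings, since Chinese text has no word separators.

```python
import re

# Abbreviated subset of the tiers above, ordered from strongest to weakest.
KEYWORD_TIERS = [
    ("high", ["图片", "图像", "截图", "图表",
              "image", "photo", "screenshot", "diagram", "chart"]),
    ("medium", ["流程图", "架构图", "线框图", "flowchart", "mockup", "wireframe"]),
    ("low", ["显示", "展示", "show", "display", "look at"]),
]

def _contains(text, keyword):
    """Whole-word match for ASCII keywords, substring match otherwise."""
    if keyword.isascii():
        # ASCII-letter lookarounds also work when English is embedded in Chinese text.
        pattern = r"(?<![A-Za-z])" + re.escape(keyword) + r"(?![A-Za-z])"
        return re.search(pattern, text, re.IGNORECASE) is not None
    return keyword in text

def strongest_keyword_tier(text):
    """Return "high", "medium", "low", or None for the strongest tier found."""
    for tier, keywords in KEYWORD_TIERS:
        if any(_contains(text, kw) for kw in keywords):
            return tier
    return None
```

A limitation of this sketch is that plural forms such as "charts" are not matched. A fuller version might normalize or stem English words first.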

Pattern Matching Rules

Rule 1: File Path + Extension

[\w\-\.\/]+?\.(png|jpg|jpeg|gif|bmp|webp|svg|ico|tiff|heic)

Action: Immediate delegation to multimodal-looker
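A minimal sketch of Rule 1 follows. The pattern is based on the one above, with three additions that are assumptions of this sketch rather than part of the rule: the `re.IGNORECASE` flag, the `re.ASCII` flag (so `\w` does not absorb adjacent Chinese characters into the path), and a trailing lookahead that stops `.png` from matching inside a longer extension. The function name `find_image_paths` is hypothetical.

```python
import re

# Based on the Rule 1 pattern; the flags and the lookahead are additions.
IMAGE_PATH_RE = re.compile(
    r"[\w\-\.\/]+?\.(?:png|jpg|jpeg|gif|bmp|webp|svg|ico|tiff|heic)(?![A-Za-z0-9])",
    re.IGNORECASE | re.ASCII,
)

def find_image_paths(text):
    """Return every image file path found in text, in order of appearance."""
    return IMAGE_PATH_RE.findall(text)
```

For example, `find_image_paths("看这张架构图 design/architecture.png")` returns `["design/architecture.png"]`, which could then be handed to multimodal-looker.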

Rule 2: Markdown Image Syntax

!\[([^\]]*)\]\(([^\)]+)\)

Action: Extract alt text and URL, delegate to multimodal-looker
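The extraction step for Rule 2 might look like the sketch below, which reuses the pattern above unchanged. The function name `extract_markdown_images` is hypothetical.

```python
import re

# Rule 2 pattern: group 1 is the alt text, group 2 is the URL or path.
MD_IMAGE_RE = re.compile(r"!\[([^\]]*)\]\(([^\)]+)\)")

def extract_markdown_images(text):
    """Return (alt_text, url) pairs for each Markdown image in text."""
    return MD_IMAGE_RE.findall(text)
```

Note that with this pattern an optional Markdown title, as in `![a](x.png "title")`, is captured as part of the URL group, so a caller may want to split it off before delegating.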

Rule 3: Base64 Image Data

data:image\/(png|jpeg|gif|webp);base64,[A-Za-z0-9+/=]+

Action: Extract base64 data, save to temp file, delegate
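The "extract, save to temp file" step could be sketched as follows, using only the Python standard library. The function name `save_data_uri_images` is hypothetical, and the pattern adds one capture group around the payload so it can be decoded. Payloads that fail strict base64 validation are skipped instead of raising.

```python
import base64
import re
import tempfile

# Rule 3 pattern with an extra capture group around the base64 payload.
DATA_URI_RE = re.compile(
    r"data:image\/(png|jpeg|gif|webp);base64,([A-Za-z0-9+/=]+)"
)

def save_data_uri_images(text):
    """Decode each data-URI image in text to a temp file and return the paths."""
    paths = []
    for subtype, payload in DATA_URI_RE.findall(text):
        try:
            data = base64.b64decode(payload, validate=True)
        except ValueError:  # binascii.Error is a ValueError subclass
            continue        # skip malformed payloads
        # delete=False keeps the file on disk for the delegate to read.
        with tempfile.NamedTemporaryFile(suffix="." + subtype, delete=False) as fh:
            fh.write(data)
            paths.append(fh.name)
    return paths
```

The caller is responsible for deleting the temp files once delegation is finished.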

Rule 4: Keyword + File Reference

(图片|图像|diagram|chart|screenshot).*?[\w\-\.\/]+\.(png|jpg|jpeg|gif|bmp|webp)

Action: Confirm intent, then delegate

Rule 5: Keyword Only (Ambiguous)

(帮我看看这个图|分析这张图片|这个图表显示)

Action: Ask for clarification: "请问是哪张图片?" ("Which image do you mean?")
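Rules 4 and 5 differ only in whether a file reference follows the keyword, so they can be checked in order. The sketch below assumes the alternations contain no spaces around `|` (a space there would become part of the text to match). The function name `rule_4_or_5` and its return labels are hypothetical.

```python
import re

# Rule 4: a visual keyword followed, anywhere later, by an image file reference.
KEYWORD_WITH_FILE_RE = re.compile(
    r"(图片|图像|diagram|chart|screenshot).*?[\w\-\.\/]+\.(png|jpg|jpeg|gif|bmp|webp)",
    re.IGNORECASE,
)
# Rule 5: a phrase that refers to an image without naming one.
KEYWORD_ONLY_RE = re.compile(r"(帮我看看这个图|分析这张图片|这个图表显示)")

def rule_4_or_5(text):
    """Return "confirm" (Rule 4), "ask" (Rule 5), or None if neither applies."""
    if KEYWORD_WITH_FILE_RE.search(text):
        return "confirm"  # confirm intent, then delegate
    if KEYWORD_ONLY_RE.search(text):
        return "ask"      # ask which image is meant
    return None
```

Rule 4 is tested first so that an input containing both a Rule 5 phrase and a file name is not sent back for clarification unnecessarily.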

Context-Aware Detection

Code Development Context

When the user is working on code:

  • architecture.png → Architecture diagram
  • screenshot.png → Error or UI screenshot
  • mockup.jpg → Design reference

Action: Assume technical visual, delegate with context

Data Analysis Context

When the user mentions data:

  • chart, graph, plot, visualization
  • sales_chart.png, trend_graph.jpg

Action: Assume data visualization, request data extraction

Design Context

When the user discusses design:

  • mockup, wireframe, prototype, design
  • ui_design.png, wireframe.jpg

Action: Assume design visual, request UI/UX analysis

Detection Confidence Levels

| Level | Confidence | Triggers | Action |
|-------|------------|----------|--------|
| HIGH | 90-100% | Image file + visual keyword | Auto-delegate |
| MEDIUM | 60-89% | Image file OR strong keyword | Confirm then delegate |
| LOW | 30-59% | Weak keyword only | Ask for clarification |
| NONE | 0-29% | No visual signals | Process as text |
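The table can be read as a small decision function. The sketch below is one interpretation, not a definitive mapping: it assumes that a "strong keyword" means a high-priority keyword and that medium- and low-priority keywords both count as "weak". The function name `confidence_level` and the tier labels are hypothetical.

```python
def confidence_level(has_image_file, keyword_tier):
    """Map detection signals to (level, action) following the table above.

    keyword_tier is "high", "medium", "low", or None when no keyword matched.
    Assumption: only the "high" tier counts as a strong keyword.
    """
    has_keyword = keyword_tier is not None
    if has_image_file and has_keyword:
        return ("HIGH", "auto-delegate")
    if has_image_file or keyword_tier == "high":
        return ("MEDIUM", "confirm then delegate")
    if has_keyword:
        return ("LOW", "ask for clarification")
    return ("NONE", "process as text")
```

For example, `confidence_level(True, None)` gives `("MEDIUM", "confirm then delegate")`, because a bare file name with no keyword matches the "Image file OR strong keyword" row.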

Edge Cases

Ambiguous References

"看这个" ("look at this", without specifying what)
"这个文件" ("this file", could be text or image)

Handling: Ask "请问是哪个文件?是图片吗?" ("Which file do you mean? Is it an image?")

Multiple Images

"比较这两张图:img1.png 和 img2.png" ("Compare these two images: img1.png and img2.png")

Handling: Delegate both, request comparison

Image in Code Block

```
![image](path.png)
```

Handling: Still detect as visual content (user may be documenting)

URL Images

https://example.com/image.png
http://cdn.site.com/chart.jpg

Handling: Detect as visual content; the image may need to be downloaded before delegation
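Detecting image URLs could be sketched as below, using `urllib.parse.urlparse` from the standard library so that a query string after the file name does not hide the extension. The names `find_image_urls` and `IMAGE_SUFFIXES`, and the abbreviated suffix list, are assumptions of this sketch. Downloading is left to the caller.

```python
import re
from urllib.parse import urlparse

# Any http(s) URL up to the next whitespace, angle bracket, or parenthesis.
URL_RE = re.compile(r"https?://[^\s<>()]+", re.IGNORECASE)
IMAGE_SUFFIXES = (".png", ".jpg", ".jpeg", ".gif", ".bmp", ".webp", ".svg")

def find_image_urls(text):
    """Return URLs in text whose path ends in a known image extension."""
    urls = []
    for raw in URL_RE.findall(text):
        url = raw.rstrip(".,;:!?")           # drop trailing sentence punctuation
        path = urlparse(url).path.lower()    # ignores ?query and #fragment
        if path.endswith(IMAGE_SUFFIXES):
            urls.append(url)
    return urls
```

URLs that serve images without a file extension would not be caught by this check. Confirming those would require inspecting the response's content type.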

Implementation Checklist

  • Scan input for file extensions
  • Check for markdown image syntax
  • Search for visual keywords
  • Evaluate context (code, data, design)
  • Assign confidence level
  • Execute appropriate action (delegate/confirm/ask)
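The checklist can be sketched end to end as a single function. This is a simplified illustration under stated assumptions: the keyword tuples are abbreviated, keywords are matched as plain substrings, the context-evaluation step is omitted, and the name `detect` and its return labels are hypothetical.

```python
import re

# Abbreviated signal definitions; the full lists appear earlier in this reference.
IMAGE_FILE_RE = re.compile(
    r"[\w\-./]+\.(?:png|jpe?g|gif|bmp|webp|svg)(?![A-Za-z0-9])",
    re.IGNORECASE | re.ASCII,
)
MD_IMAGE_RE = re.compile(r"!\[[^\]]*\]\([^)]+\)")
DATA_URI_RE = re.compile(r"data:image/(?:png|jpeg|gif|webp);base64,")
STRONG_KEYWORDS = ("图片", "截图", "架构图", "图表",
                   "image", "screenshot", "diagram", "chart")
WEAK_KEYWORDS = ("这个图", "显示", "show", "display")

def detect(text):
    """Return "delegate", "confirm", "ask", or "text" for a user input."""
    lowered = text.lower()
    # Steps 1-2: file extensions, Markdown image syntax, inline image data.
    has_file = bool(IMAGE_FILE_RE.search(text)
                    or MD_IMAGE_RE.search(text)
                    or DATA_URI_RE.search(text))
    # Step 3: visual keywords (plain substring matching in this sketch).
    strong = any(kw in lowered for kw in STRONG_KEYWORDS)
    weak = any(kw in lowered for kw in WEAK_KEYWORDS)
    # Steps 5-6: assign a confidence level and pick the action.
    if has_file and (strong or weak):
        return "delegate"   # HIGH
    if has_file or strong:
        return "confirm"    # MEDIUM
    if weak:
        return "ask"        # LOW
    return "text"           # NONE
```

With these definitions, `detect("分析这个截图:error.png")` returns `"delegate"`, `detect("帮我看看这个图")` returns `"ask"`, and `detect("帮我写代码")` returns `"text"`.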

Testing Examples

Should Trigger (High Confidence)

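分析这个截图:error.png ("Analyze this screenshot: error.png")
看这张架构图 design/architecture.png ("Look at this architecture diagram design/architecture.png")
![流程图](flow.png) 显示什么? ("What does ![flowchart](flow.png) show?")
帮我看看 data:image/png;base64,... ("Take a look at data:image/png;base64,...")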

Should Trigger (Medium Confidence)

这个图片怎么优化?screenshot.png ("How can this image be optimized? screenshot.png")
diagram.jpg 有什么改进建议 ("Any suggestions for improving diagram.jpg?")

Should Ask (Low Confidence)

帮我看看这个图 ("Take a look at this picture"; no file specified)
这个设计怎么样? ("How is this design?"; unclear whether a visual is attached)

Should Not Trigger

帮我写代码 ("Help me write code")
这个文本怎么格式化 ("How should this text be formatted?")
纯文字内容 ("Plain text content")