# Visual Content Detection Patterns

Complete reference for detecting visual content in user inputs.

## File Extension Patterns

### Image Files (High Priority)

```
.png, .jpg, .jpeg, .gif, .bmp, .webp, .svg, .ico, .tiff, .tif,
.heic, .heif, .raw, .psd, .ai, .eps
```

**Detection Rule**: Case-insensitive match anywhere in input

### Document Files with Visual Content (Medium Priority)

```
.pdf (may contain diagrams), .ppt, .pptx (slides with visuals),
.vsdx (Visio), .drawio
```

**Detection Rule**: File extension + visual keywords

## Keyword Patterns

### Chinese Visual Keywords

```
Tier 1 keywords (high priority):
图片,图像,照片,截图,图表,图示,图形,影像,画面

Tier 2 keywords (medium priority):
流程图,架构图,时序图,ER 图,思维导图,柱状图,饼图,折线图
设计图,原型图,线框图,界面,UI,UX
表格,表单,清单,列表

Tier 3 keywords (low priority):
显示,展示,呈现,可视化,看图,读图
```

### English Visual Keywords

```
High Priority:
image, photo, picture, screenshot, snapshot, capture,
diagram, chart, graph, plot, figure

Medium Priority:
flowchart, architecture, sequence diagram, ER diagram, mind map,
bar chart, pie chart, line graph
design, mockup, wireframe, interface, UI, UX, layout
table, form, list, grid

Low Priority:
show, display, visualize, view, look at, see
```

### Technical Visual Keywords

```
Schema, model, blueprint, spec, technical drawing
Dashboard, widget, panel, visualization
Map, heatmap, scatter plot, histogram
Infographic, poster, banner, thumbnail
```

## Pattern Matching Rules

### Rule 1: File Path + Extension

```regex
[\w\-\.\/]+?\.(png|jpg|jpeg|gif|bmp|webp|svg|ico|tiff|tif|heic|heif)\b
```

Apply the case-insensitive flag so that `.PNG` and `.Jpg` also match.

**Action**: Immediate delegation to multimodal-looker

### Rule 2: Markdown Image Syntax

```regex
!\[([^\]]*)\]\(([^\)]+)\)
```

**Action**: Extract alt text and URL, delegate to multimodal-looker

### Rule 3: Base64 Image Data

```regex
data:image\/(png|jpeg|gif|webp);base64,[A-Za-z0-9+/=]+
```

**Action**: Extract base64 data, save to temp file, delegate

### Rule 4: Keyword + File Reference

```regex
(图片|图像|diagram|chart|screenshot).*?[\w\-\.\/]+\.(png|jpg|jpeg|gif|bmp|webp)
```

**Action**: Confirm intent, then delegate

### Rule 5: Keyword Only (Ambiguous)

```regex
(帮我看看这个图|分析这张图片|这个图表显示)
```

**Action**: Ask for clarification: "请问是哪张图片?" ("Which image do you mean?")

## Context-Aware Detection

### Code Development Context

When the user is working on code:

- `architecture.png` → Architecture diagram
- `screenshot.png` → Error or UI screenshot
- `mockup.jpg` → Design reference

**Action**: Assume technical visual, delegate with context

### Data Analysis Context

When the user mentions data:

- `chart`, `graph`, `plot`, `visualization`
- `sales_chart.png`, `trend_graph.jpg`

**Action**: Assume data visualization, request data extraction

### Design Context

When the user discusses design:

- `mockup`, `wireframe`, `prototype`, `design`
- `ui_design.png`, `wireframe.jpg`

**Action**: Assume design visual, request UI/UX analysis

## Detection Confidence Levels

| Level | Confidence | Triggers | Action |
|-------|------------|----------|--------|
| HIGH | 90-100% | Image file + visual keyword | Auto-delegate |
| MEDIUM | 60-89% | Image file OR strong keyword | Confirm then delegate |
| LOW | 30-59% | Weak keyword only | Ask for clarification |
| NONE | 0-29% | No visual signals | Process as text |

## Edge Cases

### Ambiguous References

```
"看这个" (without specifying what)
"这个文件" (could be text or image)
```

**Handling**: Ask "请问是哪个文件?是图片吗?" ("Which file do you mean? Is it an image?")
### Multiple Images

```
"比较这两张图:img1.png 和 img2.png"
```

**Handling**: Delegate both, request comparison

### Image in Code Block

````
```
![image](path.png)
```
````

**Handling**: Still detect as visual content (user may be documenting)

### URL Images

```
https://example.com/image.png
http://cdn.site.com/chart.jpg
```

**Handling**: Detect as visual, may need download first

## Implementation Checklist

- [ ] Scan input for file extensions
- [ ] Check for markdown image syntax
- [ ] Search for visual keywords
- [ ] Evaluate context (code, data, design)
- [ ] Assign confidence level
- [ ] Execute appropriate action (delegate/confirm/ask)

## Testing Examples

### Should Trigger (High Confidence)

```
分析这个截图:error.png
看这张架构图 design/architecture.png
![流程图](flow.png) 显示什么?
帮我看看 data:image/png;base64,...
```

### Should Trigger (Medium Confidence)

```
这个图片怎么优化?screenshot.png
diagram.jpg 有什么改进建议
```

### Should Ask (Low Confidence)

```
帮我看看这个图 (no file specified)
这个设计怎么样?(unclear if visual attached)
```

### Should Not Trigger

```
帮我写代码
这个文本怎么格式化
纯文字内容
```
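
The first two checklist items, together with Rules 1 to 3 and the multiple-image and URL edge cases, can be sketched as a single extraction pass. This is a minimal illustration, not a definitive implementation: `VisualRef` and `extract_visual_refs` are hypothetical names, the optional `https?://` prefix on the path pattern is an added assumption so that URL images are captured whole, and the "save to temp file" step from Rule 3 is left out.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class VisualRef:
    kind: str       # "markdown", "data_uri", or "path"
    value: str      # URL or path; for data URIs, the image subtype
    alt: str = ""   # alt text, only set for markdown images


# Rule 2: markdown image syntax (alt text, then URL).
MARKDOWN_IMAGE = re.compile(r"!\[([^\]]*)\]\(([^)]+)\)")
# Rule 3: base64 data URI.
DATA_URI = re.compile(r"data:image/(png|jpeg|gif|webp);base64,[A-Za-z0-9+/=]+")
# Rule 1: file path + extension, with an assumed optional URL scheme.
IMAGE_PATH = re.compile(
    r"(?:https?://)?[\w\-./]+?"
    r"\.(?:png|jpg|jpeg|gif|bmp|webp|svg|ico|tiff|tif|heic|heif)\b",
    re.IGNORECASE,
)


def extract_visual_refs(text: str) -> list[VisualRef]:
    """Collect visual references in rule order, without double counting."""
    refs: list[VisualRef] = []

    def take_markdown(m: re.Match) -> str:
        refs.append(VisualRef("markdown", m.group(2), m.group(1)))
        return " "  # blank the span so the path scan does not re-match it

    def take_data_uri(m: re.Match) -> str:
        refs.append(VisualRef("data_uri", m.group(1)))
        return " "

    remaining = MARKDOWN_IMAGE.sub(take_markdown, text)
    remaining = DATA_URI.sub(take_data_uri, remaining)

    for m in IMAGE_PATH.finditer(remaining):
        refs.append(VisualRef("path", m.group(0)))
    return refs


if __name__ == "__main__":
    for ref in extract_visual_refs("比较这两张图:img1.png 和 img2.png"):
        print(ref)
```

Because the function does not strip fenced code, a markdown image inside a code block is still reported, which matches the edge-case handling described earlier. One caveat of the `\w`-based path pattern is that it also matches CJK characters, so a file name written directly after Chinese text with no separator would be captured together with that text.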