Add 16 AI skills: feature modules including image generation, video editing, data analysis, and smart querying
.opencode/skills/image-service/README.md (new file, 27 lines)
@@ -0,0 +1,27 @@
# Image Service

Image generation, editing, and analysis service.

## Dependencies

```bash
pip install httpx pillow numpy
```

## Configuration

Edit `config/settings.json` or set environment variables (environment variables take precedence over the file):

```bash
export IMAGE_API_KEY="your_key"
export IMAGE_API_BASE_URL="https://api.openai.com/v1"
export VISION_API_KEY="your_key"
export VISION_API_BASE_URL="https://api.openai.com/v1"
```

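A minimal sketch of how the two configuration sources combine, mirroring the `_load_config` logic in the scripts: values are read from one section of `settings.json` first, then any environment variable that is set overrides them. `load_api_config` is a hypothetical helper for illustration, not a function the skill ships.

```python
import json
import os
from pathlib import Path
from typing import Dict, Optional


def load_api_config(settings_path: Path, section: str = "image_api",
                    env_prefix: str = "IMAGE") -> Dict[str, Optional[str]]:
    """Read one API section of settings.json, then let environment variables override it."""
    file_cfg: Dict[str, Optional[str]] = {}
    if settings_path.exists():
        settings = json.loads(settings_path.read_text(encoding="utf-8"))
        api = settings.get(section, {})
        file_cfg = {"api_key": api.get("key"), "base_url": api.get("base_url"),
                    "model": api.get("model")}
    # os.getenv(name, default) falls back to the file value only when the variable is unset
    return {
        "api_key": os.getenv(f"{env_prefix}_API_KEY", file_cfg.get("api_key")),
        "base_url": os.getenv(f"{env_prefix}_API_BASE_URL", file_cfg.get("base_url")),
        "model": os.getenv(f"{env_prefix}_MODEL", file_cfg.get("model")),
    }
```

For the vision settings the same idea applies with `section="vision_api"` and `env_prefix="VISION"` (the real `image_to_text.py` additionally falls back to the `IMAGE_*` variables).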
## Features

- Text-to-image (text_to_image.py)
- Image-to-image (image_to_image.py)
- Image understanding (image_to_text.py)
- Long-image stitching (merge_long_image.py)

.opencode/skills/image-service/SKILL.md (new file, 132 lines)
@@ -0,0 +1,132 @@
---
name: image-service
description: Multimodal image-processing skill supporting text-to-image, image-to-image, image-to-text, and long-image stitching. Triggered when the user mentions keywords such as 图片 (picture), 图像 (image), 生成图 (generated image), 信息图 (infographic), or OCR.
---

# Image Processing Skill

## Overview

| Capability | Description | Script |
|-----|------|------|
| Text-to-image | Generate an image from a Chinese text description | `scripts/text_to_image.py` |
| Image-to-image | Edit an existing image | `scripts/image_to_image.py` |
| Image-to-text | Analyze image content (description, OCR, charts, etc.) | `scripts/image_to_text.py` |
| Long-image stitching | Stitch several images vertically into a WeChat long image | `scripts/merge_long_image.py` |
| Research illustrations | Preset hand-drawn-style infographics for research reports | `scripts/research_image.py` |

## Configuration

Configuration file: `config/settings.json`

| Setting | Value |
|-------|-----|
| IMAGE_API_BASE_URL | `${IMAGE_API_BASE_URL}` |
| IMAGE_MODEL | `lyra-flash-9` |
| VISION_MODEL | `qwen2.5-vl-72b-instruct` |

## Execution Rules

**By default, images are saved to the current working directory of the command**, therefore:

1. **Do not** use `workdir` to switch into the skill directory before running a command
2. **Always** run commands from the user's working directory and call the script by its full path
3. Script path: `.opencode/skills/image-service/scripts/`

```bash
# Correct example (the prompt argument is written in Chinese)
python .opencode/skills/image-service/scripts/text_to_image.py "描述" -r 3:4 -o output.png
```

## Quick Start

### Text-to-image

```bash
python .opencode/skills/image-service/scripts/text_to_image.py "信息图风格,标题:AI技术趋势" -r 16:9
python .opencode/skills/image-service/scripts/text_to_image.py "竖版海报,产品展示" -r 3:4 -o poster.png
```

Options: `-r` aspect ratio | `-s` size | `-o` output path

Supported ratios: `1:1`, `2:3`, `3:2`, `3:4`, `4:3`, `4:5`, `5:4`, `9:16`, `16:9`, `21:9`

### Image-to-image

```bash
python .opencode/skills/image-service/scripts/image_to_image.py input.png "编辑描述" -r 3:4
```

### Image-to-text

```bash
python .opencode/skills/image-service/scripts/image_to_text.py image.jpg -m describe
python .opencode/skills/image-service/scripts/image_to_text.py screenshot.png -m ocr
```

Modes: `describe` | `ocr` | `chart` | `fashion` | `product` | `scene`

### Long-image stitching

```bash
python .opencode/skills/image-service/scripts/merge_long_image.py img1.png img2.png -o output.png --blend 20
python .opencode/skills/image-service/scripts/merge_long_image.py -p "*.png" -o long.png --sort name
```

Options: `-p` glob pattern | `-o` output | `-w` width | `-g` gap | `--blend` blending | `--sort` sort order

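`--blend` is only described as blending here. Assuming it cross-fades the bottom rows of one image into the top rows of the next (which fits the long-image guide's emphasis on seamless transitions), the idea can be sketched in pure Python with each image as a list of grayscale pixel rows. `stitch_with_blend` is a hypothetical illustration, not the code of `merge_long_image.py`.

```python
from typing import List

Rows = List[List[float]]  # an image as rows of grayscale pixel values


def stitch_with_blend(top: Rows, bottom: Rows, overlap: int) -> Rows:
    """Stack `bottom` under `top`, cross-fading `overlap` rows (assumed meaning of --blend)."""
    if top and bottom and len(top[0]) != len(bottom[0]):
        raise ValueError("images must share the same width (resize them first)")
    overlap = max(0, min(overlap, len(top), len(bottom)))
    if overlap == 0:
        return top + bottom  # plain vertical concatenation
    head = top[: len(top) - overlap]
    mixed: Rows = []
    for i in range(overlap):
        alpha = (i + 1) / (overlap + 1)  # weight of the lower image grows row by row
        t_row, b_row = top[len(top) - overlap + i], bottom[i]
        mixed.append([(1 - alpha) * t + alpha * b for t, b in zip(t_row, b_row)])
    return head + mixed + bottom[overlap:]
```

Note that blending N rows shortens the result by N rows per seam compared with plain concatenation.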
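### Research illustrations

```bash
python .opencode/skills/image-service/scripts/research_image.py -t arch -n "标题" -c "内容" -o output.png
```

Types: `arch` architecture diagram | `flow` flowchart | `compare` comparison chart | `concept` concept map

## Required Before Execution: Classify the Request (Hard Rule)

**After receiving an image-generation request, first determine its type, then decide how to execute it:**

### Long-Image Recognition Rules

If the prompt shows any of the following features, treat it as a **long-image request**:

| Feature | Keywords / patterns |
|---------|---------------|
| **Explicit declaration** | 长图 (long image), 长图海报 (long-image poster), 垂直长图 (vertical long image), 微信长图 (WeChat long image), Infographic, Long Banner |
| **Sectioned structure** | The prompt contains multiple sections (e.g. "第1部分" (Part 1), "顶部" (top), "中间" (middle), "底部" (bottom)) |
| **Numbered list** | Sections numbered like `### 1.`, `### 2.` |
| **Multi-screen content** | Describes 3 or more independent frames/modules |
| **Top to bottom** | Phrases such as "从上至下" or "从上到下" (from top to bottom) |
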
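### Execution Path After Classification

```
Classified as long image → first read references/long-image-guide.md → follow the long-image workflow
Classified as single image → generate directly with text_to_image.py
```

**Hard rule: once a request is classified as a long image, do not generate directly! Load the long-image guide first and follow its workflow.**
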
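The recognition table above can be approximated with a keyword heuristic. This is a sketch only: `is_long_image_request` and `route` are hypothetical names, the "multi-screen content" rule needs semantic judgment and is not covered, and treating two position markers as a sectioned structure is an assumption.

```python
import re

# Keywords from the recognition table; matching runs on the raw (mostly Chinese) prompt.
EXPLICIT_KEYWORDS = ["长图", "长图海报", "垂直长图", "微信长图", "infographic", "long banner"]
POSITION_MARKERS = ["顶部", "中间", "底部"]
TOP_TO_BOTTOM = ["从上至下", "从上到下"]


def is_long_image_request(prompt: str) -> bool:
    """Heuristic version of the long-image rules; borderline prompts still need judgment."""
    lowered = prompt.lower()  # only affects the English keywords
    if any(k in lowered for k in EXPLICIT_KEYWORDS):
        return True
    if any(p in prompt for p in TOP_TO_BOTTOM):
        return True
    # Sectioned structure: "第1部分"-style parts, or at least two position markers
    parts = re.findall(r"第\s*\d+\s*部分", prompt)
    if len(parts) >= 2 or sum(m in prompt for m in POSITION_MARKERS) >= 2:
        return True
    # Numbered sections such as "### 1." and "### 2."
    numbered = re.findall(r"^###\s*\d+\.", prompt, flags=re.MULTILINE)
    return len(numbered) >= 2


def route(prompt: str) -> str:
    """Map the classification to the execution path."""
    if is_long_image_request(prompt):
        return "read references/long-image-guide.md, then follow the long-image workflow"
    return "generate directly with text_to_image.py"
```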
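## Detailed Guides (Load on Demand)

| Scenario | Trigger | Reference |
|------|---------|---------|
| Multi-screen long image | Matches the long-image recognition rules above | `references/long-image-guide.md` (must be loaded) |
| Image contains Chinese text | The prompt asks for Chinese titles or text in the image | `references/text-rendering-guide.md` |
| Illustrations for PPT/documents | The user supplied color requirements or a reference document | `references/color-sync-guide.md` |
| API details | Need to understand the underlying implementation | `docs/api-reference.md` |
| Prompt techniques | Need to improve prompt results | `docs/prompt-guide.md` |

## Prompt Essentials

1. Prompts **must be written in Chinese**
2. Titles and labels in the image **must be in Chinese**
3. The default aspect ratio is **16:9**; change it with the `-r` option
4. Recommended styles: infographic, data visualization, hand-drawn lettering, tech illustration

## Trigger Keywords

- **Generation**: 生成图片 (generate image), 创建图片 (create image), 文生图 (text-to-image), 图生图 (image-to-image), 信息图 (infographic), 数据可视化 (data visualization)
- **Analysis**: 分析图片 (analyze image), OCR, 识别文字 (recognize text), 图生文 (image-to-text)
- **Stitching**: 长图 (long image), 微信长图 (WeChat long image), 拼接图片 (stitch images)
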
.opencode/skills/image-service/config/settings.json (new file, 42 lines)
@@ -0,0 +1,42 @@
{
  "image_api": {
    "key": "your_image_api_key",
    "base_url": "https://api.openai.com/v1",
    "model": "dall-e-3"
  },
  "vision_api": {
    "key": "your_vision_api_key",
    "base_url": "https://api.openai.com/v1",
    "model": "gpt-4o"
  },
  "defaults": {
    "text_to_image": {
      "size": "1792x1024",
      "response_format": "b64_json"
    },
    "image_to_image": {
      "size": "1792x1024",
      "response_format": "b64_json"
    },
    "image_to_text": {
      "max_tokens": 2000,
      "temperature": 0.7,
      "mode": "describe"
    }
  },
  "limits": {
    "max_file_size_mb": 4,
    "supported_formats": ["png", "jpg", "jpeg", "webp", "gif"],
    "max_prompt_length": 1000,
    "timeout_seconds": {
      "text_to_image": 180,
      "image_to_image": 180,
      "image_to_text": 120
    }
  },
  "retry": {
    "max_attempts": 3,
    "backoff_multiplier": 2,
    "initial_delay_seconds": 1
  }
}

.opencode/skills/image-service/docs/api-reference.md (new file, 233 lines)
@@ -0,0 +1,233 @@
# API Reference

## Overview

This skill uses two APIs:

1. **Lyra Flash API** - image generation and editing (text-to-image, image-to-image)
2. **Qwen2.5-VL API** - vision recognition (image-to-text)

---

## 1. Lyra Flash API (Image Generation)

### 1.1 Basic Configuration

| Setting | Value |
|-------|-----|
| Base URL | `${IMAGE_API_BASE_URL}` |
| Model | `lyra-flash-9` |
| Authentication | Bearer Token |

### 1.2 Text-to-Image Endpoint

**Endpoint**
```
POST /images/generations
```

**Request headers**
```json
{
  "Content-Type": "application/json",
  "Authorization": "Bearer ${IMAGE_API_KEY}"
}
```

**Request body**
```json
{
  "model": "lyra-flash-9",
  "prompt": "<image description, in Chinese>",
  "size": "1792x1024",
  "response_format": "b64_json"
}
```

**Parameters**

| Parameter | Type | Required | Description |
|-----|------|-----|------|
| model | string | Yes | Always `lyra-flash-9` |
| prompt | string | Yes | Image-generation prompt in Chinese |
| size | string | No | Image size; default `1792x1024` |
| response_format | string | No | Response format; `b64_json` recommended |

**Response body**
```json
{
  "created": 1641234567,
  "data": [
    {
      "b64_json": "<base64-encoded image data>"
    }
  ]
}
```

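A standard-library sketch of this call, assuming an OpenAI-compatible server behind `IMAGE_API_BASE_URL` (the skill's own scripts use httpx instead). `build_generation_payload`, `decode_first_image` and `generate_image` are hypothetical helper names; the payload and response shapes follow the tables above.

```python
import base64
import json
import urllib.request
from typing import Any, Dict


def build_generation_payload(prompt: str, size: str = "1792x1024",
                             model: str = "lyra-flash-9") -> Dict[str, Any]:
    """Request body for POST /images/generations."""
    return {"model": model, "prompt": prompt, "size": size, "response_format": "b64_json"}


def decode_first_image(response: Dict[str, Any]) -> bytes:
    """Return the raw bytes of data[0].b64_json from a response body."""
    data = response.get("data") or []
    if not data or "b64_json" not in data[0]:
        raise ValueError("response contains no b64_json image")
    return base64.b64decode(data[0]["b64_json"])


def generate_image(base_url: str, api_key: str, prompt: str,
                   size: str = "1792x1024", timeout: float = 180.0) -> bytes:
    """POST the request with a Bearer token and return the decoded image bytes."""
    req = urllib.request.Request(
        f"{base_url.rstrip('/')}/images/generations",
        data=json.dumps(build_generation_payload(prompt, size)).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {api_key}"},
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return decode_first_image(json.loads(resp.read().decode("utf-8")))
```

Usage would look like `Path("out.png").write_bytes(generate_image(os.environ["IMAGE_API_BASE_URL"], os.environ["IMAGE_API_KEY"], "信息图风格,标题:AI技术趋势"))`.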
### 1.3 Image-to-Image Endpoint

**Endpoint**
```
POST /images/edits
```

**Request body**
```json
{
  "model": "lyra-flash-9",
  "prompt": "<editing instruction, in Chinese>",
  "image": "data:image/png;base64,{base64 data}",
  "size": "1792x1024",
  "response_format": "b64_json"
}
```

**Parameters**

| Parameter | Type | Required | Description |
|-----|------|-----|------|
| model | string | Yes | Always `lyra-flash-9` |
| prompt | string | Yes | Image-editing instruction in Chinese |
| image | string | Yes | Base64-encoded reference image (including the data URL prefix) |
| size | string | No | Output size |
| response_format | string | No | Response format |

**Response body**
```json
{
  "data": [
    {
      "b64_json": "<base64-encoded generated image>"
    }
  ]
}
```

---

## 2. Qwen2.5-VL API (Vision Recognition)

### 2.1 Basic Configuration

| Setting | Value |
|-------|-----|
| Base URL | `${VISION_API_BASE_URL}` (`image_to_text.py` falls back to `${IMAGE_API_BASE_URL}` when it is unset) |
| Model | `qwen2.5-vl-72b-instruct` |
| Authentication | Bearer Token |

### 2.2 Image-to-Text Endpoint

**Endpoint**
```
POST /chat/completions
```

**Request headers**
```json
{
  "Content-Type": "application/json",
  "Authorization": "Bearer ${VISION_API_KEY}"
}
```

**Request body**
```json
{
  "model": "qwen2.5-vl-72b-instruct",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "请描述这张图片"
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "data:image/jpeg;base64,{base64 data}"
          }
        }
      ]
    }
  ],
  "max_tokens": 2000,
  "temperature": 0.7
}
```

**Parameters**

| Parameter | Type | Required | Description |
|-----|------|-----|------|
| model | string | Yes | Vision model name |
| messages | array | Yes | Message list containing the text and image parts |
| max_tokens | int | No | Maximum number of output tokens |
| temperature | float | No | Sampling temperature (0-1) |

**Response body**
```json
{
  "id": "chatcmpl-xxx",
  "object": "chat.completion",
  "created": 1641234567,
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "这是一张..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 100,
    "completion_tokens": 50,
    "total_tokens": 150
  }
}
```

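A sketch of building that request body and reading the answer back, using plain dicts that follow the shapes above. `build_vision_request` and `extract_answer` are hypothetical names; sending the body works like the text-to-image call, with `VISION_API_KEY` as the Bearer token and `/chat/completions` as the path.

```python
from typing import Any, Dict


def build_vision_request(image_data_url: str, question: str,
                         model: str = "qwen2.5-vl-72b-instruct",
                         max_tokens: int = 2000, temperature: float = 0.7) -> Dict[str, Any]:
    """Body for POST /chat/completions: one user message with a text part and an image part."""
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url", "image_url": {"url": image_data_url}},
            ],
        }],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def extract_answer(response: Dict[str, Any]) -> str:
    """Pull choices[0].message.content out of a chat.completion response."""
    choices = response.get("choices") or []
    if not choices:
        raise ValueError("no choices in response")
    return choices[0]["message"]["content"]
```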
---

## 3. Error Codes

| Status code | Meaning | Suggested handling |
|-------|------|---------|
| 400 | Invalid request parameters | Check the request body format and parameters |
| 401 | Invalid API key | Check that the API key is correct |
| 403 | Insufficient permissions | Check the API key's permissions |
| 429 | Rate limited | Wait, then retry |
| 500 | Internal server error | Retry later |
| 503 | Service unavailable | Retry later |

---

## 4. Best Practices

### 4.1 Timeouts

- Text-to-image: 120-180 seconds recommended
- Image-to-image: 180-300 seconds recommended
- Image-to-text: 60-120 seconds recommended

### 4.2 Retry Strategy

Implement exponential backoff:

1. First retry: wait 1 second
2. Second retry: wait 2 seconds
3. Third retry: wait 4 seconds

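A sketch of that backoff schedule. `settings.json` holds `retry.max_attempts: 3`, `backoff_multiplier: 2` and `initial_delay_seconds: 1`, but it does not say whether `max_attempts` counts the first call, so this version follows the 1 s / 2 s / 4 s list (three retries after the initial attempt) and only retries the codes the error table marks as retryable. `APIError` and `with_retry` are hypothetical names; `sleep` is injectable so the schedule can be checked without waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Status codes the error table marks as "retry" (rate limit, server error, unavailable)
RETRYABLE_STATUS = {429, 500, 503}


class APIError(Exception):
    """Hypothetical error type carrying the HTTP status code."""

    def __init__(self, status: int) -> None:
        super().__init__(f"HTTP {status}")
        self.status = status


def with_retry(call: Callable[[], T], max_retries: int = 3, initial_delay: float = 1.0,
               backoff_multiplier: float = 2.0,
               sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `call`; after a retryable error wait 1 s, 2 s, 4 s, ... before trying again."""
    delay = initial_delay
    for retry in range(max_retries + 1):
        try:
            return call()
        except APIError as err:
            # Give up on non-retryable codes (400/401/403) or once retries are exhausted
            if err.status not in RETRYABLE_STATUS or retry == max_retries:
                raise
            sleep(delay)
            delay *= backoff_multiplier
    raise RuntimeError("unreachable: the loop always returns or raises")
```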
### 4.3 Image Formats

- Supported formats: PNG, JPG, JPEG, WebP, GIF
- Recommended: PNG (lossless) or JPEG (lossy but smaller)
- Maximum file size: preferably no more than 4 MB

### 4.4 Base64 Encoding

Images must use the complete data URL format:
```
...
```

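A sketch of building and parsing a data URL of the `data:<mime>;base64,<payload>` form used in the request bodies above. `to_data_url` and `from_data_url` are hypothetical helpers; the MIME type is guessed from the file suffix with Python's `mimetypes` and falls back to `image/png`, as the scripts do for unknown suffixes.

```python
import base64
import mimetypes
from pathlib import Path
from typing import Tuple


def to_data_url(path: str) -> str:
    """Encode an image file as data:<mime>;base64,<payload>."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        mime = "image/png"  # same fallback the scripts use for unknown suffixes
    payload = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{payload}"


def from_data_url(url: str) -> Tuple[str, bytes]:
    """Split a base64 data URL back into (mime_type, raw bytes)."""
    header, sep, payload = url.partition(",")
    if not sep or not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("not a base64 data URL")
    return header[len("data:"):-len(";base64")], base64.b64decode(payload)
```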
.opencode/skills/image-service/docs/prompt-guide.md (new file, 215 lines)
@@ -0,0 +1,215 @@
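# Prompt Guide

## Overview

This guide covers prompt-writing rules and best practices for three scenarios: text-to-image, image-to-image, and image-to-text.

---

## 1. Text-to-Image Prompts

### 1.1 Basic Rules

1. Prompts **must be written in Chinese**
2. Titles, captions, and labels in the image **must be in Chinese**
3. The default size is **16:9 (1792x1024)**
4. Structured descriptions work better

### 1.2 Standard Template

Fields, line by line: style type, artistic effect, and resolution; 标题 (title, in Chinese); 视觉元素 (visual elements: main subjects, structure, scene); 配色 (color scheme); 类型 (specific type).

```
[风格类型],[艺术效果],[分辨率]。
标题:[中文标题]。
视觉元素:[主体对象、结构、场景描述]。
配色:[主色调方案]。
类型:[具体类型]。
```
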
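If prompts are assembled programmatically, the template can be filled field by field. A minimal sketch (`build_t2i_prompt` is a hypothetical helper); it keeps the template's Chinese labels and punctuation, so filling in the infographic values from 1.4 reproduces that example exactly.

```python
def build_t2i_prompt(style: str, effect: str, resolution: str, title: str,
                     elements: str, palette: str, kind: str) -> str:
    """Fill the five-line text-to-image template; every value should be written in Chinese."""
    lines = [
        f"{style},{effect},{resolution}。",
        f"标题:{title}。",
        f"视觉元素:{elements}。",
        f"配色:{palette}。",
        f"类型:{kind}。",
    ]
    return "\n".join(lines)
```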
### 1.3 Recommended Styles

| Style | Use case |
|-----|---------|
| Infographic (信息图风格) | Data presentation, process explanations |
| Data visualization (数据可视化) | Charts, statistics |
| Hand-drawn lettering (手绘文字风格) | Notes, tutorials |
| Tech illustration (科技插画风) | Illustrations for technical articles |
| Flat design (扁平化设计) | UI/UX showcases |
| 3D rendering (3D 渲染风格) | Product showcases |

### 1.4 Examples

**Infographic**
```
信息图风格插图,手绘文字风格,高清16:9。
标题:AI技术发展趋势。
视觉元素:中央AI芯片图标,周围连接云计算、大数据、机器学习图标。
配色:科技蓝和白色。
类型:信息图。
```

**Data visualization**
```
数据可视化风格,中文标注,高清16:9。
标题:2026年AI投资趋势。
视觉元素:柱状图、增长箭头、美元符号。
配色:金色和科技蓝。
类型:数据可视化。
```

**Product showcase**
```
3D产品渲染风格,光影效果,高清16:9。
标题:智能手表新品发布。
视觉元素:手表主体居中,周围展示核心功能图标。
配色:深空灰和玫瑰金。
类型:产品展示。
```

---

## 2. Image-to-Image Prompts

### 2.1 Basic Rules

1. State clearly **what to keep** and **what to change**
2. Describe the **target style** and the **desired effect**
3. Give concrete **detail requirements**

### 2.2 Standard Template

Fields, line by line: the edit description; 保持 (keep: elements to preserve); 修改 (change: parts to modify); 风格 (target style); 细节 (specific detail requirements).

```
基于原图进行编辑,[编辑描述]。
保持:[需要保留的元素]。
修改:[需要修改的部分]。
风格:[目标风格]。
细节:[具体的细节要求]。
```

### 2.3 Edit Types

| Type | Description | Example |
|-----|------|-----|
| Style transfer | Change the overall style | Convert to an oil-painting style |
| Background replacement | Swap the background | Change the background to a beach |
| Add elements | Add new elements | Add a text title |
| Remove elements | Remove elements | Remove people in the background |
| Color adjustment | Change the colors | Shift to warm tones |
| Quality enhancement | Improve quality | Add detail and sharpness |

### 2.4 Examples

**Style transfer**
```
基于原图进行编辑,将整体风格改为科技蓝色调的信息图。
保持:主体元素和构图。
修改:所有文字替换为中文标注,背景改为深蓝渐变。
风格:现代科技感信息图。
细节:添加数据流动效果和光点装饰。
```

**Person editing**
```
基于原图进行编辑,将人物转换为3D科幻风格。
保持:人物姿态和面部特征。
修改:服装改为未来感战斗服,增加全息UI界面。
风格:类似钢铁侠贾维斯系统。
细节:添加蓝色全息光效和数据面板。
```

**Background replacement**
```
基于原图进行编辑,替换背景为深色科技空间。
保持:原图主体比例和清晰度。
修改:背景完全替换,添加中文标题与数据标签。
风格:深色科技风格。
细节:背景添加星空和网格线条。
```

---

## 3. Image-to-Text Prompts

### 3.1 Analysis Modes

| Mode | Purpose | Prompt gist |
|-----|------|-------|
| describe | General description | Describe the image content in detail |
| ocr | Text recognition | Recognize all text in the image |
| chart | Chart analysis | Analyze the chart's data and trends |
| fashion | Outfit analysis | Analyze the person's clothing and styling |
| product | Product analysis | Analyze the product's features |
| scene | Scene analysis | Describe the scene and environment |

### 3.2 Custom Prompt Examples

**Detailed description** (asks for: people and expressions, clothing styles and colors, layout and composition, artistic or photographic style, any text or captions, background and other details)
```
请详细描述这张图片的内容,包括:
1. 人物特征和表情
2. 服装样式和颜色
3. 画面布局和构图
4. 艺术风格或摄影风格
5. 任何文字标注或说明
6. 背景环境和其他细节
```

**OCR** (asks for: titles and subtitles, body text, chart labels, button text, any other visible text, output in positional order)
```
请仔细识别这张图片中的所有文字内容,包括:
1. 标题和副标题
2. 正文内容
3. 图表标签
4. 按钮文字
5. 其他任何可见的文字

请按照文字在图片中的位置顺序,以清晰的格式输出识别结果。
```

**Chart analysis** (asks for: chart type, main trends, key data points, title and labels, conclusions, answered in Chinese)
```
请分析这张图表的内容,包括:
1. 图表类型(柱状图、折线图、饼图等)
2. 主要数据趋势
3. 关键数据点
4. 图表标题和标签
5. 数据的结论或洞察

请用中文详细描述图表传达的信息。
```

**Outfit analysis** (asks for: tops, bottoms, footwear, accessories, overall style, styling advice)
```
请分析这张图片中人物的穿搭,包括:
1. 上装:款式、颜色、材质
2. 下装:款式、颜色、材质
3. 鞋履:类型、颜色
4. 配饰:包包、帽子、眼镜、饰品等
5. 整体风格:休闲/商务/运动/时尚等
6. 搭配建议和点评
```

---

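## 4. Best Practices

### 4.1 Prompt Optimization Tips

1. **Be specific**: avoid vague descriptions and use concrete wording
2. **Be structured**: use bullet points or a template structure
3. **Prioritize**: put the most important requirements first
4. **Right amount of detail**: give enough detail without being verbose

### 4.2 Common Problems

| Problem | Cause | Fix |
|-----|------|---------|
| Result does not match the description | Prompt not specific enough | Add more detail |
| Chinese text renders incorrectly | Chinese requirement not emphasized | Explicitly ask for "中文标注" (Chinese labels) |
| Inconsistent style | Vague style description | Use a concrete style reference |
| Missing elements | Elements not listed explicitly | List every required element |

### 4.3 Recommended Prompt Length

- Text-to-image: 100-300 Chinese characters
- Image-to-image: 50-200 Chinese characters
- Image-to-text: 50-150 Chinese characters
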
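@@ -0,0 +1,76 @@
# Color Coordination

When image-service is used together with other skills (such as pptx, docx, or obsidian), it **must detect the color scheme of the context and adapt to it automatically**, so that the generated images match the style of the target medium.

## Principles

1. **Detect proactively**: before generating an illustration, confirm the target medium's color scheme
2. **Adapt automatically**: work the color information into the image-generation prompt
3. **Keep the style consistent**: keep the background, primary, and accent colors aligned

## Color Source Priority

| Priority | Source | Description |
|-------|------|------|
| 1 | Explicitly specified by the user | Color values the user provides directly |
| 2 | Current task context | Color scheme of the PPT/document being produced |
| 3 | Project config file | `.design/palette.json` or a similar config |
| 4 | Default style | Hand-drawn style on a white background (when there are no specific requirements) |

## Working with PPTX

When making illustrations for a PPT, extract the color scheme from the pptx skill's design plan:

```markdown
# Example: PPT color scheme
- Background: #181B24 (dark blue-black)
- Primary: #B165FB (purple)
- Secondary: #40695B (emerald green)
- Text: #FFFFFF / #AAAAAA
```

When generating images, work the colors into the prompt:

```bash
# Wrong (ignores the color scheme)
python .opencode/skills/image-service/scripts/text_to_image.py "流程图,用户路径变化" -r 16:9

# Right (incorporates the color scheme)
python .opencode/skills/image-service/scripts/text_to_image.py "信息图风格,深色背景#181B24,科技感流程图。用紫色#B165FB和翡翠绿#40695B作为强调色,展示用户路径变化,发光线条风格,中文标签" -r 16:9
```
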
## Working with Other Skills

| Target medium | Color source | Adaptation notes |
|---------|---------|---------|
| **PPTX** | CSS colors of the HTML slides | Align background, accent, and text colors |
| **DOCX** | Document theme colors or user-specified colors | Match the document's formal or lively tone |
| **Obsidian** | Vault theme (dark/light) | Suit the note-reading experience |
| **小红书 (Xiaohongshu)** | Brand colors or content tone | Portrait 3:4, eye-catching colors |
| **Research report** | Unified hand-drawn style | Use the research_image.py presets |

## Color Prompt Template

```
信息图风格,{背景描述}背景{背景色},{风格描述}。
使用{主色}作为主色调,{辅助色}作为辅助色。
{内容描述},{视觉风格},中文标签。
```

Placeholders: {背景描述} background description, {背景色} background color, {风格描述} style description, {主色} primary color, {辅助色} secondary color, {内容描述} content description, {视觉风格} visual style.

**Example**:
```
信息图风格,深色背景#181B24,科技感对比图。
使用紫色#B165FB作为主色调,翡翠绿#40695B作为辅助色。
左侧展示SEO特点,右侧展示GEO特点,发光连接线风格,中文标签。
```

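A sketch of the priority lookup plus template filling, assuming palettes are passed around as simple dicts (the keys `bg`, `primary` and `secondary` are made up for illustration). `pick_palette` returns the first non-empty source in the priority order of the color-source table, and `None` means falling back to the default hand-drawn white-background style.

```python
from typing import Dict, Optional

# The color prompt template, kept in Chinese because prompts must be written in Chinese.
TEMPLATE = ("信息图风格,{bg_desc}背景{bg},{style}。\n"
            "使用{primary}作为主色调,{secondary}作为辅助色。\n"
            "{content},{visual},中文标签。")


def pick_palette(*sources: Optional[Dict[str, str]]) -> Optional[Dict[str, str]]:
    """Return the first non-empty palette; pass sources in priority order (user, task, project file)."""
    for palette in sources:
        if palette:
            return palette
    return None  # caller falls back to the default hand-drawn white-background style
```

Filling `TEMPLATE` with the PPT palette and the SEO/GEO content reproduces the example above line for line.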
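## Agent Execution Rules

1. **Recognize the coordination scenario**: detect whether the image is an illustration for another skill
2. **Extract the color scheme**: get the color values from the context, HTML, or config
3. **Build an adapted prompt**: work the color information naturally into the description
4. **Verify stylistic consistency**: after generation, confirm the image fits visually with the target medium

## Coordination Workflow

1. Confirm the target medium → 2. Extract the color scheme → 3. Work it into the prompt → 4. Generate the adapted image
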
.opencode/skills/image-service/references/long-image-guide.md (new file, 135 lines)
@@ -0,0 +1,135 @@
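# Long-Image Generation Guidelines

When generating a long image that will be stitched together, use **stacked serial generation**: each image is generated with the previous image as its reference, so the style stays consistent and the transitions look natural.

## Hard Rule: Analyze and Confirm Before Executing

**After receiving a long-image request, do not start generating right away! First complete the following steps:**

### Step 1: Analyze the Prompt Structure

Read the prompt carefully and identify:
1. **Number of screens**: how many distinct sections/modules does the prompt contain?
2. **Content per screen**: what exactly should each screen show?
3. **Global style**: shared elements such as palette, style, and lighting
4. **Transition elements**: which elements bridge consecutive sections?

### Step 2: Output a Screen Plan Table

The plan must be presented as a table so the user can take it in at a glance:

```markdown
| Screen | Content summary | Key elements |
|-----|---------|---------|
| 1 | Hero visual + title | xxx |
| 2 | Close-up of xxx | xxx |
| ... | ... | ... |

**Global style**: xxx style, xxx palette, xxx lighting
**Output ratio**: 3:4
**Planned output**: N images → stitched into one long image
```

### Step 3: Wait for User Confirmation

**Do not start generating until the user says "OK", "开始" (start), or "没问题" (no problem)!**

The user may:
- Adjust the number of screens
- Change the content of a screen
- Add missing elements
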
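## Core Principle: Stacked Serial Generation

**Why serial rather than concurrent?**
- The top colors of each image must connect with the bottom colors of the previous image
- The previous image's bottom tones can only be extracted after it has finished generating
- Serial generation keeps the transitions between screens natural and seamless

**Why reference the previous image rather than the first one?**
- Referencing the first image causes style jumps in the middle screens
- Stacked referencing carries the style forward screen by screen, so the transitions are smoother
- Each image only has to match its immediate neighbors

## Pre-Generation Checklist

| Check | Requirement | Example |
|-------|------|------|
| **Same ratio** | All sub-images use the same `-r` value | `-r 3:4` everywhere |
| **Same style wording** | Use the same style keywords | `电影级美食摄影风格` (cinematic food-photography style) everywhere |
| **Same palette** | Define the main color range | `深红色、暖棕色、金色` (deep red, warm brown, gold) everywhere |
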
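## Agent Execution Workflow (Hard Rule)

```
1. Receive the long-image request
2. [Analyze] Read the prompt carefully and identify the screen structure
3. [Plan] Output the screen plan (as a table)
4. [Confirm] Wait for user confirmation before generating (hard rule!)
5. Define global style variables (main palette, style keywords)
6. Generate each screen serially:
   a. First screen: generate with text_to_image.py to set the tone
   b. Screen 2: generate with image_to_image.py, referencing screen 1
   c. Screen 3: generate with image_to_image.py, referencing screen 2
   d. And so on... each screen references the previous one
7. Wait for each screen to finish before generating the next (serial, never concurrent)
8. When all screens are done, stitch them with --blend 20
```
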
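## Image-to-Image Prompt Rules

**Key point: the top of each image continues from the bottom of the previous one**

Prompts for subsequent images must include:
1. **Top-transition statement**: state explicitly that the top color/atmosphere continues from the bottom of the previous image
2. **Style inheritance**: reference the overall style and lighting of the previous image
3. **This screen's content**: describe what the current screen shows

**Prompt template:**
```
参考模板图的整体风格、色调和光影氛围。本屏顶部与上一屏底部自然衔接。{本屏具体内容描述}
```
(Meaning: follow the reference image's overall style, palette, and lighting; the top of this screen blends naturally into the bottom of the previous screen; then {this screen's content}.)

**More precise version (recommended):**
```
参考模板图的{风格}、{色调}、{光影}。顶部延续上一屏底部的{颜色/氛围}。{本屏具体内容描述}
```
(Placeholders: {风格} style, {色调} palette, {光影} lighting, {颜色/氛围} color/atmosphere, {本屏具体内容描述} this screen's content.)

## Screen Position Rules

| Position | Handling |
|------|---------|
| **First screen** | Top starts normally; bottom content transitions naturally (no deliberate blank space needed) |
| **Middle screens** | Top continues the previous screen's bottom color; bottom content transitions naturally |
| **Last screen** | Top continues the previous screen's bottom color; bottom ends normally |

**Key: do not reserve a fixed percentage of blank space; let the content transition naturally**
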
## Execution Example

```bash
# Step 1: generate the first screen (text-to-image, sets the tone)
python .opencode/skills/image-service/scripts/text_to_image.py "高端美食摄影风格,深红暖棕金色调,电影级布光..." -r 3:4 -o 01_hero.png
# Wait for it to finish

# Step 2: generate screen 2 (referencing screen 1)
python .opencode/skills/image-service/scripts/image_to_image.py 01_hero.png "参考模板图的美食摄影风格、深红暖棕色调、电影级布光。顶部延续上一屏底部的暖色氛围。本屏内容:酥皮特写..." -r 3:4 -o 02_crisp.png
# Wait for it to finish

# Step 3: generate screen 3 (referencing screen 2)
python .opencode/skills/image-service/scripts/image_to_image.py 02_crisp.png "参考模板图的美食摄影风格、深红暖棕色调、电影级布光。顶部延续上一屏底部的色调。本屏内容:牛排特写..." -r 3:4 -o 03_tenderloin.png
# Wait for it to finish

# ...and so on

# Finally: stitch (blend 20 recommended)
python .opencode/skills/image-service/scripts/merge_long_image.py 01_hero.png 02_crisp.png 03_tenderloin.png ... -o final.png --blend 20
```

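The serial workflow can be scripted because each command only needs the previous screen's output file name. This sketch builds the argument lists up front (the `NN_screen.png` naming and the helper names `plan_long_image` / `run_serially` are hypothetical); `run_serially` runs them one at a time, since `subprocess.run` blocks until each command exits. The planning and confirmation steps still happen before any of this runs.

```python
import subprocess
from typing import List

SCRIPTS = ".opencode/skills/image-service/scripts"


def plan_long_image(screen_prompts: List[str], ratio: str = "3:4",
                    final: str = "final.png") -> List[List[str]]:
    """Screen 1 via text_to_image.py, screen N via image_to_image.py on screen N-1, then merge."""
    if not screen_prompts:
        raise ValueError("need at least one screen")
    commands: List[List[str]] = []
    outputs: List[str] = []
    for i, prompt in enumerate(screen_prompts, start=1):
        out = f"{i:02d}_screen.png"  # hypothetical naming scheme
        if i == 1:
            cmd = ["python", f"{SCRIPTS}/text_to_image.py", prompt, "-r", ratio, "-o", out]
        else:
            # Stacked reference: the previous screen, not the first one
            cmd = ["python", f"{SCRIPTS}/image_to_image.py", outputs[-1], prompt,
                   "-r", ratio, "-o", out]
        commands.append(cmd)
        outputs.append(out)
    commands.append(["python", f"{SCRIPTS}/merge_long_image.py", *outputs,
                     "-o", final, "--blend", "20"])
    return commands


def run_serially(commands: List[List[str]]) -> None:
    """Run one command at a time from the user's working directory; stop on a non-zero exit."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```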
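## Hard Rules

1. **Generate serially**: finish each screen before starting the next; never generate concurrently
2. **Stacked referencing**: screen N references screen N-1, not the first screen
3. **Top transition**: the top color/atmosphere of each screen must continue from the bottom of the previous one
4. **No fixed blank margins**: do not reserve fixed margins such as 4% or 8%; let the content transition naturally
5. **Use the right script**: `text_to_image.py` for the first screen, `image_to_image.py` for all the rest
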
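@@ -0,0 +1,41 @@
# Text Clarity Guidelines

When generating images that contain Chinese text, **you must append the text-clarity instructions to the end of the prompt** so that the text is legible and free of garbled characters.

## Text-Clarity Suffix (Required)

```
【文字渲染要求】
- 所有中文文字必须清晰可读,笔画完整,无模糊、无乱码、无伪文字
- 文字边缘锐利,呈现印刷级清晰度,彻底消除压缩噪点与边缘溢色
- 字体风格统一,字距适中,排版规整
- 严禁出现无法阅读的乱码字符或残缺笔画
```
(The suffix requires: all Chinese text legible with complete strokes, with no blur, garbling, or pseudo-text; sharp, print-quality edges with no compression noise or color bleed; a consistent font style with even spacing and a tidy layout; no unreadable characters or broken strokes.)

## Full Prompt Structure

```
{风格描述}。{内容描述}。{布局描述}。

【文字渲染要求】
- 所有中文文字必须清晰可读,笔画完整,无模糊、无乱码、无伪文字
- 文字边缘锐利,呈现印刷级清晰度
- 字体风格统一,排版规整
```

## Post-Generation Check

1. After generating the image, run `image_to_text.py -m ocr` to check whether the text is clear
2. If the OCR result does not match the expected text, repair it iteratively with image-to-image
3. Use the following template as the repair prompt

## Text-Repair Prompt (for Iterative Image-to-Image Repair)

```
执行语意级图像重构。针对图中模糊或乱码的文字区域进行修复:
1. 保持原图的版面配置、物体座标、配色风格完全不变
2. 将模糊文字修复为清晰的简体中文:{预期文字内容}
3. 文字笔画必须呈现印刷级清晰度,边缘锐利,无压缩噪点
4. 严禁产生无法阅读的伪文字或乱码
直接输出修复后的图像。
```
(Meaning: perform a semantic-level reconstruction of the blurred or garbled text regions; keep the layout, object positions, and color style unchanged; rewrite the text as clear Simplified Chinese {expected text}; print-quality, sharp strokes without compression noise; no pseudo-text or garbling; output the repaired image directly.)
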
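A minimal sketch of both steps: appending the short suffix from the full prompt structure exactly once, and comparing OCR output with the expected strings to decide whether a repair pass is needed. `with_text_suffix` and `missing_texts` are hypothetical helpers; in practice the OCR text would come from `image_to_text.py -m ocr`, and ignoring whitespace during the comparison is an assumption.

```python
from typing import List

TEXT_SUFFIX = (
    "【文字渲染要求】\n"
    "- 所有中文文字必须清晰可读,笔画完整,无模糊、无乱码、无伪文字\n"
    "- 文字边缘锐利,呈现印刷级清晰度\n"
    "- 字体风格统一,排版规整"
)


def with_text_suffix(prompt: str) -> str:
    """Append the text-clarity suffix once, separated by a blank line."""
    if TEXT_SUFFIX in prompt:
        return prompt
    return f"{prompt.rstrip()}\n\n{TEXT_SUFFIX}"


def missing_texts(expected: List[str], ocr_output: str) -> List[str]:
    """Expected strings not found in the OCR output (whitespace ignored on both sides)."""
    normalized = "".join(ocr_output.split())
    return [t for t in expected if "".join(t.split()) not in normalized]
```

A non-empty result from `missing_texts` is the signal to run the text-repair prompt through `image_to_image.py`.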
.opencode/skills/image-service/scripts/image_to_image.py (new file, 273 lines)
@@ -0,0 +1,273 @@
#!/usr/bin/env python3
"""
Image-to-image script (Image-to-Image)
Edits an image with the Lyra Flash API, based on a reference image and a Chinese instruction.

Author: 翟星人
"""

import base64
import json
import os
from pathlib import Path
from typing import Any, Dict, Optional, Union

import httpx

VALID_ASPECT_RATIOS = [
    "1:1", "2:3", "3:2", "3:4", "4:3", "4:5", "5:4", "9:16", "16:9", "21:9"
]

VALID_SIZES = [
    "1024x1024",
    "1536x1024", "1792x1024", "1344x768", "1248x832", "1184x864", "1152x896", "1536x672",
    "1024x1536", "1024x1792", "768x1344", "832x1248", "864x1184", "896x1152"
]

# Aspect ratio -> size string sent to the API
RATIO_TO_SIZE = {
    "1:1": "1024x1024",
    "2:3": "832x1248",
    "3:2": "1248x832",
    "3:4": "1024x1536",
    "4:3": "1536x1024",
    "4:5": "864x1184",
    "5:4": "1184x864",
    "9:16": "1024x1792",
    "16:9": "1792x1024",
    "21:9": "1536x672"
}


class ImageToImageEditor:
    """Image-to-image editor."""

    def __init__(self, config: Optional[Dict[str, str]] = None):
        """
        Initialize the editor.

        Args:
            config: Dict containing api_key, base_url and model.
                    If omitted, values are read from environment variables or the config file.
        """
        if config is None:
            config = self._load_config()

        self.api_key = config.get('api_key') or config.get('IMAGE_API_KEY')
        self.base_url = config.get('base_url') or config.get('IMAGE_API_BASE_URL')
        self.model = config.get('model') or config.get('IMAGE_MODEL') or 'lyra-flash-9'

        if not self.api_key or not self.base_url:
            raise ValueError("Missing required API configuration: api_key and base_url")

    def _load_config(self) -> Dict[str, str]:
        """Load configuration from the config file or environment variables."""
        config = {}

        # Try the config file first
        config_path = Path(__file__).parent.parent / 'config' / 'settings.json'
        if config_path.exists():
            with open(config_path, 'r', encoding='utf-8') as f:
                settings = json.load(f)
            api_config = settings.get('image_api', {})
            config['api_key'] = api_config.get('key')
            config['base_url'] = api_config.get('base_url')
            config['model'] = api_config.get('model')

        # Environment variables take precedence over the file
        config['api_key'] = os.getenv('IMAGE_API_KEY', config.get('api_key'))
        config['base_url'] = os.getenv('IMAGE_API_BASE_URL', config.get('base_url'))
        config['model'] = os.getenv('IMAGE_MODEL', config.get('model'))

        return config

    @staticmethod
    def image_to_base64(image_path: str, with_prefix: bool = True) -> str:
        """
        Convert an image file to a base64 string.

        Args:
            image_path: Path to the image file
            with_prefix: Whether to add the data URL prefix

        Returns:
            The base64-encoded string
        """
        path = Path(image_path)
        if not path.exists():
            raise FileNotFoundError(f"Image file not found: {image_path}")

        # Determine the MIME type from the file suffix
        suffix = path.suffix.lower()
        mime_types = {
            '.jpg': 'image/jpeg',
            '.jpeg': 'image/jpeg',
            '.png': 'image/png',
            '.gif': 'image/gif',
            '.webp': 'image/webp'
        }
        mime_type = mime_types.get(suffix, 'image/png')

        with open(image_path, 'rb') as f:
            b64_str = base64.b64encode(f.read()).decode('utf-8')

        if with_prefix:
            return f"data:{mime_type};base64,{b64_str}"
        return b64_str

    def edit(
        self,
        image: Union[str, bytes],
        prompt: str,
        aspect_ratio: Optional[str] = None,
        size: Optional[str] = None,
        output_path: Optional[str] = None,
        response_format: str = "b64_json"
    ) -> Dict[str, Any]:
        """
        Edit an image.

        Args:
            image: Image path, data URL, bare base64 string, or raw bytes
            prompt: Editing instruction in Chinese
            aspect_ratio: Aspect ratio (e.g. 3:4, 16:9)
            size: Explicit size (e.g. 1024x1792)
            output_path: Output file path
            response_format: Response format

        Returns:
            A dict describing the edit result
        """
        # Normalize the image input to a data URL
        if isinstance(image, str):
            if os.path.isfile(image):
                image_b64 = self.image_to_base64(image)
            elif image.startswith('data:'):
                image_b64 = image
            else:
                # Assume a bare base64 string
                image_b64 = f"data:image/png;base64,{image}"
        else:
            image_b64 = f"data:image/png;base64,{base64.b64encode(image).decode('utf-8')}"

        payload: Dict[str, Any] = {
            "model": self.model,
            "prompt": prompt,
            "image": image_b64,
            "response_format": response_format
        }

        # Size: prefer the aspect_ratio mapping, then an explicit size
        if aspect_ratio:
            payload["size"] = RATIO_TO_SIZE.get(aspect_ratio, "1024x1536")
        elif size:
            payload["size"] = size
        else:
            payload["size"] = "1024x1536"  # default: the size mapped for 3:4

        headers = {
            "Content-Type": "application/json",
            "Authorization": f"Bearer {self.api_key}"
        }

        try:
            with httpx.Client(timeout=180.0) as client:
                response = client.post(
                    f"{self.base_url}/images/edits",
                    headers=headers,
                    json=payload
                )
                response.raise_for_status()
                result = response.json()

            # Save the image if an output path was given
            saved_path = None
            if output_path and result.get("data"):
                b64_data = result["data"][0].get("b64_json")
                if b64_data:
                    self._save_image(b64_data, output_path)
                    saved_path = output_path
                    result["saved_path"] = output_path

            return {
                "success": True,
                "data": result,
                # Only report a path if the file was actually written
                "saved_path": saved_path
            }

        except httpx.HTTPStatusError as e:
            return {
                "success": False,
                "error": f"HTTP error: {e.response.status_code}",
                # The response body usually explains why the request was rejected
                "detail": e.response.text or str(e)
            }
        except Exception as e:
            return {
                "success": False,
                "error": "Edit failed",
                "detail": str(e)
            }

    def _save_image(self, b64_data: str, output_path: str) -> None:
        """Decode a base64 image and write it to a file."""
        image_data = base64.b64decode(b64_data)
        Path(output_path).parent.mkdir(parents=True, exist_ok=True)
        with open(output_path, 'wb') as f:
            f.write(image_data)


def main():
    """Command-line entry point."""
    import argparse
    import sys
    import time

    parser = argparse.ArgumentParser(
        description='Image-to-image editing tool',
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog=f'''
Size options:
  -r/--ratio  Aspect ratio (recommended). Supported: {", ".join(VALID_ASPECT_RATIOS)}
  -s/--size   Explicit size. Supported: {", ".join(VALID_SIZES[:4])}...

Examples:
  python image_to_image.py input.png "编辑描述" -r 3:4
  python image_to_image.py input.png "编辑描述" -s 1024x1536
'''
    )
    parser.add_argument('image', help='Input image path')
    parser.add_argument('prompt', help='Editing instruction in Chinese')
    parser.add_argument('-o', '--output', help='Output file path (saved in the current directory by default)')
    parser.add_argument('-r', '--ratio', help=f'Aspect ratio (recommended). Options: {", ".join(VALID_ASPECT_RATIOS)}')
    parser.add_argument('-s', '--size', help='Explicit size, e.g. 1024x1536')

    args = parser.parse_args()

    if args.ratio and args.ratio not in VALID_ASPECT_RATIOS:
        print(f"Error: unsupported aspect ratio '{args.ratio}'")
        print(f"Supported aspect ratios: {', '.join(VALID_ASPECT_RATIOS)}")
        sys.exit(1)

    if args.size and args.size not in VALID_SIZES:
        print(f"Warning: size '{args.size}' may not be supported")

    output_path = args.output
    if not output_path:
        timestamp = time.strftime("%Y%m%d_%H%M%S")
        output_path = f"edited_{timestamp}.png"

    editor = ImageToImageEditor()
    result = editor.edit(
        image=args.image,
        prompt=args.prompt,
        aspect_ratio=args.ratio,
        size=args.size,
        output_path=output_path
    )

    if result["success"]:
        print("Edit succeeded!")
        if result.get("saved_path"):
            print(f"Image saved to: {result['saved_path']}")
    else:
        print(f"Edit failed: {result['error']}")
        print(f"Details: {result.get('detail', 'N/A')}")
        # Non-zero exit status so callers (e.g. a serial long-image pipeline) can stop
        sys.exit(1)


if __name__ == "__main__":
    main()

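A quick arithmetic check of `RATIO_TO_SIZE` (copied from the script above): it compares each size's width/height with its nominal ratio. With the default 3% tolerance it flags `3:4` (1024x1536 is actually 2:3), `4:3` (1536x1024 is 3:2), `4:5` and `5:4`, while the other entries are within about 2%. Whether the Lyra backend interprets these size strings differently is not documented here, so treat the result as a prompt to double-check the table rather than as a fix. `ratio_mismatches` is a hypothetical helper.

```python
from typing import Dict, List

# Copied from image_to_image.py
RATIO_TO_SIZE = {
    "1:1": "1024x1024", "2:3": "832x1248", "3:2": "1248x832",
    "3:4": "1024x1536", "4:3": "1536x1024", "4:5": "864x1184",
    "5:4": "1184x864", "9:16": "1024x1792", "16:9": "1792x1024",
    "21:9": "1536x672",
}


def ratio_mismatches(mapping: Dict[str, str], tolerance: float = 0.03) -> List[str]:
    """Return the ratios whose mapped WxH differs from the nominal W:H by more than `tolerance`."""
    mismatched = []
    for ratio, size in mapping.items():
        rw, rh = (int(n) for n in ratio.split(":"))
        w, h = (int(n) for n in size.split("x"))
        nominal, actual = rw / rh, w / h
        # Relative deviation of the real pixel ratio from the requested one
        if abs(actual - nominal) / nominal > tolerance:
            mismatched.append(ratio)
    return mismatched
```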
.opencode/skills/image-service/scripts/image_to_text.py (new file, 287 lines)
@@ -0,0 +1,287 @@
#!/usr/bin/env python3
"""
Image-to-text script - visual recognition.
Uses the Qwen2.5-VL model to analyze an image and generate a text description.

Author: 翟星人
"""

import httpx
import base64
import json
import os
from typing import Dict, Any, Optional, Union, List
from pathlib import Path


class ImageToTextAnalyzer:
    """Image-to-text analyzer (visual recognition)."""

    # Predefined analysis modes. The prompts are sent to the model verbatim and
    # are kept in Chinese:
    #   describe - detailed description: people, scene, objects, colors, layout
    #   ocr      - transcribe all text in reading order, keeping Chinese text as-is
    #   chart    - chart type, data trends, key data points, titles/labels, conclusions
    #   fashion  - outfit: garment styles, color matching, accessories, overall style
    #   product  - product type, appearance, features, brand information
    #   scene    - location, environment, atmosphere, time of day (day/night)
    ANALYSIS_MODES = {
        "describe": "请详细描述这张图片的内容,包括:人物、场景、物品、颜色、布局等所有细节。",
        "ocr": "请仔细识别这张图片中的所有文字内容,按照文字在图片中的位置顺序输出。如果是中文,请保持原文输出。",
        "chart": "请分析这张图表的内容,包括:图表类型、数据趋势、关键数据点、标题标签、以及数据的结论或洞察。",
        "fashion": "请分析这张图片中人物的穿搭,包括:服装款式、颜色搭配、配饰、整体风格等。",
        "product": "请分析这张产品图片,包括:产品类型、外观特征、功能特点、品牌信息等。",
        "scene": "请描述这张图片的场景,包括:地点、环境、氛围、时间(白天/夜晚)等。"
    }

    def __init__(self, config: Optional[Dict[str, str]] = None):
        """
        Initialize the analyzer.

        Args:
            config: Config dict containing api_key, base_url and model.
                    If omitted, it is read from environment variables or the config file.
        """
        if config is None:
            config = self._load_config()

        self.api_key = config.get('api_key') or config.get('VISION_API_KEY') or config.get('IMAGE_API_KEY')
        self.base_url = config.get('base_url') or config.get('VISION_API_BASE_URL') or config.get('IMAGE_API_BASE_URL')
        self.model = config.get('model') or config.get('VISION_MODEL') or 'qwen2.5-vl-72b-instruct'

        if not self.api_key or not self.base_url:
            raise ValueError("Missing required API configuration: api_key and base_url")

    def _load_config(self) -> Dict[str, str]:
        """Load configuration from the config file and environment variables."""
        config = {}

        # Try the config file first
        config_path = Path(__file__).parent.parent / 'config' / 'settings.json'
        if config_path.exists():
            with open(config_path, 'r', encoding='utf-8') as f:
                settings = json.load(f)
            # Prefer the vision_api section
            vision_config = settings.get('vision_api', {})
            if vision_config:
                config['api_key'] = vision_config.get('key')
                config['base_url'] = vision_config.get('base_url')
                config['model'] = vision_config.get('model')
            else:
                # Fall back to the image_api section
                api_config = settings.get('image_api', {})
                config['api_key'] = api_config.get('key')
                config['base_url'] = api_config.get('base_url')

        # Environment variables take precedence over the config file
        config['api_key'] = os.getenv('VISION_API_KEY', os.getenv('IMAGE_API_KEY', config.get('api_key')))
        config['base_url'] = os.getenv('VISION_API_BASE_URL', os.getenv('IMAGE_API_BASE_URL', config.get('base_url')))
        config['model'] = os.getenv('VISION_MODEL', config.get('model', 'qwen2.5-vl-72b-instruct'))

        return config

    @staticmethod
    def image_to_base64(image_path: str) -> str:
        """
        Encode an image file as base64 with a data URL prefix.

        Args:
            image_path: Path to the image file

        Returns:
            Base64 string including the data URL prefix
        """
        path = Path(image_path)
        if not path.exists():
            raise FileNotFoundError(f"Image file not found: {image_path}")

        # Derive the MIME type from the file extension (defaults to PNG)
        suffix = path.suffix.lower()
        mime_types = {
            '.jpg': 'image/jpeg',
            '.jpeg': 'image/jpeg',
            '.png': 'image/png',
            '.gif': 'image/gif',
            '.webp': 'image/webp'
        }
        mime_type = mime_types.get(suffix, 'image/png')

        with open(image_path, 'rb') as f:
            b64_str = base64.b64encode(f.read()).decode('utf-8')

        return f"data:{mime_type};base64,{b64_str}"

    def analyze(
        self,
        image: Union[str, bytes],
        prompt: Optional[str] = None,
        mode: str = "describe",
        max_tokens: int = 2000,
        temperature: float = 0.7
    ) -> Dict[str, Any]:
        """
        Analyze an image and generate a text description.

        Args:
            image: Image path, URL, or base64 string (raw bytes are also accepted)
            prompt: Custom analysis prompt (overrides mode if provided)
            mode: Analysis mode (describe/ocr/chart/fashion/product/scene)
            max_tokens: Maximum number of output tokens
            temperature: Sampling temperature

        Returns:
            Dict containing the analysis result
        """
        # Pick the prompt: a custom prompt wins; unknown modes fall back to "describe"
        if prompt is None:
            prompt = self.ANALYSIS_MODES.get(mode, self.ANALYSIS_MODES["describe"])
        else:
            # Report the mode honestly when a custom prompt replaced the preset one
            mode = "custom"

        # Normalize the image input into a URL the API accepts
        if isinstance(image, str):
            if os.path.isfile(image):
                # Local file -> data URL
                image_url = self.image_to_base64(image)
            elif image.startswith('data:') or image.startswith('http'):
                # Already a data URL or a remote URL
                image_url = image
            else:
                # Assume a bare base64 string (MIME type assumed to be PNG)
                image_url = f"data:image/png;base64,{image}"
        else:
            # Raw bytes (MIME type assumed to be PNG)
            image_url = f"data:image/png;base64,{base64.b64encode(image).decode('utf-8')}"

        # Build the OpenAI-style chat-completions request: one text part plus one image part
        payload = {
            "model": self.model,
            "messages": [
                {
                    "role": "user",
                    "content": [
                        {
                            "type": "text",
                            "text": prompt
                        },
                        {
                            "type": "image_url",
                            "image_url": {
                                "url": image_url
                            }
                        }
                    ]
                }
            ],
            "max_tokens": max_tokens,
            "temperature": temperature
        }

        headers = {
            "Content-Type": "application/json",
            "Authorization": f"Bearer {self.api_key}"
        }

        try:
            with httpx.Client(timeout=120.0) as client:
                response = client.post(
                    f"{self.base_url}/chat/completions",
                    headers=headers,
                    json=payload
                )
                response.raise_for_status()
                result = response.json()

            # Extract the text content (guards against a missing or empty "choices" list)
            choices = result.get("choices") or [{}]
            content = choices[0].get("message", {}).get("content", "")

            return {
                "success": True,
                "content": content,
                "mode": mode,
                "usage": result.get("usage") or {}
            }

        except httpx.HTTPStatusError as e:
            return {
                "success": False,
                "error": f"HTTP error: {e.response.status_code}",
                "detail": str(e)
            }
        except Exception as e:
            return {
                "success": False,
                "error": "Analysis failed",
                "detail": str(e)
            }

    def describe(self, image: Union[str, bytes]) -> Dict[str, Any]:
        """General image description."""
        return self.analyze(image, mode="describe")

    def ocr(self, image: Union[str, bytes]) -> Dict[str, Any]:
        """Text recognition (OCR)."""
        return self.analyze(image, mode="ocr")

    def analyze_chart(self, image: Union[str, bytes]) -> Dict[str, Any]:
        """Chart analysis."""
        return self.analyze(image, mode="chart")

    def analyze_fashion(self, image: Union[str, bytes]) -> Dict[str, Any]:
        """Outfit analysis."""
        return self.analyze(image, mode="fashion")

    def analyze_product(self, image: Union[str, bytes]) -> Dict[str, Any]:
        """Product analysis."""
        return self.analyze(image, mode="product")

    def analyze_scene(self, image: Union[str, bytes]) -> Dict[str, Any]:
        """Scene analysis."""
        return self.analyze(image, mode="scene")

    def batch_analyze(
        self,
        images: List[str],
        mode: str = "describe"
    ) -> List[Dict[str, Any]]:
        """
        Analyze several images one after another.

        Args:
            images: List of image paths
            mode: Analysis mode

        Returns:
            List of results, each tagged with its input under the "image" key
        """
        results = []
        for image in images:
            result = self.analyze(image, mode=mode)
            result["image"] = image
            results.append(result)
        return results


def main():
    """Command-line entry point."""
    import argparse

    parser = argparse.ArgumentParser(description='Image-to-text analysis tool (visual recognition)')
    parser.add_argument('image', help='Input image path')
    parser.add_argument('-m', '--mode', default='describe',
                        choices=['describe', 'ocr', 'chart', 'fashion', 'product', 'scene'],
                        help='Analysis mode')
    parser.add_argument('-p', '--prompt', help='Custom analysis prompt (overrides --mode)')
    parser.add_argument('--max-tokens', type=int, default=2000, help='Maximum number of output tokens')

    args = parser.parse_args()

    analyzer = ImageToTextAnalyzer()
    result = analyzer.analyze(
        image=args.image,
        prompt=args.prompt,
        mode=args.mode,
        max_tokens=args.max_tokens
    )

    if result["success"]:
        print(f"\n=== Analysis result ({result['mode']}) ===\n")
        print(result["content"])
        print("\n=== Token usage ===")
        print(f"Input: {result['usage'].get('prompt_tokens', 'N/A')}")
        print(f"Output: {result['usage'].get('completion_tokens', 'N/A')}")
    else:
        print(f"Analysis failed: {result['error']}")
        print(f"Detail: {result.get('detail', 'N/A')}")


if __name__ == "__main__":
    main()
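To see exactly what `ImageToTextAnalyzer.analyze()` sends without making a network call, the sketch below rebuilds the same OpenAI-style chat-completions body offline. `build_vision_payload` is a hypothetical helper, not part of the script, and it assumes the caller already knows the MIME type (the script itself falls back to `image/png` for raw bytes and bare base64 strings).

```python
import base64
import json
from typing import Any, Dict


def build_vision_payload(image_bytes: bytes, prompt: str, model: str,
                         mime_type: str = "image/png",
                         max_tokens: int = 2000,
                         temperature: float = 0.7) -> Dict[str, Any]:
    """Hypothetical helper: one text part plus one image part, as in analyze()."""
    # Encode the raw bytes as a data URL, the same form image_to_base64() returns
    b64 = base64.b64encode(image_bytes).decode("ascii")
    data_url = f"data:{mime_type};base64,{b64}"
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": data_url}},
            ],
        }],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


payload = build_vision_payload(b"not-really-a-png", "describe this image", "qwen2.5-vl-72b-instruct")
print(json.dumps(payload, indent=2))
```

Keeping the payload construction separate from the HTTP call like this makes it easy to inspect or unit-test the request shape; the script itself builds the dict inline.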
251
.opencode/skills/image-service/scripts/merge_long_image.py
Normal file
@@ -0,0 +1,251 @@
#!/usr/bin/env python3
"""
Long-image merge script (Merge Long Image).
Stitches multiple images vertically, in order, into a single WeChat long image.

Author: 翟星人
"""

import argparse
import os
import glob as glob_module
from pathlib import Path
from typing import List, Optional, Dict, Any

from PIL import Image
import numpy as np


class LongImageMerger:
    """Long-image merger."""

    def __init__(self, target_width: int = 1080):
        """
        Initialize the merger.

        Args:
            target_width: Target width in pixels; defaults to 1080 (the width recommended for WeChat)
        """
        self.target_width = target_width

    def _blend_images(self, img_top: Image.Image, img_bottom: Image.Image, blend_height: int) -> Image.Image:
        """
        Create a gradient transition at the seam between two images.

        Args:
            img_top: Upper image
            img_bottom: Lower image
            blend_height: Height of the blend region in pixels

        Returns:
            A copy of the lower image whose top rows are blended with the bottom rows of the upper image
        """
        # Never blend more than a quarter of either image
        blend_height = min(blend_height, img_top.height // 4, img_bottom.height // 4)
        if blend_height <= 0:
            # One of the images is too short to blend; leave the lower image unchanged
            return img_bottom

        top_region = img_top.crop((0, img_top.height - blend_height, img_top.width, img_top.height))
        bottom_region = img_bottom.crop((0, 0, img_bottom.width, blend_height))

        top_array = np.array(top_region, dtype=np.float32)
        bottom_array = np.array(bottom_region, dtype=np.float32)

        # Per-row weight: 1.0 (pure upper image) on the first row down to 0.0 (pure lower image)
        alpha = np.linspace(1, 0, blend_height).reshape(-1, 1, 1)

        blended_array = top_array * alpha + bottom_array * (1 - alpha)
        blended_array = np.clip(blended_array, 0, 255).astype(np.uint8)

        blended_region = Image.fromarray(blended_array)

        result = img_bottom.copy()
        result.paste(blended_region, (0, 0))

        return result

    def merge(
        self,
        image_paths: List[str],
        output_path: str,
        gap: int = 0,
        background_color: str = "white",
        blend: int = 0
    ) -> Dict[str, Any]:
        """
        Merge multiple images into one long image.

        Args:
            image_paths: Image paths, merged in list order
            output_path: Output file path
            gap: Gap between images in pixels; default 0
            background_color: Background color; default white
            blend: Height of the seam blend region in pixels; default 0 (no blending), 30-50 recommended

        Returns:
            Dict containing the merge result
        """
        if not image_paths:
            return {"success": False, "error": "No image paths provided"}

        valid_paths = []
        for p in image_paths:
            if os.path.exists(p):
                valid_paths.append(p)
            else:
                print(f"Warning: file not found, skipping - {p}")

        if not valid_paths:
            return {"success": False, "error": "No valid image files"}

        imgs = []
        resized_imgs = []
        try:
            imgs = [Image.open(p) for p in valid_paths]

            for img in imgs:
                # Normalize every mode (RGBA, P, L, CMYK, ...) to RGB so that the
                # numpy blend always works on H x W x 3 arrays
                if img.mode != 'RGB':
                    img = img.convert('RGB')
                # Scale proportionally to the target width
                ratio = self.target_width / img.width
                new_height = max(1, int(img.height * ratio))
                resized = img.resize((self.target_width, new_height), Image.Resampling.LANCZOS)
                resized_imgs.append(resized)

            if blend > 0 and len(resized_imgs) > 1:
                for i in range(1, len(resized_imgs)):
                    resized_imgs[i] = self._blend_images(resized_imgs[i-1], resized_imgs[i], blend)

            total_height = sum(img.height for img in resized_imgs) + gap * (len(resized_imgs) - 1)

            long_image = Image.new('RGB', (self.target_width, total_height), background_color)

            y_offset = 0
            for img in resized_imgs:
                long_image.paste(img, (0, y_offset))
                y_offset += img.height + gap

            Path(output_path).parent.mkdir(parents=True, exist_ok=True)
            # quality matters for JPEG output
            long_image.save(output_path, quality=95)

            return {
                "success": True,
                "saved_path": output_path,
                "width": self.target_width,
                "height": total_height,
                "image_count": len(resized_imgs)
            }

        except Exception as e:
            return {"success": False, "error": str(e)}

        finally:
            # Release image handles whether or not the merge succeeded
            for img in imgs:
                img.close()
            for img in resized_imgs:
                img.close()

    def merge_from_pattern(
        self,
        pattern: str,
        output_path: str,
        sort_by: str = "name",
        gap: int = 0,
        background_color: str = "white",
        blend: int = 0
    ) -> Dict[str, Any]:
        """
        Match images with a glob pattern and merge them.

        Args:
            pattern: Glob pattern, e.g. "*.png" or "generated_*.png"
            output_path: Output file path
            sort_by: Sort order - "name" (file name) / "time" (modification time) / "none" (no sorting)
            gap: Gap between images
            background_color: Background color
            blend: Height of the seam blend region

        Returns:
            Dict containing the merge result
        """
        image_paths = glob_module.glob(pattern)

        # Exclude the output file so that re-running a command such as
        # -p "*.png" -o out.png does not merge the previous result into the new one
        output_abs = os.path.abspath(output_path)
        image_paths = [p for p in image_paths if os.path.abspath(p) != output_abs]

        if not image_paths:
            return {"success": False, "error": f"No images match '{pattern}'"}

        if sort_by == "name":
            # Plain string sort: "10.png" comes before "2.png", so zero-pad numbered files
            image_paths.sort()
        elif sort_by == "time":
            image_paths.sort(key=os.path.getmtime)

        print(f"Found {len(image_paths)} images:")
        for i, p in enumerate(image_paths, 1):
            print(f"  {i}. {os.path.basename(p)}")

        return self.merge(image_paths, output_path, gap, background_color, blend)


def main():
    """Command-line entry point."""
    parser = argparse.ArgumentParser(
        description='Long-image merge tool - stitches multiple images vertically into a WeChat long image',
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  # Merge specific images
  python merge_long_image.py img1.png img2.png img3.png -o output.png

  # Match images with a wildcard pattern
  python merge_long_image.py -p "generated_*.png" -o long_image.png

  # Set the width and the gap
  python merge_long_image.py -p "*.png" -o out.png -w 750 -g 20

  # Sort by modification time
  python merge_long_image.py -p "*.png" -o out.png --sort time

  # Enable seam blending (40 px recommended)
  python merge_long_image.py img1.png img2.png -o out.png --blend 40
"""
    )

    parser.add_argument('images', nargs='*', help='Image paths to merge')
    parser.add_argument('-p', '--pattern', help='Glob pattern for matching images, e.g. "*.png"')
    parser.add_argument('-o', '--output', required=True, help='Output file path')
    parser.add_argument('-w', '--width', type=int, default=1080, help='Target width; default 1080')
    parser.add_argument('-g', '--gap', type=int, default=0, help='Gap between images in pixels; default 0')
    parser.add_argument('--sort', choices=['name', 'time', 'none'], default='name',
                        help='Sort order: name (file name) / time (modification time) / none; only used with -p')
    parser.add_argument('--bg', default='white', help='Background color; default white')
    parser.add_argument('--blend', type=int, default=0,
                        help='Seam blend height in pixels; 30-50 recommended, default 0 (no blending)')

    args = parser.parse_args()

    if not args.images and not args.pattern:
        parser.error("Provide a list of image paths or use -p to give a glob pattern")

    merger = LongImageMerger(target_width=args.width)

    if args.pattern:
        result = merger.merge_from_pattern(
            pattern=args.pattern,
            output_path=args.output,
            sort_by=args.sort,
            gap=args.gap,
            background_color=args.bg,
            blend=args.blend
        )
    else:
        result = merger.merge(
            image_paths=args.images,
            output_path=args.output,
            gap=args.gap,
            background_color=args.bg,
            blend=args.blend
        )

    if result["success"]:
        print("\nMerge succeeded!")
        print(f"Output file: {result['saved_path']}")
        print(f"Size: {result['width']} x {result['height']}")
        print(f"{result['image_count']} images in total")
    else:
        print(f"\nMerge failed: {result['error']}")


if __name__ == "__main__":
    main()
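The seam blend in `_blend_images` is a linear crossfade: row *i* of the strip takes weight `alpha[i]` from `np.linspace(1, 0, blend_height)` for the upper image and `1 - alpha[i]` for the lower one. The pure-Python sketch below reproduces that arithmetic on grayscale rows instead of RGB arrays, so it runs without numpy or Pillow; `linspace` and `crossfade_rows` are hypothetical helper names, and the clip-then-`int()` step mirrors what `np.clip(...).astype(np.uint8)` does. Note that, as written, the script pastes the full upper image and then the blended lower image, so the upper image's last `blend_height` rows appear twice (once solid, once fading out); overlapping the two images by `blend_height` would be a different design.

```python
from typing import List


def linspace(start: float, stop: float, num: int) -> List[float]:
    """Pure-Python counterpart of numpy.linspace(start, stop, num), endpoint included."""
    if num == 1:
        return [float(start)]
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]


def crossfade_rows(top_rows: List[List[int]], bottom_rows: List[List[int]]) -> List[List[int]]:
    """Blend two equally sized strips: first row is all top, last row is all bottom."""
    alphas = linspace(1.0, 0.0, len(top_rows))
    out = []
    for a, t_row, b_row in zip(alphas, top_rows, bottom_rows):
        # Same formula as _blend_images: top * alpha + bottom * (1 - alpha),
        # clipped to 0..255 and truncated like astype(np.uint8)
        out.append([int(max(0.0, min(255.0, t * a + b * (1.0 - a))))
                    for t, b in zip(t_row, b_row)])
    return out


top = [[200, 200]] * 5     # last 5 rows of the upper image (grayscale)
bottom = [[0, 0]] * 5      # first 5 rows of the lower image
for row in crossfade_rows(top, bottom):
    print(row)
```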
140
.opencode/skills/image-service/scripts/research_image.py
Normal file
@@ -0,0 +1,140 @@
#!/usr/bin/env python3
"""
Infographic generator for research reports.
Uses preset hand-drawn visualization templates so that a series of figures shares one style.

Author: 翟星人
"""

import argparse
import subprocess
import sys
import os

# Preset style templates - hand-drawn visualization style.
# "prefix"/"suffix" are sent to the image model verbatim and are kept in Chinese
# (they ask for hand-drawn lines, a light gray-white background and handwritten Chinese labels).
STYLE_TEMPLATES = {
    "arch": {
        "name": "Architecture diagram",
        # Prompt: hand-drawn tech-architecture infographic, flat design, soft tech blue (#4A90D9),
        # layered modular layout in rounded rectangles, minimal icons
        "prefix": "手绘风格技术架构信息图,简洁扁平设计,",
        "suffix": "手绘线条感,柔和的科技蓝配色(#4A90D9),浅灰白色背景,模块化分层布局,圆角矩形框,手写体中文标签,简约图标,整体清新专业。",
        "trigger": "core architecture, system structure, tech stack, module composition"
    },
    "flow": {
        "name": "Flowchart",
        # Prompt: hand-drawn flowchart with arrows, tech blue main color, light green (#81C784)
        # success nodes, light orange (#FFB74D) decision nodes, top-down or left-to-right layout
        "prefix": "手绘风格流程信息图,简洁扁平设计,",
        "suffix": "手绘线条和箭头,科技蓝(#4A90D9)主色调,浅绿色(#81C784)表示成功节点,浅橙色(#FFB74D)表示判断节点,浅灰白色背景,从上到下或从左到右布局,手写体中文标签,步骤清晰。",
        "trigger": "process, steps, workflow, execution order"
    },
    "compare": {
        "name": "Comparison chart",
        # Prompt: hand-drawn two-column comparison, soft blue (#4A90D9) on the left,
        # soft orange (#FF8A65) on the right, "VS" divider in the middle
        "prefix": "手绘风格对比信息图,左右分栏设计,",
        "suffix": "手绘线条感,左侧用柔和蓝色(#4A90D9),右侧用柔和橙色(#FF8A65),中间VS分隔,浅灰白色背景,手写体中文标签,对比项目清晰列出,简约图标点缀。",
        "trigger": "comparison, vs, distinctions, differences"
    },
    "concept": {
        "name": "Concept map",
        # Prompt: hand-drawn radial concept map, tech-blue central topic,
        # surrounding elements in soft blue-purple gradients
        "prefix": "手绘风格概念信息图,中心发散设计,",
        "suffix": "手绘线条感,中心主题用科技蓝(#4A90D9),周围要素用柔和的蓝紫渐变色系,浅灰白色背景,连接线条有手绘感,手写体中文标签,布局均衡美观。",
        "trigger": "core concepts, constituent elements, multiple aspects"
    }
}

# Paths: the skill root directory and the text-to-image script this tool delegates to
BASE_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
TEXT_TO_IMAGE_SCRIPT = os.path.join(BASE_DIR, "scripts", "text_to_image.py")


def generate_image(style: str, title: str, content: str, output: str):
    """
    Generate an infographic using a preset style.

    Args:
        style: Style type (arch/flow/compare/concept)
        title: Diagram title
        content: Description of the diagram content
        output: Output path
    """
    if style not in STYLE_TEMPLATES:
        print(f"Error: unknown style '{style}'")
        print(f"Available styles: {', '.join(STYLE_TEMPLATES.keys())}")
        sys.exit(1)

    template = STYLE_TEMPLATES[style]

    # Assemble the full prompt: prefix + "标题" (title) + content + suffix
    prompt = f"{template['prefix']}标题:{title},{content},{template['suffix']}"

    print(f"Generating {template['name']}: {title}")
    print("Style: hand-drawn visualization")
    print(f"Output: {output}")

    # Delegate to text_to_image.py; no size or ratio is passed, so its default size applies
    cmd = [
        sys.executable,
        TEXT_TO_IMAGE_SCRIPT,
        prompt,
        "--output", output
    ]

    result = subprocess.run(cmd, capture_output=False)

    # This check only catches failures if text_to_image.py exits with a non-zero status
    if result.returncode != 0:
        print("Generation failed")
        sys.exit(1)


def list_styles():
    """List all available styles."""
    print("Available style templates (hand-drawn visualization):\n")
    for key, template in STYLE_TEMPLATES.items():
        print(f"  {key:10} - {template['name']}")
        print(f"             Use for: {template['trigger']}")
        print()


def main():
    parser = argparse.ArgumentParser(
        description="Infographic generator for research reports (hand-drawn style)",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  # Architecture diagram
  python research_image.py -t arch -n "Ralph Loop 核心架构" -c "展示 Prompt、Agent、Stop Hook、Files 四个模块的循环关系" -o images/arch.png

  # Flowchart
  python research_image.py -t flow -n "Stop Hook 工作流程" -c "Agent尝试退出、Hook触发、检查条件、允许或阻止退出" -o images/flow.png

  # Comparison chart
  python research_image.py -t compare -n "ReAct vs Ralph Loop" -c "左侧ReAct自我评估停止,右侧Ralph外部Hook控制" -o images/compare.png

  # Concept map
  python research_image.py -t concept -n "状态持久化" -c "中心是Agent,周围是progress.txt、prd.json、Git历史、代码文件四个要素" -o images/concept.png

  # List all styles
  python research_image.py --list
"""
    )

    parser.add_argument("-t", "--type", choices=list(STYLE_TEMPLATES.keys()),
                        help="Diagram type: arch (architecture), flow (flowchart), compare (comparison), concept (concept map)")
    parser.add_argument("-n", "--name", help="Diagram title")
    parser.add_argument("-c", "--content", help="Description of the diagram content")
    parser.add_argument("-o", "--output", help="Output file path")
    parser.add_argument("--list", action="store_true", help="List all available styles")

    args = parser.parse_args()

    if args.list:
        list_styles()
        return

    if not all([args.type, args.name, args.content, args.output]):
        parser.print_help()
        print("\nError: -t, -n, -c and -o are all required")
        sys.exit(1)

    generate_image(args.type, args.name, args.content, args.output)


if __name__ == "__main__":
    main()
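`research_image.py` does two things: it assembles a prompt from a template and then runs `text_to_image.py` as a child process. The sketch below isolates both steps so they can be checked without calling the API. `TEMPLATES`, `build_prompt` and `build_command` are hypothetical names, and the single template is an abbreviated copy of the `flow` entry. Passing the command to `subprocess.run` as a list (no shell) means the Chinese prompt, commas and quotes need no escaping.

```python
import sys
from typing import Dict, List

# Abbreviated, hypothetical copy of one STYLE_TEMPLATES entry (suffix shortened)
TEMPLATES: Dict[str, Dict[str, str]] = {
    "flow": {
        "name": "Flowchart",
        "prefix": "手绘风格流程信息图,简洁扁平设计,",
        "suffix": "手绘线条和箭头,浅灰白色背景。",
    },
}


def build_prompt(style: str, title: str, content: str) -> str:
    """prefix + title + content + suffix, in the order generate_image() uses."""
    t = TEMPLATES[style]
    return f"{t['prefix']}标题:{title},{content},{t['suffix']}"


def build_command(script: str, prompt: str, output: str) -> List[str]:
    """Argument list for subprocess.run; each element reaches the child unchanged."""
    return [sys.executable, script, prompt, "--output", output]


prompt = build_prompt("flow", "Stop Hook 工作流程", "Agent尝试退出、Hook触发")
print(prompt)
print(build_command("text_to_image.py", prompt, "images/flow.png"))
```

Because `generate_image()` only looks at the child's return code, this design detects API failures only when the child script exits with a non-zero status.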
350
.opencode/skills/image-service/scripts/text_to_image.py
Normal file
@@ -0,0 +1,350 @@
#!/usr/bin/env python3
"""
Text-to-image script (Text-to-Image).
Generates images from Chinese text descriptions via the Lyra Flash API.
Supports generating new images in the style of a reference image.

Author: 翟星人
"""

import httpx
import base64
import json
import os
import sys
from typing import Dict, Any, Optional, Union
from pathlib import Path

VALID_ASPECT_RATIOS = [
    "1:1", "2:3", "3:2", "3:4", "4:3", "4:5", "5:4", "9:16", "16:9", "21:9"
]

VALID_SIZES = [
    "1024x1024",
    "1536x1024", "1792x1024", "1344x768", "1248x832", "1184x864", "1152x896", "1536x672",
    "1024x1536", "1024x1792", "768x1344", "832x1248", "864x1184", "896x1152"
]

# Aspect ratio -> pixel size sent to the API. Some entries are approximate:
# "3:4"/"4:3" map to 1024x1536/1536x1024 (exactly 2:3 and 3:2), and
# "4:5"/"5:4" map to 864x1184/1184x864 (closer to 3:4 and 4:3).
RATIO_TO_SIZE = {
    "1:1": "1024x1024",
    "2:3": "832x1248",
    "3:2": "1248x832",
    "3:4": "1024x1536",
    "4:3": "1536x1024",
    "4:5": "864x1184",
    "5:4": "1184x864",
    "9:16": "1024x1792",
    "16:9": "1792x1024",
    "21:9": "1536x672"
}


class TextToImageGenerator:
    """Text-to-image generator."""

    def __init__(self, config: Optional[Dict[str, str]] = None):
        """
        Initialize the generator.

        Args:
            config: Config dict containing api_key, base_url and model.
                    If omitted, it is read from environment variables or the config file.
        """
        if config is None:
            config = self._load_config()

        self.api_key = config.get('api_key') or config.get('IMAGE_API_KEY')
        self.base_url = config.get('base_url') or config.get('IMAGE_API_BASE_URL')
        self.model = config.get('model') or config.get('IMAGE_MODEL') or 'lyra-flash-9'

        if not self.api_key or not self.base_url:
            raise ValueError("Missing required API configuration: api_key and base_url")

    def _load_config(self) -> Dict[str, str]:
        """Load configuration from the config file and environment variables."""
        config = {}

        config_path = Path(__file__).parent.parent / 'config' / 'settings.json'
        if config_path.exists():
            with open(config_path, 'r', encoding='utf-8') as f:
                settings = json.load(f)
            api_config = settings.get('image_api', {})
            config['api_key'] = api_config.get('key')
            config['base_url'] = api_config.get('base_url')
            config['model'] = api_config.get('model')

        # Environment variables take precedence over the config file
        config['api_key'] = os.getenv('IMAGE_API_KEY', config.get('api_key'))
        config['base_url'] = os.getenv('IMAGE_API_BASE_URL', config.get('base_url'))
        config['model'] = os.getenv('IMAGE_MODEL', config.get('model'))

        return config

    @staticmethod
    def image_to_base64(image_path: str, with_prefix: bool = True) -> str:
        """Encode an image file as base64, optionally with a data URL prefix."""
        path = Path(image_path)
        if not path.exists():
            raise FileNotFoundError(f"Image file not found: {image_path}")

        suffix = path.suffix.lower()
        mime_types = {
            '.jpg': 'image/jpeg',
            '.jpeg': 'image/jpeg',
            '.png': 'image/png',
            '.gif': 'image/gif',
            '.webp': 'image/webp'
        }
        mime_type = mime_types.get(suffix, 'image/png')

        with open(image_path, 'rb') as f:
            b64_str = base64.b64encode(f.read()).decode('utf-8')

        if with_prefix:
            return f"data:{mime_type};base64,{b64_str}"
        return b64_str

    def generate(
        self,
        prompt: str,
        size: Optional[str] = None,
        aspect_ratio: Optional[str] = None,
        image_size: Optional[str] = None,
        output_path: Optional[str] = None,
        response_format: str = "b64_json",
        ref_image: Optional[str] = None
    ) -> Dict[str, Any]:
        """
        Generate an image.

        Args:
            prompt: Chinese image description prompt
            size: Image size (e.g. 1792x1024); use either this or aspect_ratio
            aspect_ratio: Aspect ratio (e.g. 16:9, 3:4); recommended
            image_size: Resolution (1K/2K/4K), only supported by gemini-3.0-pro-image-preview.
                        Note: this value is currently not included in the request payload.
            output_path: Output file path; the image is saved here if provided
            response_format: Response format; default b64_json
            ref_image: Reference image path, used as a style reference

        Returns:
            Dict containing the generation result
        """
        if ref_image:
            return self._generate_with_reference(
                prompt=prompt,
                ref_image=ref_image,
                aspect_ratio=aspect_ratio,
                size=size,
                output_path=output_path,
                response_format=response_format
            )

        payload: Dict[str, Any] = {
            "model": self.model,
            "prompt": prompt,
            "response_format": response_format
        }

        # Choose the size: aspect_ratio mapping first, then size
        if aspect_ratio:
            payload["size"] = RATIO_TO_SIZE.get(aspect_ratio, "1024x1024")
        elif size:
            payload["size"] = size
        else:
            payload["size"] = "1792x1024"  # default 16:9

        headers = {
            "Content-Type": "application/json",
            "Authorization": f"Bearer {self.api_key}"
        }

        try:
            with httpx.Client(timeout=180.0) as client:
                response = client.post(
                    f"{self.base_url}/images/generations",
                    headers=headers,
                    json=payload
                )
                response.raise_for_status()
                result = response.json()

            saved_path = None
            if output_path and result.get("data"):
                b64_data = result["data"][0].get("b64_json")
                if b64_data:
                    self._save_image(b64_data, output_path)
                    saved_path = output_path
                    result["saved_path"] = output_path

            return {
                "success": True,
                "data": result,
                # Only set when the image was actually written to disk
                "saved_path": saved_path
            }

        except httpx.HTTPStatusError as e:
            return {
                "success": False,
                "error": f"HTTP error: {e.response.status_code}",
                "detail": str(e)
            }
        except Exception as e:
            return {
                "success": False,
                "error": "Generation failed",
                "detail": str(e)
            }

    def _generate_with_reference(
        self,
        prompt: str,
        ref_image: str,
        aspect_ratio: Optional[str] = None,
        size: Optional[str] = None,
        output_path: Optional[str] = None,
        response_format: str = "b64_json"
    ) -> Dict[str, Any]:
        """
        Generate a new image in the style of a reference image.

        Args:
            prompt: Description of the new image content
            ref_image: Reference image path
            aspect_ratio: Aspect ratio
            size: Size
            output_path: Output path
            response_format: Response format
        """
        image_b64 = self.image_to_base64(ref_image)

        # Instruction prepended to the prompt (kept in Chinese): match the reference image's
        # background style, color scheme and visual design exactly, then draw the new content
        enhanced_prompt = f"参考这张图片的背景风格、配色方案和视觉设计,保持完全一致的风格,生成新内容:{prompt}"

        # Choose the size: aspect_ratio mapping first, then size; portrait 1024x1792 by default
        if aspect_ratio:
            size = RATIO_TO_SIZE.get(aspect_ratio, "1024x1792")
        elif size is None:
            size = "1024x1792"

        payload = {
            "model": self.model,
            "prompt": enhanced_prompt,
            "image": image_b64,
            "size": size,
            "response_format": response_format
        }

        headers = {
            "Content-Type": "application/json",
            "Authorization": f"Bearer {self.api_key}"
        }

        try:
            with httpx.Client(timeout=180.0) as client:
                response = client.post(
                    f"{self.base_url}/images/edits",
                    headers=headers,
                    json=payload
                )
                response.raise_for_status()
                result = response.json()

            saved_path = None
            if output_path and result.get("data"):
                b64_data = result["data"][0].get("b64_json")
                if b64_data:
                    self._save_image(b64_data, output_path)
                    saved_path = output_path
                    result["saved_path"] = output_path

            return {
                "success": True,
                "data": result,
                # Only set when the image was actually written to disk
                "saved_path": saved_path
            }

        except httpx.HTTPStatusError as e:
            return {
                "success": False,
                "error": f"HTTP error: {e.response.status_code}",
                "detail": str(e)
            }
        except Exception as e:
            return {
                "success": False,
                "error": "Generation failed",
                "detail": str(e)
            }

    def _save_image(self, b64_data: str, output_path: str) -> None:
        """Decode base64 image data and write it to a file."""
        image_data = base64.b64decode(b64_data)
        Path(output_path).parent.mkdir(parents=True, exist_ok=True)
        with open(output_path, 'wb') as f:
            f.write(image_data)


def main():
    """Command-line entry point."""
    import argparse
    import time

    parser = argparse.ArgumentParser(
        description='Text-to-image tool',
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog=f'''
Size options:
  -r/--ratio     Recommended. Supported: {", ".join(VALID_ASPECT_RATIOS)}
  -s/--size      Explicit pixel size. Supported: {", ".join(VALID_SIZES[:4])}...
  --resolution   Resolution (1K/2K/4K), only supported by gemini-3.0-pro-image-preview
                 (currently not forwarded to the API by this script)
  --ref          Reference image path; the new image follows that image's style

Examples:
  python text_to_image.py "description" -r 3:4               # portrait 3:4
  python text_to_image.py "description" -r 9:16 -o out.png   # vertical 9:16
  python text_to_image.py "description" -s 1024x1792         # explicit size

  # Long-image series: the first image sets the style, later ones reference it
  python text_to_image.py "first screen" -r 3:4 -o 01.png
  python text_to_image.py "second screen" -r 3:4 --ref 01.png -o 02.png
'''
    )
    parser.add_argument('prompt', help='Image description prompt (Chinese)')
    parser.add_argument('-o', '--output', help='Output file path (default: generated_<timestamp>.png in the current directory)')
    parser.add_argument('-r', '--ratio', help=f'Aspect ratio, recommended. Options: {", ".join(VALID_ASPECT_RATIOS)}')
    parser.add_argument('-s', '--size', help='Image size (e.g. 1792x1024)')
    parser.add_argument('--resolution', help='Resolution (1K/2K/4K), only some models support it')
    parser.add_argument('--ref', help='Reference image path for style matching (long-image series)')

    args = parser.parse_args()

    if args.ratio and args.ratio not in VALID_ASPECT_RATIOS:
        print(f"Error: unsupported aspect ratio '{args.ratio}'")
        print(f"Supported aspect ratios: {', '.join(VALID_ASPECT_RATIOS)}")
        sys.exit(1)

    if args.size and args.size not in VALID_SIZES:
        print(f"Warning: size '{args.size}' may not be supported")
        print("Prefer -r/--ratio to specify an aspect ratio")

    if args.resolution:
        print("Warning: --resolution is not sent to the API by this script and has no effect")

    if args.ref and not os.path.exists(args.ref):
        print(f"Error: reference image not found: {args.ref}")
        sys.exit(1)

    # Default output: a timestamped file in the current working directory
    output_path = args.output
    if not output_path:
        timestamp = time.strftime("%Y%m%d_%H%M%S")
        output_path = f"generated_{timestamp}.png"

    generator = TextToImageGenerator()
    result = generator.generate(
        prompt=args.prompt,
        size=args.size,
        aspect_ratio=args.ratio,
        image_size=args.resolution,
        output_path=output_path,
        ref_image=args.ref
    )

    if result["success"]:
        print("Generated successfully!")
        if result.get("saved_path"):
            print(f"Image saved to: {result['saved_path']}")
        else:
            print("Note: the response contained no b64_json data, so no file was written")
    else:
        print(f"Generation failed: {result['error']}")
        print(f"Detail: {result.get('detail', 'N/A')}")
        # Non-zero exit so callers such as research_image.py can detect the failure
        sys.exit(1)


if __name__ == "__main__":
    main()
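`RATIO_TO_SIZE` labels some sizes with a ratio they only approximate; for example, `"3:4"` maps to 1024x1536, which is exactly 2:3. The sketch below, using hypothetical helper names, picks for each nominal ratio the entry of `VALID_SIZES` whose width/height is closest, which is one way to audit the mapping. Whether the backend behind `lyra-flash-9` accepts every size listed in `VALID_SIZES` is an assumption that should be verified before changing the mapping.

```python
from typing import Iterable

# Copied from text_to_image.py
VALID_SIZES = [
    "1024x1024",
    "1536x1024", "1792x1024", "1344x768", "1248x832", "1184x864", "1152x896", "1536x672",
    "1024x1536", "1024x1792", "768x1344", "832x1248", "864x1184", "896x1152",
]


def size_ratio(size: str) -> float:
    """'1024x1536' -> width / height."""
    w, h = (int(v) for v in size.split("x"))
    return w / h


def nominal_ratio(ratio: str) -> float:
    """'3:4' -> 0.75."""
    w, h = (int(v) for v in ratio.split(":"))
    return w / h


def nearest_size(ratio: str, sizes: Iterable[str] = VALID_SIZES) -> str:
    """Size whose width/height is closest to the nominal ratio (first entry wins ties)."""
    target = nominal_ratio(ratio)
    return min(sizes, key=lambda s: abs(size_ratio(s) - target))


for r in ["3:4", "4:3", "4:5", "5:4", "9:16", "16:9"]:
    print(r, nearest_size(r), round(size_ratio(nearest_size(r)), 3))
```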