Initial commit: skills library

- 70 skills with code and documentation
- Add .gitignore (ignore __pycache__, output/, temp/, venv/)
- Clean up test intermediates and caches
2026-04-26 19:27:40 +08:00
commit 04db423416
861 changed files with 210414 additions and 0 deletions
@@ -0,0 +1,152 @@
+# DOC to Tables Skill
+
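+## Introduction
+An end-to-end workflow skill that converts Word documents (.docx) into structured Markdown tables and then generates polished HTML table files. It is designed for tidying unstructured documents such as competition award lists, graded-exam results, and honor certificates.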
+
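+## Installing dependencies
+
+### Required dependencies
+```bash
+# Install pandoc (used for DOCX-to-Markdown conversion)
+# Windows: download the installer from https://pandoc.org/installing.html
+# macOS: brew install pandoc
+# Debian/Ubuntu: sudo apt-get install pandoc
+```
+
+### Optional dependencies
+```bash
+# Only needed for PDF generation
+pip install fpdf2
+```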
+
+## Usage
+
+### Basic usage
+```bash
+cd .opencode/skills/doc-to-tables/scripts
+python doc_to_tables.py "input.docx" "output"
+```
+
+This generates:
+- `output_md.md` - structured Markdown file
+- `output_html.html` - styled HTML table file
+
+### Advanced options
+```bash
+# Custom column width percentages
+python doc_to_tables.py "input.docx" "output" --three-col-widths "25,50,25" --two-col-widths "60,40"
+
+# Process and merge teacher awards (off unless this flag is given)
+python doc_to_tables.py "input.docx" "output" --process-teacher-awards
+
+# Generate HTML without table headers
+python doc_to_tables.py "input.docx" "output" --no-headers
+```
+
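+## Typical use cases
+
+### 1. Organizing music competition awards
+**Input**: a Word document containing award lists from several piano competitions
+**Output**:
+- Markdown tables grouped by competition, listing student, award, and instructor
+- HTML tables that can be used directly to produce an annual achievement poster
+
+### 2. Summarizing graded-exam results
+**Input**: Word documents of graded-exam results from ABRSM, the Chinese Musicians' Association, and similar boards
+**Output**:
+- Structured tables with student name, gender, grade level, score, rating, and instructor
+- Polished HTML suited to printing and display
+
+### 3. Annual summary reports
+**Input**: a Word document covering the year's awards across activities
+**Output**:
+- Standardized tables grouped by activity type
+- Output that can be imported into Photoshop for poster design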
+
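+## Technical features
+
+### ✅ Intelligent data processing
+- **Cross-competition matching**: automatically links a student to the same instructor across different competitions
+- **Deduplication**: teachers with the same award are merged into one entry to avoid repetition
+- **Missing values**: missing instructor information is left blank so it can be filled in later
+
+### ✅ Polished output formats
+- **Markdown compatible**: renders correctly in editors such as Obsidian
+- **Responsive HTML**: tables are 100% wide and adapt to the screen size
+- **Precise column widths**: three-column tables (20%/60%/20%), two-column tables (70%/30%)
+- **Print friendly**: works with the browser's "Print to PDF"
+
+### ✅ Error handling
+- Automatically detects and repairs common formatting problems
+- Preserves the integrity of the original data
+- Detailed error messages and logging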
+
+## Examples
+
+### Input Word document content:
+```
+###### 英国(牛津)2025国际钢琴公开赛-深圳赛区获奖名单
+许和欣 一等奖
+蔡达然 一等奖
+张靖彤 一等奖
+李芊妤 二等奖
+朱梓安 二等奖
+```
+
+### Output Markdown:
+```markdown
+#### **英国(牛津)2025国际钢琴公开赛-深圳赛区获奖名单**
+
+|获奖学生|奖项|指导老师|
+|---|---|---|
+|许和欣|一等奖||
+|蔡达然|一等奖||
+|张靖彤|一等奖||
+|李芊妤|二等奖||
+|朱梓安|二等奖||
+```
+
+### Output HTML:
+A styled HTML table that can be used directly for poster production.
+
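+## Configuration options
+
+The configuration file is located at `config/settings.json`.
+
+Customizable settings:
+- Default column width percentages
+- Whether to process teacher awards
+- Output format preference
+- Dependency settings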
+
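+## Extensibility
+
+This skill can be extended with:
+- **More input formats**: Excel, PDF, PPT, and others
+- **Custom styles**: alternative CSS themes
+- **Batch processing**: automated handling of multiple files
+- **Multilingual support**: Chinese, English, Japanese, and others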
+
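+## Troubleshooting
+
+### Common problems
+1. **"pandoc not found"**: make sure pandoc is installed and on your PATH
+2. **Garbled Chinese text**: make sure the system supports Chinese text encoding, and use UTF-8
+3. **Malformed tables**: check that the heading levels in the source document are correct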
+
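+### Debug mode
+```bash
+# Enable verbose logging
+# NOTE: doc_to_tables.py as committed here does not define --debug, so argparse rejects this flag
+python doc_to_tables.py --debug "input.docx" "output"
+```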
+
+## Version history
+
+- **v1.0.0**: initial release with basic DOCX-to-table conversion
+- **Planned v1.1.0**: Excel input and batch processing
+
+## License
+MIT License
+
+## Author
+小小莫 - OhMyOpenCode AI Manager
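The README example above turns plain `name award` lines under a heading into a three-column table with an empty instructor cell. The parser in `doc_to_tables.py` as committed only reads pipe-delimited rows, so the following is a minimal sketch of that step rather than the skill's actual code. It assumes one student per line, with whitespace between name and award; `rows_from_plain_lines` and `to_markdown` are hypothetical helpers.

```python
import re
from typing import Dict, List


def rows_from_plain_lines(text: str) -> List[Dict]:
    """Group 'name award' lines under the preceding '#' heading (hypothetical helper)."""
    sections: List[Dict] = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Any heading level starts a new section
            current = {"title": line.lstrip("#").strip(), "entries": []}
            sections.append(current)
            continue
        parts = re.split(r"\s+", line, maxsplit=1)
        if current is not None and len(parts) == 2:
            # Assumption: the first token is the student, the rest is the award
            current["entries"].append(
                {"student": parts[0], "award": parts[1], "teacher": ""}
            )
    return sections


def to_markdown(sections: List[Dict]) -> str:
    """Render sections in the layout shown in the README example output."""
    out: List[str] = []
    for section in sections:
        out.append(f"#### **{section['title']}**")
        out.append("")
        out.append("|获奖学生|奖项|指导老师|")
        out.append("|---|---|---|")
        for e in section["entries"]:
            out.append(f"|{e['student']}|{e['award']}|{e['teacher']}|")
        out.append("")
    return "\n".join(out)


sample = "###### 英国(牛津)2025国际钢琴公开赛-深圳赛区获奖名单\n许和欣 一等奖\n李芊妤 二等奖\n"
print(to_markdown(rows_from_plain_lines(sample)))
```

Lines that do not split into two parts are ignored, so stray paragraphs never become table rows.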
@@ -0,0 +1,12 @@
+"""
+DOC to Tables Skill - Main entry point
+"""
+
+__version__ = "1.0.0"
+__author__ = "小小莫"
+__description__ = "Convert Word documents to structured Markdown and HTML tables"
+
+from .scripts.doc_to_tables import main
+
+# Export main function for easy import
+__all__ = ["main"]
@@ -0,0 +1,26 @@
+{
+  "skill_name": "doc-to-tables",
+  "version": "1.0.0",
+  "description": "Convert Word documents to structured Markdown and HTML tables",
+  "author": "小小莫",
+  "category": "document-processing",
+  "tags": ["word", "markdown", "html", "tables", "data-extraction"],
+  
+  "default_settings": {
+    "three_column_widths": "20,60,20",
+    "two_column_widths": "70,30",
+    "process_teacher_awards": true,
+    "generate_no_headers_html": true,
+    "output_format": "both"
+  },
+  
+  "dependencies": {
+    "required": ["pandoc"],
+    "optional": ["fpdf2"]
+  },
+  
+  "file_extensions": {
+    "input": [".docx"],
+    "output": [".md", ".html"]
+  }
+}
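The settings file above declares `default_settings`, while `doc_to_tables.py` in this commit hard-codes the same width defaults in its argument parser and never reads the file. One way the two could be connected is sketched below. It rests on assumptions: `load_defaults` and `build_parser` are hypothetical names, and the settings path is passed in by the caller.

```python
import argparse
import json
from pathlib import Path
from typing import Any, Dict, Optional

# Fallbacks mirror the width values in config/settings.json
FALLBACK: Dict[str, Any] = {
    "three_column_widths": "20,60,20",
    "two_column_widths": "70,30",
}


def load_defaults(path: Optional[Path]) -> Dict[str, Any]:
    """Read default_settings from a settings file; use fallbacks if it is missing."""
    defaults = dict(FALLBACK)
    if path is not None and path.is_file():
        data = json.loads(path.read_text(encoding="utf-8"))
        defaults.update(data.get("default_settings", {}))
    return defaults


def build_parser(defaults: Dict[str, Any]) -> argparse.ArgumentParser:
    """Hypothetical parser whose width defaults come from the settings file."""
    parser = argparse.ArgumentParser(
        description="Convert Word documents to structured tables"
    )
    parser.add_argument("input_docx")
    parser.add_argument("output_prefix")
    parser.add_argument("--three-col-widths", default=defaults["three_column_widths"])
    parser.add_argument("--two-col-widths", default=defaults["two_column_widths"])
    return parser


# No settings file given: the fallbacks are used
args = build_parser(load_defaults(None)).parse_args(["input.docx", "output"])
print(args.three_col_widths, args.two_col_widths)
```

Command-line flags still override the file, because argparse only applies a default when the flag is absent.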
@@ -0,0 +1,294 @@
+#!/usr/bin/env python3
+"""
+DOC to Tables - Convert Word documents to structured Markdown and HTML tables
+
+Usage:
+    python doc_to_tables.py <input.docx> <output_prefix> [options]
+
+Options:
+    --three-col-widths WIDTHS    Three-column table widths (default: "20,60,20")
+    --two-col-widths WIDTHS      Two-column table widths (default: "70,30")
+    --process-teacher-awards     Process and merge teacher awards
+    --no-headers                 Generate HTML without table headers
+    --help                       Show this help message
+"""
+
+import argparse
+import os
+import re
+import subprocess
+import sys
+from typing import Dict, List
+
+
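+def convert_docx_to_markdown(docx_path: str, md_path: str) -> None:
+    """Convert DOCX to Markdown using pandoc; exit with status 1 on failure."""
+    try:
+        subprocess.run(
+            ["pandoc", "--track-changes=all", docx_path, "-o", md_path], check=True
+        )
+        print(f"✓ Converted {docx_path} to {md_path}")
+    except FileNotFoundError:
+        # Raised when the pandoc executable is not on PATH
+        print("✗ pandoc not found: install pandoc and add it to PATH")
+        sys.exit(1)
+    except subprocess.CalledProcessError as e:
+        print(f"✗ Error converting DOCX to Markdown: {e}")
+        sys.exit(1)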
+
+
+def parse_markdown_content(md_content: str) -> List[Dict]:
+    """Parse markdown content and extract structured data"""
+    lines = md_content.split("\n")
+    sections = []
+    current_section = None
+
+    for line in lines:
+        # Check for section headers (#### or ######)
+        if line.strip().startswith("####"):
+            if current_section:
+                sections.append(current_section)
+            # Extract section title
+            title_match = re.search(r"\*\*(.*?)\*\*", line)
+            title = title_match.group(1) if title_match else line.strip("# ").strip()
+            current_section = {"title": title, "entries": [], "teacher_awards": []}
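+        elif line.strip().startswith("|"):
+            # Parse table row: drop only the outer pipes so empty cells survive
+            row = line.strip()
+            row = row[1:-1] if row.endswith("|") and len(row) > 1 else row[1:]
+            cells = [cell.strip() for cell in row.split("|")]
+            # Skip separator rows such as |---|---| or | :--- | ---: |
+            if all(re.fullmatch(r":?-+:?", cell) for cell in cells):
+                continue
+            if len(cells) >= 3:
+                entry = {
+                    "student": cells[0],
+                    "award": cells[1],
+                    "teacher": cells[2],
+                }
+                if current_section:
+                    current_section["entries"].append(entry)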
+        elif "**" in line and ":" in line and current_section:
+            # Parse teacher awards
+            current_section["teacher_awards"].append(line.strip())
+
+    if current_section:
+        sections.append(current_section)
+
+    return sections
+
+
+def generate_markdown_tables(sections: List[Dict]) -> str:
+    """Generate structured markdown with proper table formatting"""
+    md_output = []
+
+    for section in sections:
+        # Add section header
+        md_output.append(f"#### **{section['title']}**")
+        md_output.append("")
+
+        # Add table if there are entries
+        if section["entries"]:
+            md_output.append("|获奖学生|组别和奖项|指导老师|")
+            md_output.append("|---|---|---|")
+            for entry in section["entries"]:
+                md_output.append(
+                    f"|{entry['student']}|{entry['award']}|{entry['teacher']}|"
+                )
+            md_output.append("")
+
+        # Add teacher awards
+        for award in section["teacher_awards"]:
+            md_output.append(award)
+            md_output.append("")
+
+    return "\n".join(md_output)
+
+
+def generate_html_tables(
+    sections: List[Dict],
+    three_col_widths: str = "20,60,20",
+    two_col_widths: str = "70,30",
+    no_headers: bool = False,
+) -> str:
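+    """Generate professional HTML with responsive tables.
+
+    Widths are comma-separated integer percentages. ``no_headers`` is accepted
+    but not read below: tables are always emitted without a header row.
+    """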
+
+    # Parse width percentages
+    three_widths = [int(w) for w in three_col_widths.split(",")]
+    two_widths = [int(w) for w in two_col_widths.split(",")]
+
+    html_parts = [
+        "<!DOCTYPE html>",
+        '<html lang="zh-CN">',
+        "<head>",
+        '    <meta charset="UTF-8">',
+        '    <meta name="viewport" content="width=device-width, initial-scale=1.0">',
+        "    <title>Structured Tables</title>",
+        "    <style>",
+        "        body {",
+        '            font-family: "Microsoft YaHei", "SimHei", "Arial", sans-serif;',
+        "            margin: 20px;",
+        "            line-height: 1.6;",
+        "        }",
+        "        h4 {",
+        "            text-align: center;",
+        "            color: #333;",
+        "            margin: 30px 0 15px 0;",
+        "            font-size: 18px;",
+        "            border-bottom: 2px solid #333;",
+        "            padding-bottom: 8px;",
+        "        }",
+        "        .table-3col {",
+        "            width: 100%;",
+        "            border-collapse: collapse;",
+        "            margin: 10px 0 5px 0;",
+        "            table-layout: fixed;",
+        "        }",
+        "        .table-3col td {",
+        "            border: 1px solid #333;",
+        "            padding: 8px;",
+        "            text-align: center;",
+        "            vertical-align: middle;",
+        "            font-size: 14px;",
+        "        }",
+        "        .table-3col tr:nth-child(even) {",
+        "            background-color: #f9f9f9;",
+        "        }",
+        f"        .table-3col col:nth-child(1) {{ width: {three_widths[0]}%; }}",
+        f"        .table-3col col:nth-child(2) {{ width: {three_widths[1]}%; }}",
+        f"        .table-3col col:nth-child(3) {{ width: {three_widths[2]}%; }}",
+        "        .table-2col {",
+        "            width: 100%;",
+        "            border-collapse: collapse;",
+        "            margin: 10px 0 5px 0;",
+        "            table-layout: fixed;",
+        "        }",
+        "        .table-2col td {",
+        "            border: 1px solid #333;",
+        "            padding: 8px;",
+        "            text-align: center;",
+        "            vertical-align: middle;",
+        "            font-size: 14px;",
+        "        }",
+        "        .table-2col tr:nth-child(even) {",
+        "            background-color: #f9f9f9;",
+        "        }",
+        f"        .table-2col col:nth-child(1) {{ width: {two_widths[0]}%; }}",
+        f"        .table-2col col:nth-child(2) {{ width: {two_widths[1]}%; }}",
+        "        .teacher-award {",
+        "            font-size: 12px;",
+        "            margin: 0 0 10px 0;",
+        "            text-align: center;",
+        "            color: #666;",
+        "        }",
+        "        .subtitle {",
+        "            text-align: center;",
+        "            margin: 10px 0;",
+        "            font-weight: bold;",
+        "            color: #333;",
+        "        }",
+        "    </style>",
+        "</head>",
+        "<body>",
+    ]
+
+    for section in sections:
+        html_parts.append(f"<h4>{section['title']}</h4>")
+
+        if section["entries"]:
+            # Determine if it's a 3-col or 2-col table based on data
+            is_three_col = any(entry.get("teacher") for entry in section["entries"])
+
+            if is_three_col:
+                html_parts.append('<table class="table-3col">')
+                html_parts.append("    <colgroup>")
+                html_parts.append("        <col><col><col>")
+                html_parts.append("    </colgroup>")
+                html_parts.append("    <tbody>")
+                for entry in section["entries"]:
+                    html_parts.append(
+                        f"        <tr><td>{entry['student']}</td><td>{entry['award']}</td><td>{entry['teacher']}</td></tr>"
+                    )
+                html_parts.append("    </tbody>")
+                html_parts.append("</table>")
+            else:
+                html_parts.append('<table class="table-2col">')
+                html_parts.append("    <colgroup>")
+                html_parts.append("        <col><col>")
+                html_parts.append("    </colgroup>")
+                html_parts.append("    <tbody>")
+                for entry in section["entries"]:
+                    html_parts.append(
+                        f"        <tr><td>{entry['award']}</td><td>{entry['student']}</td></tr>"
+                    )
+                html_parts.append("    </tbody>")
+                html_parts.append("</table>")
+
+        for award in section["teacher_awards"]:
+            html_parts.append(f'<p class="teacher-award">{award}</p>')
+
+    html_parts.extend(["</body>", "</html>"])
+    return "\n".join(html_parts)
+
+
+def main():
+    parser = argparse.ArgumentParser(
+        description="Convert Word documents to structured tables"
+    )
+    parser.add_argument("input_docx", help="Input DOCX file path")
+    parser.add_argument("output_prefix", help="Output file prefix")
+    parser.add_argument(
+        "--three-col-widths",
+        default="20,60,20",
+        help="Three-column table widths (default: 20,60,20)",
+    )
+    parser.add_argument(
+        "--two-col-widths",
+        default="70,30",
+        help="Two-column table widths (default: 70,30)",
+    )
+    parser.add_argument(
+        "--process-teacher-awards",
+        action="store_true",
+        help="Process and merge teacher awards",
+    )
+    parser.add_argument(
+        "--no-headers", action="store_true", help="Generate HTML without table headers"
+    )
+    # argparse registers -h/--help automatically; adding --help again raises
+    # "conflicting option string", so no explicit help argument is declared.
+
+    args = parser.parse_args()
+
+    # Validate input file
+    if not os.path.exists(args.input_docx):
+        print(f"✗ Input file not found: {args.input_docx}")
+        sys.exit(1)
+
+    # Create output directory if needed
+    output_dir = os.path.dirname(args.output_prefix) or "."
+    os.makedirs(output_dir, exist_ok=True)
+
+    # Step 1: Convert DOCX to Markdown
+    temp_md = args.output_prefix + "_temp.md"
+    convert_docx_to_markdown(args.input_docx, temp_md)
+
+    # Step 2: Read and parse Markdown
+    with open(temp_md, "r", encoding="utf-8") as f:
+        md_content = f.read()
+
+    sections = parse_markdown_content(md_content)
+
+    # Step 3: Generate structured Markdown
+    structured_md = generate_markdown_tables(sections)
+    md_output = args.output_prefix + "_md.md"
+    with open(md_output, "w", encoding="utf-8") as f:
+        f.write(structured_md)
+    print(f"✓ Generated structured Markdown: {md_output}")
+
+    # Step 4: Generate HTML
+    html_content = generate_html_tables(
+        sections, args.three_col_widths, args.two_col_widths, args.no_headers
+    )
+    html_output = args.output_prefix + "_html.html"
+    with open(html_output, "w", encoding="utf-8") as f:
+        f.write(html_content)
+    print(f"✓ Generated HTML tables: {html_output}")
+
+    # Clean up temp file
+    os.remove(temp_md)
+    print("✓ Process completed successfully!")
+
+
+if __name__ == "__main__":
+    main()
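`generate_html_tables` above interpolates cell text straight into `<td>` elements and parses the width strings with `int()` without further checks. The sketch below shows how both steps could be hardened with the standard library. `parse_widths` and `render_row` are hypothetical helpers, and the rule that widths must sum to 100 is an assumption, not something the script enforces.

```python
import html
from typing import List


def parse_widths(spec: str, expected: int) -> List[int]:
    """Parse '20,60,20' into [20, 60, 20], checking count and total."""
    widths = [int(part) for part in spec.split(",")]
    if len(widths) != expected:
        raise ValueError(f"expected {expected} widths, got {len(widths)}")
    if sum(widths) != 100:
        # Assumption: percentages should cover the full table width
        raise ValueError(f"widths must sum to 100, got {sum(widths)}")
    return widths


def render_row(cells: List[str]) -> str:
    """Build one <tr>, escaping &, <, > and quotes in every cell."""
    tds = "".join(f"<td>{html.escape(cell)}</td>" for cell in cells)
    return f"        <tr>{tds}</tr>"


print(parse_widths("25,50,25", 3))
print(render_row(["A & B", "<一等奖>", ""]))
```

Escaping matters here because names and award titles come from an arbitrary Word document and may contain `&` or angle brackets.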
@@ -0,0 +1,23 @@
+{
+  "name": "doc-to-tables",
+  "version": "1.0.0",
+  "description": "A complete workflow skill that converts Word documents into structured Markdown and HTML tables",
+  "category": "document-processing",
+  "author": "小小莫",
+  "tags": ["word", "markdown", "html", "tables", "data-extraction", "piano-competitions"],
+  "entry_point": "scripts.doc_to_tables:main",
+  "dependencies": {
+    "required": ["pandoc"],
+    "optional": ["fpdf2"]
+  },
+  "file_formats": {
+    "input": ["docx"],
+    "output": ["md", "html"]
+  },
+  "usage_examples": [
+    "python doc_to_tables.py input.docx output",
+    "python doc_to_tables.py input.docx output --three-col-widths 25,50,25"
+  ],
+  "created_at": "2026-03-01",
+  "updated_at": "2026-03-01"
+}
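The manifest above names its entry point as `scripts.doc_to_tables:main`, using the `module:attribute` convention. How the host tool actually consumes this field is not shown in the commit. As an illustration only, a loader could resolve such a string with `importlib`; `resolve_entry_point` is a hypothetical helper, demonstrated on a standard-library target so that it runs anywhere.

```python
import importlib
from typing import Callable


def resolve_entry_point(spec: str) -> Callable:
    """Resolve a 'package.module:attribute' string to the named callable."""
    module_name, sep, attr = spec.partition(":")
    if not sep or not attr:
        raise ValueError(f"entry point must look like 'module:attr', got {spec!r}")
    module = importlib.import_module(module_name)
    return getattr(module, attr)


# Demonstrated on the standard library; the skill itself would pass
# "scripts.doc_to_tables:main" with the skill directory on sys.path.
dumps = resolve_entry_point("json:dumps")
print(dumps({"ok": True}))
```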
@@ -0,0 +1,135 @@
+# DOC to Tables Skill
+
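+## Overview
+
+An end-to-end workflow skill that converts Word documents (.docx) into structured Markdown tables and then generates polished HTML table files. It suits cases where competition awards, graded-exam results, honor certificates, and similar information must be extracted from unstructured documents and converted into standardized tables.
+
+## When to use
+
+- Organizing piano/music competition award lists
+- Summarizing graded-exam results
+- Tallying teacher honors and awards
+- Organizing student awards
+- Producing annual summary reports
+- Any case that requires extracting structured data from a Word document and generating tables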
+
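+## Input requirements
+
+### Source document format
+- A Word document (.docx) containing award, result, or honor information
+- The document typically includes:
+  - Competition or exam names as headings
+  - Student names, awards, instructors, and related details
+  - Possibly duplicated or incomplete instructor information
+
+### Expected output
+- A structured Markdown file (with correctly formatted tables)
+- A polished HTML file (ready for poster production)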
+
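+## Workflow
+
+### Stage 1: Document analysis and data extraction
+1. **DOCX to Markdown**: use pandoc, preserving the original heading hierarchy
+2. **Pattern recognition**: analyze the information patterns in the document (student name, category and award, instructor)
+3. **Deduplication**: identify multiple winners of the same award and merge teacher names
+
+### Stage 2: Markdown table cleanup
+1. **Uniform table structure**:
+   - Three-column tables: `|获奖学生|组别和奖项|指导老师|` (student, category and award, instructor)
+   - Two-column tables: `|赛事/活动|奖项/荣誉|` (competition/activity, award/honor)
+2. **Adaptive column headers**:
+   - Simple awards use the "奖项" (award) column header
+   - Complex awards use the "组别和奖项" (category and award) column header
+   - Graded-exam entries keep their full information (gender, level, score, rating)
+3. **Teacher award optimization**:
+   - Teachers with the same award are merged into one row
+   - Multiple teacher names are separated by commas
+   - Special awards are listed separately
+
+### Stage 3: HTML table generation
+1. **Responsive design**: tables are 100% wide and adapt to their container
+2. **Precise column widths**:
+   - Three-column tables: 20% | 60% | 20%
+   - Two-column tables: 70% | 30%
+3. **Polished styling**:
+   - Table borders and alternating row backgrounds
+   - Teacher awards set in a smaller font
+   - Supports printing and PDF export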
+
+## Usage
+
+### Basic usage
+```bash
+# Convert a Word document to Markdown and HTML tables
+python doc_to_tables.py "input.docx" "output"
+```
+
+### Advanced options
+```bash
+# Custom column width percentages
+python doc_to_tables.py "input.docx" "output" --three-col-widths "25,50,25" --two-col-widths "60,40"
+
+# Enable teacher award processing
+python doc_to_tables.py "input.docx" "output" --process-teacher-awards
+
+# Generate HTML without table headers (suited to poster production)
+python doc_to_tables.py "input.docx" "output" --no-headers
+```
+
+## Output files
+
+- `{output}_md.md` - structured Markdown file
+- `{output}_html.html` - polished HTML table file
+
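+## Technical features
+
+### Intelligent data matching
+- Matches students to instructors across competitions
+- Fills in missing instructor information automatically
+- Merges instructors for duo and ensemble entries
+
+### Format optimization
+- Markdown tables render correctly in Obsidian
+- HTML tables suit Photoshop import and poster production
+- Supports Chinese characters and special symbols
+
+### Error handling
+- Automatically detects and repairs formatting problems
+- Leaves missing data blank to be filled in later
+- Preserves the integrity of the original data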
+
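+## Example scenarios
+
+### Organizing music competition awards
+**Input**: a Word document containing award lists from several piano competitions
+**Output**:
+- Markdown tables grouped by competition, listing student, award, and instructor
+- HTML tables that can be used directly to produce an annual achievement poster
+
+### Summarizing graded-exam results
+**Input**: Word documents of graded-exam results from ABRSM, the Chinese Musicians' Association, and similar boards
+**Output**:
+- Structured tables with student name, gender, grade level, score, rating, and instructor
+- Polished HTML suited to printing and display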
+
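+## Notes
+
+1. **Heading levels**: the source document should use appropriate heading levels (H1-H6); the parser treats lines starting with `####` (H4 to H6) as section titles
+2. **Data consistency**: student and teacher names should be spelled consistently
+3. **Special characters**: Chinese, English, digits, and common symbols are supported
+4. **Empty values**: missing instructor information is left blank so it can be filled in later
+
+## Dependencies
+
+- pandoc (DOCX-to-Markdown conversion)
+- Python 3.6+
+- fpdf2 (optional, for PDF generation)
+
+## Extensibility
+
+This skill can readily be extended to support:
+- Excel files as an input source
+- Other document formats (PDF, PPT)
+- Custom table styles and themes
+- Multilingual support
+- Automated batch processing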
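Stage 2 of the workflow above says that teachers with the same award are merged into one row, with names separated by commas. The committed script collects teacher-award lines verbatim and does not merge them, so the following is a minimal sketch of that rule under stated assumptions. `merge_teacher_awards` is a hypothetical helper, and the input is assumed to be already split into `(award, teacher)` pairs. The `**award**: names` output shape is chosen to match what the parser recognizes as a teacher-award line, and the award and teacher names are placeholders.

```python
from typing import Dict, Iterable, List, Tuple


def merge_teacher_awards(pairs: Iterable[Tuple[str, str]]) -> List[str]:
    """Merge (award, teacher) pairs so each award appears on one line."""
    merged: Dict[str, List[str]] = {}
    for award, teacher in pairs:
        names = merged.setdefault(award, [])
        if teacher not in names:
            # Keep first-seen order and drop repeated names
            names.append(teacher)
    return [f"**{award}**: {', '.join(names)}" for award, names in merged.items()]


# Placeholder data for illustration only
pairs = [
    ("Award X", "Teacher A"),
    ("Award X", "Teacher B"),
    ("Award Y", "Teacher A"),
    ("Award X", "Teacher A"),
]
for line in merge_teacher_awards(pairs):
    print(line)
```

Awards keep their first-seen order, which relies on dictionaries preserving insertion order (guaranteed from Python 3.7; an implementation detail of CPython 3.6).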