Initial commit: skills library
- 70 skills with code and documentation
- Add .gitignore (ignore __pycache__, output/, temp/, venv/)
- Clean up test intermediates and caches
@@ -0,0 +1,152 @@
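# DOC to Tables Skill

## Overview

A complete workflow skill that converts Word documents (.docx) into structured Markdown tables and then generates professional HTML table files. It is designed for tidying unstructured documents such as competition award lists, graded-exam results, and honor certificates.
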
## Installing Dependencies

### Required dependencies

```bash
# Install pandoc (used for DOCX-to-Markdown conversion)
# Windows: download the installer from https://pandoc.org/installing.html
# macOS: brew install pandoc
# Linux: sudo apt-get install pandoc
```

### Optional dependencies

```bash
# Only needed for PDF generation
pip install fpdf2
```

## Usage

### Basic usage

```bash
cd .opencode/skills/doc-to-tables/scripts
python doc_to_tables.py "input.docx" "output"
```

This generates:
- `output_md.md` - the structured Markdown file
- `output_html.html` - the professional HTML table file

### Advanced options

```bash
# Custom column-width ratios
python doc_to_tables.py "input.docx" "output" --three-col-widths "25,50,25" --two-col-widths "60,40"

# Process and merge teacher awards
python doc_to_tables.py "input.docx" "output" --process-teacher-awards

# Generate HTML without table headers (suited to poster production)
python doc_to_tables.py "input.docx" "output" --no-headers
```

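## Typical Use Cases

### 1. Music competition award lists

**Input**: a Word document containing award lists from several piano competitions

**Output**:
- Markdown tables grouped by competition, listing student, award, and supervising teacher
- HTML tables that can be used directly for an annual achievement poster

### 2. Graded-exam result summaries

**Input**: a Word document with graded-exam results (ABRSM, Musicians' Association, and similar)

**Output**:
- Structured tables containing student name, gender, grade level, score, rating, and supervising teacher
- Professional HTML suitable for printing and display

### 3. Annual summary reports

**Input**: a Word document covering the year's awards across activities

**Output**:
- Standardized tables grouped by activity type
- Output that can be imported into Photoshop for poster design
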
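## Technical Features

### ✅ Smart data processing
- **Cross-competition matching**: automatically links the same student to their supervising teacher across different competitions
- **Deduplication**: teachers with the same award are merged into one line to avoid repetition
- **Missing values**: missing teacher information is left blank so it can be filled in later

### ✅ Professional output formats
- **Markdown compatible**: renders correctly in editors such as Obsidian
- **Responsive HTML**: tables are set to 100% width, so they scale with the screen
- **Precise column widths**: three-column tables (20%/60%/20%), two-column tables (70%/30%)
- **Print friendly**: works with the browser's "Print to PDF"

### ✅ Error handling
- Automatically detects and repairs common formatting problems
- Preserves the integrity of the original data
- Detailed error messages and logs
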
## Example

### Input Word document content:
```
###### 英国(牛津)2025国际钢琴公开赛-深圳赛区获奖名单
许和欣 一等奖
蔡达然 一等奖
张靖彤 一等奖
李芊妤 二等奖
朱梓安 二等奖
```

### Output Markdown:
```markdown
#### **英国(牛津)2025国际钢琴公开赛-深圳赛区获奖名单**

|获奖学生|奖项|指导老师|
|---|---|---|
|许和欣|一等奖||
|蔡达然|一等奖||
|张靖彤|一等奖||
|李芊妤|二等奖||
|朱梓安|二等奖||
```
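
The mapping from the raw lines to the table rows above can be sketched in a few lines of Python. This is only an illustration, not the parser in `doc_to_tables.py` (which works on pandoc's Markdown output); the helper name `lines_to_table` is hypothetical, and it assumes each input line is a student name and an award separated by whitespace.

```python
from typing import List


def lines_to_table(lines: List[str]) -> str:
    """Turn 'student award' lines into a three-column Markdown table."""
    rows = ["|获奖学生|奖项|指导老师|", "|---|---|---|"]
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        student, award = parts[0], parts[1]
        # The teacher cell is left empty so it can be filled in later
        rows.append(f"|{student}|{award}||")
    return "\n".join(rows)


print(lines_to_table(["许和欣 一等奖", "李芊妤 二等奖"]))
```

The empty third cell mirrors the "missing teacher information is left blank" behaviour described earlier.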

### Output HTML:
A professional HTML table that can be used directly for poster production.

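## Configuration Options

The configuration file is located at `config/settings.json`.

The following can be customized:
- Default column-width ratios
- Whether teacher awards are processed
- Output format preference
- Dependency settings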
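
As a rough sketch of how such a file could be consumed: the script in this commit takes its defaults from `argparse`, so the loader below is an assumption rather than existing behaviour. The function name `load_settings` and the `BUILTIN_DEFAULTS` dictionary are hypothetical; the `default_settings` key and the width values come from the `settings.json` in this commit.

```python
import json
from pathlib import Path
from typing import Any, Dict

# Fallback values mirroring the script's command-line defaults
BUILTIN_DEFAULTS: Dict[str, Any] = {
    "three_column_widths": "20,60,20",
    "two_column_widths": "70,30",
}


def load_settings(path: str = "config/settings.json") -> Dict[str, Any]:
    """Return built-in defaults overlaid with `default_settings` from the file, if present."""
    settings = dict(BUILTIN_DEFAULTS)
    config_file = Path(path)
    if config_file.is_file():
        with config_file.open(encoding="utf-8") as f:
            data = json.load(f)
        # Values in the file win over the built-in defaults
        settings.update(data.get("default_settings", {}))
    return settings
```

Falling back to built-in values when the file is missing keeps the command-line tool usable without any configuration.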
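
## Extensibility

This skill can be extended in several directions:
- **More input formats**: Excel, PDF, PPT, and others
- **Custom styles**: alternative CSS themes
- **Batch processing**: automated handling of multiple files
- **Multilingual support**: Chinese, English, Japanese, and others
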
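## Troubleshooting

### Common problems
1. **"pandoc not found"**: make sure pandoc is installed and on your PATH
2. **Garbled Chinese text**: make sure the system supports Chinese encoding, and use UTF-8
3. **Malformed tables**: check that the heading levels in the source document are correct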
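
For the first problem, a quick way to confirm whether pandoc is visible to Python is the standard-library `shutil.which`. The helper name `pandoc_available` is hypothetical and is not part of `doc_to_tables.py`.

```python
import shutil
import sys


def pandoc_available() -> bool:
    """Return True when a `pandoc` executable is found on PATH."""
    return shutil.which("pandoc") is not None


if __name__ == "__main__":
    if pandoc_available():
        print("pandoc found at", shutil.which("pandoc"))
    else:
        print("pandoc not found; install it and add it to PATH", file=sys.stderr)
```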

### Debug mode
```bash
# Enable verbose logging
# NOTE: doc_to_tables.py as committed here does not define a --debug flag, so argparse rejects this command
python doc_to_tables.py --debug "input.docx" "output"
```

## Version History

- **v1.0.0**: initial release, supports basic DOCX-to-table conversion
- **Planned v1.1.0**: Excel input and batch processing

## License
MIT License

## Author
小小莫 - OhMyOpenCode AI Manager
@@ -0,0 +1,12 @@
"""
DOC to Tables Skill - Main entry point
"""

__version__ = "1.0.0"
__author__ = "小小莫"
__description__ = "Convert Word documents to structured Markdown and HTML tables"

from .scripts.doc_to_tables import main

# Export main function for easy import
__all__ = ["main"]
@@ -0,0 +1,26 @@
{
  "skill_name": "doc-to-tables",
  "version": "1.0.0",
  "description": "Convert Word documents to structured Markdown and HTML tables",
  "author": "小小莫",
  "category": "document-processing",
  "tags": ["word", "markdown", "html", "tables", "data-extraction"],

  "default_settings": {
    "three_column_widths": "20,60,20",
    "two_column_widths": "70,30",
    "process_teacher_awards": true,
    "generate_no_headers_html": true,
    "output_format": "both"
  },

  "dependencies": {
    "required": ["pandoc"],
    "optional": ["fpdf2"]
  },

  "file_extensions": {
    "input": [".docx"],
    "output": [".md", ".html"]
  }
}
@@ -0,0 +1,294 @@
#!/usr/bin/env python3
"""
DOC to Tables - Convert Word documents to structured Markdown and HTML tables

Usage:
    python doc_to_tables.py <input.docx> <output_prefix> [options]

Options:
    --three-col-widths WIDTHS    Three-column table widths (default: "20,60,20")
    --two-col-widths WIDTHS      Two-column table widths (default: "70,30")
    --process-teacher-awards     Process and merge teacher awards
    --no-headers                 Generate HTML without table headers
    --help                       Show this help message
"""

import argparse
import os
import re
import subprocess
import sys
from typing import List, Dict


def convert_docx_to_markdown(docx_path: str, md_path: str) -> None:
    """Convert DOCX to Markdown using pandoc"""
    try:
        subprocess.run(
            ["pandoc", "--track-changes=all", docx_path, "-o", md_path], check=True
        )
        print(f"✓ Converted {docx_path} to {md_path}")
    except FileNotFoundError:
        # Raised when the pandoc executable is not installed or not on PATH
        print("✗ pandoc not found: install pandoc and make sure it is on PATH")
        sys.exit(1)
    except subprocess.CalledProcessError as e:
        print(f"✗ Error converting DOCX to Markdown: {e}")
        sys.exit(1)


def parse_markdown_content(md_content: str) -> List[Dict]:
    """Parse markdown content and extract structured data"""
    lines = md_content.split("\n")
    sections = []
    current_section = None
    separator = re.compile(r":?-+:?")  # one cell of a |---|:--:| delimiter row

    def split_row(text: str) -> List[str]:
        # Remove exactly one outer pipe per side so empty cells are preserved
        inner = text[1:-1] if len(text) > 1 and text.endswith("|") else text[1:]
        return [cell.strip() for cell in inner.split("|")]

    def is_separator(text: str) -> bool:
        text = text.strip()
        return text.startswith("|") and all(
            separator.fullmatch(cell) for cell in split_row(text)
        )

    for index, line in enumerate(lines):
        stripped = line.strip()
        # Check for section headers (#### through ######)
        if stripped.startswith("####"):
            if current_section:
                sections.append(current_section)
            # Extract section title, preferring bold text when present
            title_match = re.search(r"\*\*(.*?)\*\*", stripped)
            title = title_match.group(1) if title_match else stripped.strip("# ").strip()
            current_section = {"title": title, "entries": [], "teacher_awards": []}
        elif stripped.startswith("|"):
            if is_separator(stripped):
                continue
            # A row directly above a delimiter row is the table header, not data
            if index + 1 < len(lines) and is_separator(lines[index + 1]):
                continue
            # Parse table row
            cells = split_row(stripped)
            if len(cells) >= 3 and current_section:
                current_section["entries"].append(
                    {"student": cells[0], "award": cells[1], "teacher": cells[2]}
                )
        elif "**" in line and (":" in line or ":" in line) and current_section:
            # Parse teacher awards (ASCII or full-width colon)
            current_section["teacher_awards"].append(stripped)

    if current_section:
        sections.append(current_section)

    return sections


def generate_markdown_tables(sections: List[Dict]) -> str:
    """Generate structured markdown with proper table formatting"""
    md_output = []

    for section in sections:
        # Add section header
        md_output.append(f"#### **{section['title']}**")
        md_output.append("")

        # Add table if there are entries
        if section["entries"]:
            md_output.append("|获奖学生|组别和奖项|指导老师|")
            md_output.append("|---|---|---|")
            for entry in section["entries"]:
                md_output.append(
                    f"|{entry['student']}|{entry['award']}|{entry['teacher']}|"
                )
            md_output.append("")

        # Add teacher awards
        for award in section["teacher_awards"]:
            md_output.append(award)
            md_output.append("")

    return "\n".join(md_output)


def generate_html_tables(
    sections: List[Dict],
    three_col_widths: str = "20,60,20",
    two_col_widths: str = "70,30",
    no_headers: bool = False,
) -> str:
    """Generate professional HTML with responsive tables"""
    # NOTE: no_headers is accepted but currently has no effect;
    # this function never emits a header row.

    # Parse width percentages
    three_widths = [int(w) for w in three_col_widths.split(",")]
    two_widths = [int(w) for w in two_col_widths.split(",")]

    html_parts = [
        "<!DOCTYPE html>",
        '<html lang="zh-CN">',
        "<head>",
        '    <meta charset="UTF-8">',
        '    <meta name="viewport" content="width=device-width, initial-scale=1.0">',
        "    <title>Structured Tables</title>",
        "    <style>",
        "        body {",
        '            font-family: "Microsoft YaHei", "SimHei", "Arial", sans-serif;',
        "            margin: 20px;",
        "            line-height: 1.6;",
        "        }",
        "        h4 {",
        "            text-align: center;",
        "            color: #333;",
        "            margin: 30px 0 15px 0;",
        "            font-size: 18px;",
        "            border-bottom: 2px solid #333;",
        "            padding-bottom: 8px;",
        "        }",
        "        .table-3col {",
        "            width: 100%;",
        "            border-collapse: collapse;",
        "            margin: 10px 0 5px 0;",
        "            table-layout: fixed;",
        "        }",
        "        .table-3col td {",
        "            border: 1px solid #333;",
        "            padding: 8px;",
        "            text-align: center;",
        "            vertical-align: middle;",
        "            font-size: 14px;",
        "        }",
        "        .table-3col tr:nth-child(even) {",
        "            background-color: #f9f9f9;",
        "        }",
        f"        .table-3col col:nth-child(1) {{ width: {three_widths[0]}%; }}",
        f"        .table-3col col:nth-child(2) {{ width: {three_widths[1]}%; }}",
        f"        .table-3col col:nth-child(3) {{ width: {three_widths[2]}%; }}",
        "        .table-2col {",
        "            width: 100%;",
        "            border-collapse: collapse;",
        "            margin: 10px 0 5px 0;",
        "            table-layout: fixed;",
        "        }",
        "        .table-2col td {",
        "            border: 1px solid #333;",
        "            padding: 8px;",
        "            text-align: center;",
        "            vertical-align: middle;",
        "            font-size: 14px;",
        "        }",
        "        .table-2col tr:nth-child(even) {",
        "            background-color: #f9f9f9;",
        "        }",
        f"        .table-2col col:nth-child(1) {{ width: {two_widths[0]}%; }}",
        f"        .table-2col col:nth-child(2) {{ width: {two_widths[1]}%; }}",
        "        .teacher-award {",
        "            font-size: 12px;",
        "            margin: 0 0 10px 0;",
        "            text-align: center;",
        "            color: #666;",
        "        }",
        "        .subtitle {",
        "            text-align: center;",
        "            margin: 10px 0;",
        "            font-weight: bold;",
        "            color: #333;",
        "        }",
        "    </style>",
        "</head>",
        "<body>",
    ]

    for section in sections:
        html_parts.append(f"<h4>{section['title']}</h4>")

        if section["entries"]:
            # Use the 3-col layout when any entry has a teacher, else 2-col
            is_three_col = any(entry.get("teacher") for entry in section["entries"])

            if is_three_col:
                html_parts.append('<table class="table-3col">')
                html_parts.append("  <colgroup>")
                html_parts.append("    <col><col><col>")
                html_parts.append("  </colgroup>")
                html_parts.append("  <tbody>")
                for entry in section["entries"]:
                    html_parts.append(
                        f"    <tr><td>{entry['student']}</td><td>{entry['award']}</td><td>{entry['teacher']}</td></tr>"
                    )
                html_parts.append("  </tbody>")
                html_parts.append("</table>")
            else:
                html_parts.append('<table class="table-2col">')
                html_parts.append("  <colgroup>")
                html_parts.append("    <col><col>")
                html_parts.append("  </colgroup>")
                html_parts.append("  <tbody>")
                for entry in section["entries"]:
                    html_parts.append(
                        f"    <tr><td>{entry['award']}</td><td>{entry['student']}</td></tr>"
                    )
                html_parts.append("  </tbody>")
                html_parts.append("</table>")

        for award in section["teacher_awards"]:
            html_parts.append(f'<p class="teacher-award">{award}</p>')

    html_parts.extend(["</body>", "</html>"])
    return "\n".join(html_parts)


def main():
    parser = argparse.ArgumentParser(
        description="Convert Word documents to structured tables"
    )
    parser.add_argument("input_docx", help="Input DOCX file path")
    parser.add_argument("output_prefix", help="Output file prefix")
    parser.add_argument(
        "--three-col-widths",
        default="20,60,20",
        help="Three-column table widths (default: 20,60,20)",
    )
    parser.add_argument(
        "--two-col-widths",
        default="70,30",
        help="Two-column table widths (default: 70,30)",
    )
    parser.add_argument(
        "--process-teacher-awards",
        action="store_true",
        help="Process and merge teacher awards",
    )
    parser.add_argument(
        "--no-headers", action="store_true", help="Generate HTML without table headers"
    )
    # argparse already registers -h/--help; adding "--help" again raises
    # "conflicting option string", so no explicit help argument is defined here.

    args = parser.parse_args()
    # NOTE: args.process_teacher_awards is parsed but not consulted below;
    # teacher-award lines are always passed through unchanged.

    # Validate input file
    if not os.path.exists(args.input_docx):
        print(f"✗ Input file not found: {args.input_docx}")
        sys.exit(1)

    # Create output directory if needed
    output_dir = os.path.dirname(args.output_prefix) or "."
    os.makedirs(output_dir, exist_ok=True)

    # Step 1: Convert DOCX to Markdown
    temp_md = args.output_prefix + "_temp.md"
    convert_docx_to_markdown(args.input_docx, temp_md)

    # Step 2: Read and parse Markdown
    with open(temp_md, "r", encoding="utf-8") as f:
        md_content = f.read()

    sections = parse_markdown_content(md_content)

    # Step 3: Generate structured Markdown
    structured_md = generate_markdown_tables(sections)
    md_output = args.output_prefix + "_md.md"
    with open(md_output, "w", encoding="utf-8") as f:
        f.write(structured_md)
    print(f"✓ Generated structured Markdown: {md_output}")

    # Step 4: Generate HTML
    html_content = generate_html_tables(
        sections, args.three_col_widths, args.two_col_widths, args.no_headers
    )
    html_output = args.output_prefix + "_html.html"
    with open(html_output, "w", encoding="utf-8") as f:
        f.write(html_content)
    print(f"✓ Generated HTML tables: {html_output}")

    # Clean up temp file
    os.remove(temp_md)
    print("✓ Process completed successfully!")


if __name__ == "__main__":
    main()
@@ -0,0 +1,23 @@
{
  "name": "doc-to-tables",
  "version": "1.0.0",
  "description": "A complete workflow skill that converts Word documents into structured Markdown and HTML tables",
  "category": "document-processing",
  "author": "小小莫",
  "tags": ["word", "markdown", "html", "tables", "data-extraction", "piano-competitions"],
  "entry_point": "scripts.doc_to_tables:main",
  "dependencies": {
    "required": ["pandoc"],
    "optional": ["fpdf2"]
  },
  "file_formats": {
    "input": ["docx"],
    "output": ["md", "html"]
  },
  "usage_examples": [
    "python doc_to_tables.py input.docx output",
    "python doc_to_tables.py input.docx output --three-col-widths 25,50,25"
  ],
  "created_at": "2026-03-01",
  "updated_at": "2026-03-01"
}
@@ -0,0 +1,135 @@
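# DOC to Tables Skill

## Overview

A complete workflow skill that converts Word documents (.docx) into structured Markdown tables and then generates professional HTML table files. It suits cases where competition awards, graded-exam results, honor certificates, and similar information must be extracted from unstructured documents and converted into a standardized table format.
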
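## Applicable Scenarios

- Tidying piano and music competition award lists
- Summarizing graded-exam results
- Compiling teacher honors and awards
- Organizing student awards
- Producing annual summary reports
- Any case that requires extracting structured data from a Word document and generating tables
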
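## Input Requirements

### Source document format
- A Word document (.docx) containing award, result, or honor information
- The document typically contains:
  - The competition or exam name as a heading
  - Student names, awards, supervising teachers, and similar details
  - Possibly duplicated or incomplete teacher information

### Expected output
- A structured Markdown file (with correctly formatted tables)
- A professional HTML file (ready for poster production)
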
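## Workflow

### Stage 1: Document analysis and data extraction
1. **DOCX to Markdown**: use pandoc, preserving the original heading hierarchy
2. **Pattern recognition**: analyze the information patterns in the document (student name, category and award, supervising teacher)
3. **Deduplication**: identify multiple winners of the same award and merge the teacher names

### Stage 2: Markdown table cleanup
1. **Unified table structure**:
   - Three-column tables: `|获奖学生|组别和奖项|指导老师|` (student, category and award, teacher)
   - Two-column tables: `|赛事/活动|奖项/荣誉|` (event or activity, award or honor)
2. **Adaptive column headings**:
   - Simple awards use the "奖项" (award) column heading
   - Complex awards use the "组别和奖项" (category and award) column heading
   - Graded-exam entries keep their full information (gender, grade level, score, rating)
3. **Teacher award tidying**:
   - Teachers with the same award are merged onto one line
   - Multiple teacher names are separated by commas
   - Special awards are listed separately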
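
The teacher-award merging described in Stage 2 can be sketched as follows. This is an illustrative assumption, not code from `doc_to_tables.py` (which currently passes teacher-award lines through unchanged); the function name `merge_teacher_awards`, the `(award, teacher)` input shape, and the `award: name, name` output format are all hypothetical.

```python
from typing import Dict, List, Tuple


def merge_teacher_awards(awards: List[Tuple[str, str]]) -> List[str]:
    """Group (award, teacher) pairs by award, keeping first-seen order and dropping repeats."""
    grouped: Dict[str, List[str]] = {}
    for award, teacher in awards:
        names = grouped.setdefault(award, [])
        if teacher not in names:
            names.append(teacher)
    # One line per award, teacher names separated by commas
    return [f"{award}: {', '.join(names)}" for award, names in grouped.items()]


merged = merge_teacher_awards(
    [
        ("Outstanding Teacher", "Teacher A"),
        ("Outstanding Teacher", "Teacher B"),
        ("Special Award", "Teacher A"),
        ("Outstanding Teacher", "Teacher A"),
    ]
)
print("\n".join(merged))
```

The sketch relies on dictionaries preserving insertion order, which is guaranteed from Python 3.7 onward.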
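
### Stage 3: HTML table generation
1. **Responsive design**: tables use 100% width and adapt to the container
2. **Precise column widths**:
   - Three-column tables: 20% | 60% | 20%
   - Two-column tables: 70% | 30%
3. **Professional styling**:
   - Table borders and alternating row background colors
   - Teacher awards in a smaller font
   - Support for printing and PDF export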
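
The script parses width strings such as `"20,60,20"` with a plain `int()` conversion and does not validate them. A stricter parser could look like the sketch below; the function name `parse_widths` and the rule that widths must be positive and sum to 100 are assumptions made for illustration.

```python
from typing import List


def parse_widths(spec: str, expected_columns: int) -> List[int]:
    """Parse '20,60,20' into [20, 60, 20], checking count, positivity, and the 100% total."""
    try:
        widths = [int(part.strip()) for part in spec.split(",")]
    except ValueError:
        raise ValueError(f"widths must be comma-separated integers: {spec!r}") from None
    if len(widths) != expected_columns:
        raise ValueError(f"expected {expected_columns} widths, got {len(widths)}")
    if any(w <= 0 for w in widths) or sum(widths) != 100:
        raise ValueError(f"widths must be positive and sum to 100: {spec!r}")
    return widths


print(parse_widths("25,50,25", 3))
```

Rejecting a bad specification up front avoids an `IndexError` later, when the CSS rules index into the parsed list.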

## Usage

### Basic usage
```bash
# Convert a Word document into Markdown and HTML tables
python doc_to_tables.py "input.docx" "output"
```

### Advanced options
```bash
# Custom column-width ratios
python doc_to_tables.py "input.docx" "output" --three-col-widths "25,50,25" --two-col-widths "60,40"

# Include teacher-award processing
python doc_to_tables.py "input.docx" "output" --process-teacher-awards

# Generate HTML without table headers (suited to poster production)
python doc_to_tables.py "input.docx" "output" --no-headers
```

## Output Files

- `{output}_md.md` - the structured Markdown file
- `{output}_html.html` - the professional HTML table file

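## Technical Features

### Smart data matching
- Matches students to supervising teachers across competitions
- Fills in missing teacher information automatically
- Merges supervising teachers for duo and ensemble entries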
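
One way to picture the cross-competition matching is the sketch below. It assumes the section structure produced by `parse_markdown_content` (a list of dictionaries with an `entries` list of `student`, `award`, and `teacher` fields); the function name `fill_missing_teachers` is hypothetical, and the committed script does not contain this step.

```python
from typing import Dict, List


def fill_missing_teachers(sections: List[Dict]) -> List[Dict]:
    """Copy a student's teacher from any section where it is known into entries where it is blank."""
    known: Dict[str, str] = {}
    # First pass: remember the first teacher seen for each student
    for section in sections:
        for entry in section["entries"]:
            if entry["teacher"]:
                known.setdefault(entry["student"], entry["teacher"])
    # Second pass: fill blanks; students with no known teacher stay empty
    for section in sections:
        for entry in section["entries"]:
            if not entry["teacher"]:
                entry["teacher"] = known.get(entry["student"], "")
    return sections


demo = [
    {"title": "Competition 1", "entries": [{"student": "S1", "award": "First", "teacher": "T1"}]},
    {"title": "Competition 2", "entries": [
        {"student": "S1", "award": "Second", "teacher": ""},
        {"student": "S2", "award": "First", "teacher": ""},
    ]},
]
fill_missing_teachers(demo)
print(demo[1]["entries"])
```

The function modifies the entries in place and matches on exact name equality, which is why consistent spelling of student names matters.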
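
### Format optimization
- Markdown tables render correctly in Obsidian
- HTML tables are suited to Photoshop import and poster production
- Supports Chinese characters and special symbols

### Error handling
- Automatically detects and repairs formatting problems
- Missing data is left blank so it can be filled in later
- Preserves the integrity of the original data
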
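## Example Scenarios

### Music competition award lists
**Input**: a Word document containing award lists from several piano competitions
**Output**:
- Markdown tables grouped by competition, listing student, award, and supervising teacher
- HTML tables that can be used directly for an annual achievement poster

### Graded-exam result summaries
**Input**: a Word document with graded-exam results (ABRSM, Musicians' Association, and similar)
**Output**:
- Structured tables containing student name, gender, grade level, score, rating, and supervising teacher
- Professional HTML suitable for printing and display
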
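## Notes

1. **Heading levels**: the source document should use appropriate heading levels (H1-H6)
2. **Data consistency**: student and teacher names should be spelled consistently
3. **Special characters**: Chinese, English, digits, and common symbols are supported
4. **Empty values**: missing teacher information is left blank so it can be filled in later
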
## Dependencies

- pandoc (DOCX-to-Markdown conversion)
- Python 3.6+
- fpdf2 (optional, for PDF generation)

## Extensibility

This skill can be extended to support:
- Excel files as an input source
- Other document formats (PDF, PPT)
- Custom table styles and themes
- Multilingual support
- Automated batch processing