Initial commit: skills library

- 70 skills with code and documentation
- Add .gitignore (ignore __pycache__, output/, temp/, venv/)
- Clean up test intermediates and caches
hmo
2026-04-26 19:27:40 +08:00
commit 04db423416
861 changed files with 210414 additions and 0 deletions
@@ -0,0 +1,160 @@
---
name: musicXML-ocr
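description: Sheet music image OCR skill. Recognizes staff-notation images (PNG/JPG/PDF) and converts them to MusicXML, using Audiveris for optical music recognition (OMR). Trigger this skill when the user mentions "recognize a score", "convert an image to MusicXML", "transcribe a score", or "digitize a handwritten score".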
---
# MusicXML OCR Skill
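## Overview
Converts staff-notation images (PNG/JPG) or PDFs to MusicXML through optical recognition. Supported:
- Scanned or photographed score images → MusicXML
- Score pages inside a PDF → MusicXML
- Recognition of notes, time signatures, and key signatures
- Piano grand staff structure
- Output that opens in MuseScore, Finale, and similar software
## Trigger Scenarios
- "Recognize this score" (识别这张乐谱)
- "Convert the image to MusicXML" (把图片转成 MusicXML)
- "Transcribe this score for me" (帮我打谱)
- "Digitize a handwritten score" (手抄谱数字化)
- "Recognize staff notation" (识别五线谱)
---
## Recommended Workflow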
```
PDF → image extraction → Audiveris recognition → merge and correct in MuseScore
```
### Step 1: PDF → Images
If Audiveris cannot process the PDF directly, extract the embedded images first:
```python
import os

import fitz  # PyMuPDF

pdf_path = r"input.pdf"
output_dir = r"temp\sheets"
os.makedirs(output_dir, exist_ok=True)

doc = fitz.open(pdf_path)
for page_num in range(len(doc)):
    page = doc[page_num]
    # Each entry describes one embedded image; item 0 is its xref
    for img_idx, img in enumerate(page.get_images()):
        pix = fitz.Pixmap(doc, img[0])
        if pix.n - pix.alpha >= 4:
            # CMYK cannot be saved as PNG, so convert to RGB first
            pix = fitz.Pixmap(fitz.csRGB, pix)
        pix.save(os.path.join(output_dir, f"page{page_num+1}_img{img_idx+1}.png"))
doc.close()
```
### Step 2: Audiveris Recognition
```powershell
# Recognize a single image
& "C:\Program Files\Audiveris\Audiveris.exe" -batch -export -output "D:\output" "D:\input\sheet.png"
# Process a PDF directly (if supported)
& "C:\Program Files\Audiveris\Audiveris.exe" -batch -export -output "D:\output" "D:\input\score.pdf"
```
**Output files:**
- `.omr` - Audiveris project file (can be reopened and edited)
- `.mxl` / `.xml` - MusicXML (`.mxl` is the compressed form)
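An `.mxl` file is a ZIP container, so the score inside it can be read without Audiveris. The following is a minimal sketch, assuming the standard MusicXML container layout in which `META-INF/container.xml` names the root score file. The helper name `read_mxl_root` is hypothetical and is not part of this skill's scripts.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def read_mxl_root(mxl_file) -> bytes:
    """Return the bytes of the root MusicXML document stored in an .mxl archive.

    mxl_file may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(mxl_file) as archive:
        # container.xml lists the root score in <rootfile full-path="...">
        container = ET.fromstring(archive.read("META-INF/container.xml"))
        rootfile = container.find(".//rootfile")
        if rootfile is None:
            raise ValueError("container.xml has no <rootfile> entry")
        return archive.read(rootfile.get("full-path"))
```

Looking up the root file through `container.xml` avoids mistaking `container.xml` itself for the score, which can happen when simply picking the first `.xml` entry in the archive.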
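### Step 3: Merge and Correct in MuseScore
**Manual processing in MuseScore is recommended:**
1. Open the first page's .mxl file
2. Menu → File → Import → select the second page's .xml
3. Or copy and paste each section directly
**Corrections in MuseScore:**
- Delete wrongly recognized symbols (select → Delete)
- Fix measure numbers (double-click a measure number to edit it)
- Save the merged file as .mscz or .mxl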
---
## Installing Audiveris
**Windows:**
```powershell
winget install --id Audiveris.Audiveris --accept-package-agreements --accept-source-agreements
```
**Manual download:**
- https://github.com/Audiveris/audiveris/releases
**Path:**
```
C:\Program Files\Audiveris\Audiveris.exe
```
---
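## Approach Comparison
| Approach | Accuracy | Reliability | Recommendation |
|------|--------|--------|--------|
| **Audiveris → manual merge in MuseScore** | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ✅ **Recommended** |
| **Audiveris processing the PDF directly** | ⭐⭐⭐⭐ | ⭐⭐⭐ | Fallback |
| **homr** | ⭐⭐⭐ | ⭐⭐⭐ | ⚠️ Fallback |
**Lessons learned:**
- Audiveris recognizes accurately (in testing, the first measure came out completely correct)
- homr has dewarping problems and can produce completely wrong results
- Merging at the XML level is unreliable; merge manually in MuseScore instead
- When pages are processed separately, each page is an independent score, so measure numbers must be corrected by hand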
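Because each page is recognized as an independent score, its measure numbers restart at 1. The sketch below shows one way to shift a page's measure numbers by an offset before opening it in MuseScore. `renumber_measures` is a hypothetical helper, and it assumes an uncompressed, un-namespaced MusicXML tree with purely numeric measure numbers. It does not merge files, which this skill still recommends doing by hand.

```python
import xml.etree.ElementTree as ET


def renumber_measures(root: ET.Element, offset: int) -> int:
    """Add offset to every numeric measure number; return the highest new number."""
    highest = 0
    for measure in root.iter("measure"):
        number = measure.get("number", "")
        if number.isdigit():  # leave non-numeric labels such as "X1" untouched
            new_number = int(number) + offset
            measure.set("number", str(new_number))
            highest = max(highest, new_number)
    return highest
```

For example, if page 1 ends at measure 12, calling `renumber_measures(page2_root, 12)` makes page 2 start at measure 13.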
---
## Scripts
| Script | Purpose | Dependencies |
|------|------|------|
| `extract_pdf_images.py` | Extract staff-notation images from a PDF | `fitz` (PyMuPDF) |
| `audiveris_to_musescore.py` | Complete Audiveris pipeline | Audiveris installed |
### Usage
```powershell
# 1. Extract the images from the PDF
python scripts/extract_pdf_images.py "D:\scores\sheet.pdf" "D:\output\sheets"
# 2. Recognize with Audiveris (the extraction script names files pageNNN_imgNN.png)
& "C:\Program Files\Audiveris\Audiveris.exe" -batch -export -output "D:\output" "D:\output\sheets\page001_img01.png"
# 3. Merge and correct in MuseScore
# Open the exported .mxl files, correct them by hand, then export
```
---
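## Verification Checklist
After processing, check the result in MuseScore:
- [ ] Note positions match the original image
- [ ] Measure numbers are consecutive and correct
- [ ] The two staves of a piano score are joined by a brace
- [ ] No extra or duplicated symbols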
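The measure-number item can be pre-screened before opening MuseScore. A minimal sketch, assuming numeric measure numbers in un-namespaced MusicXML; `find_measure_gaps` is a hypothetical name, not one of this skill's scripts.

```python
import xml.etree.ElementTree as ET


def find_measure_gaps(part: ET.Element) -> list:
    """Return (previous, current) pairs where numbering does not increase by 1."""
    gaps = []
    previous = None
    for measure in part.findall("measure"):
        number = measure.get("number", "")
        if not number.isdigit():
            continue  # skip non-numeric labels
        current = int(number)
        if previous is not None and current != previous + 1:
            gaps.append((previous, current))
        previous = current
    return gaps
```

An empty list means the numbering is consecutive; a pair such as `(12, 1)` typically marks the point where a second page was appended without renumbering.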
---
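## Troubleshooting
| Problem | Solution |
|------|---------|
| Audiveris cannot process the PDF | Extract the images first with `extract_pdf_images.py` |
| Extra symbols in the recognition result | Select and delete them in Audiveris, or correct them in MuseScore |
| Measure numbers are not consecutive | Fix the measure numbers by hand in MuseScore |
| XML merge fails | Merge manually by copy and paste in MuseScore |
| homr output is entirely wrong | Switch to Audiveris |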
@@ -0,0 +1,3 @@
# Example Reference
This is an example reference file. Delete if not needed.
@@ -0,0 +1,88 @@
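# homr Usage Guide
## Installation Status
✅ `homr` and `verovio` are preinstalled in the current environment.
The first run automatically downloads the detection models (about 500 MB), so expect a wait.
## Basic Usage
### Command Line
```bash
# Basic recognition
homr path/to/image.png --output result.musicxml
# Batch-process a directory
homr path/to/folder/
# Specify the output file name
homr image.png -o my_song.musicxml
```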
### Python API
```python
from homr.main import process_image, ProcessingConfig, XmlGeneratorArguments

# Note: homr has trouble with Chinese (non-ASCII) paths; use an ASCII path
image_path = r"path/to/sheet_music.png"
config = ProcessingConfig(False, False, False, False, -1)
xml_args = XmlGeneratorArguments(False, None, None)
result = process_image(image_path, config, xml_args)
print("Recognition complete!")
# The output is generated automatically: a .musicxml file in the same directory
```
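## Parameters
| Parameter | Type | Description |
|------|------|------|
| `model_type` | str | Model type: `ctc` (default) or `transformer` |
| `language` | str | Language: `english`, `german`, etc. |
| `staffline_height` | float | Staff-line spacing in pixels; detected automatically by default |
## Output Format
The output MusicXML contains:
- `<part>` - instrument / voice part
- `<measure>` - measure
- `<note>` - note
- `<attributes>` - key signature, time signature
- `<direction>` - dynamics and expression marks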
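To get a quick overview of what a recognized file actually contains, these elements can be counted with the standard library. This is a sketch assuming un-namespaced MusicXML; `summarize_musicxml` is a hypothetical helper and not part of homr.

```python
from collections import Counter
import xml.etree.ElementTree as ET

# The element names listed in the output-format section above
WANTED_TAGS = {"part", "measure", "note", "attributes", "direction"}


def summarize_musicxml(xml_text: str) -> Counter:
    """Count how often each structural MusicXML element occurs in the document."""
    root = ET.fromstring(xml_text)
    return Counter(elem.tag for elem in root.iter() if elem.tag in WANTED_TAGS)
```

A result with two parts but very few notes, for instance, is an early hint that recognition went wrong before any time is spent in MuseScore.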
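## Error Handling
```python
from homr.main import process_image, ProcessingConfig, XmlGeneratorArguments

try:
    # Same entry point and arguments as in the Python API section
    config = ProcessingConfig(False, False, False, False, -1)
    xml_args = XmlGeneratorArguments(False, None, None)
    process_image("image.png", config, xml_args)
except Exception as e:
    print(f"Recognition failed: {e}")
```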
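## Known Limitations
1. Handwritten scores are recognized poorly
2. Complex harmony may be recognized inaccurately
3. Ornament recognition is limited
4. The first run has to download the models (about 500 MB)
5. **Chinese path problem**: homr has trouble with Chinese paths; use paths made of ASCII letters and digits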
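One way to work around the path limitation is to hand homr a copy of the image under an ASCII-only name. The sketch below uses only the standard library; `ascii_safe_copy` is a hypothetical helper. It assumes the system temporary directory itself has an ASCII path, which may not hold if the user name contains non-ASCII characters.

```python
import os
import shutil
import tempfile


def ascii_safe_copy(image_path: str) -> str:
    """Copy image_path into a fresh temp directory under an ASCII-only file name."""
    ext = os.path.splitext(image_path)[1]
    tmp_dir = tempfile.mkdtemp(prefix="homr_")
    # Use a fixed ASCII name instead of the original (possibly Chinese) file name
    safe_path = os.path.join(tmp_dir, "input" + ext)
    shutil.copy(image_path, safe_path)
    return safe_path
```

The recognition output then lands next to the copy, so it has to be moved back before the temporary directory is removed.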
## Known Bug Fix
If you hit a `numpy` compatibility error, edit `autocrop.py`:
```python
# Before:
hist = cv2.calcHist([img], [0], None, [256], [0, 256])
dominant_color_gray_scale = max(enumerate(hist), ...)[0]
# After (flatten the histogram to a 1-D array):
hist = cv2.calcHist([img], [0], None, [256], [0, 256]).flatten()
dominant_color_gray_scale = max(enumerate(hist), ...)[0]
```
@@ -0,0 +1,102 @@
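# Tips for Improving Recognition Accuracy
## Image Quality Requirements
### ✅ Best
- Scans at 300 DPI or higher
- Even lighting, no shadows
- Flat paper without creases
- Clear notes and unbroken lines
### ⚠️ Acceptable
- 150 DPI scans
- Phone photos taken in good light
- Slight shadows are tolerable
### ❌ Avoid
- Blurry images below 50 DPI
- Strong skew (can be corrected in preprocessing)
- Uneven lighting that leaves parts too dark or too bright
- Interfering pencil marks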
## Preprocessing Suggestions
### Enhancement with Python + PIL
```python
from PIL import Image, ImageFilter, ImageOps


def preprocess_sheet_music(image_path):
    img = Image.open(image_path).convert('L')  # grayscale
    # Increase contrast
    img = ImageOps.autocontrast(img)
    # Denoise
    img = img.filter(ImageFilter.MedianFilter(size=3))
    # Binarize (sometimes helps)
    # img = img.point(lambda x: 0 if x < 128 else 255)
    return img


# Usage
processed = preprocess_sheet_music("original.jpg")
processed.save("processed.png")
```
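### Skew Correction
```python
import numpy as np
from deskew import determine_skew
from PIL import Image

# Detect the skew angle on a grayscale copy, then rotate to correct it
image = Image.open("tilted.jpg")
angle = determine_skew(np.array(image.convert("L")))
corrected = image.rotate(angle, expand=True, fillcolor="white") if angle else image
corrected.save("corrected.jpg")
```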
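## Choosing a Format
| Format | Rating | Notes |
|------|--------|------|
| PNG | ⭐⭐⭐⭐⭐ | Lossless compression, best choice |
| TIFF | ⭐⭐⭐⭐ | Print-grade lossless |
| JPG (high quality) | ⭐⭐⭐ | Acceptable |
| JPG (low quality) | ⭐ | Severe compression artifacts |
## Resolution Recommendations
| Score complexity | Minimum width | Suggested DPI |
|-----------|---------|---------|
| Simple melody | 1000px | 150 |
| Piano score (two hands) | 2500px | 300 |
| Orchestral full score | 4000px | 400 |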
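Pixel width and DPI are linked by the physical page width: pixels = inches × DPI. The sketch below makes that arithmetic explicit so a scan can be checked against the table. It assumes an A4 page (210 mm wide) purely for illustration, since the table does not state a paper size; both function names are hypothetical.

```python
MM_PER_INCH = 25.4


def pixels_for(width_mm: float, dpi: int) -> int:
    """Pixel width produced by scanning a page of width_mm at the given DPI."""
    return round(width_mm / MM_PER_INCH * dpi)


def effective_dpi(width_px: int, width_mm: float) -> float:
    """DPI implied by an image that is width_px wide and shows width_mm of paper."""
    return width_px * MM_PER_INCH / width_mm


# An A4 page scanned at 300 DPI is about 2480 px wide, close to the 2500 px row
a4_at_300 = pixels_for(210, 300)
```

Going the other way, `effective_dpi(2500, 210)` is roughly 302, so a 2500 px wide image of a full A4 page meets the 300 DPI recommendation for piano scores.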
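## Multi-page Processing
```python
import glob
import subprocess

from homr.main import process_image, ProcessingConfig, XmlGeneratorArguments

# Split the multi-page PDF into one PNG per page (requires poppler's pdftoppm)
subprocess.run(
    ["pdftoppm", "-r", "300", "-png", "score.pdf", "page"],  # 300 DPI, PNG output, "page" prefix
    check=True,
)

# Then recognize page by page. pdftoppm may zero-pad the page number
# (page-1.png or page-01.png), so collect the files with a glob.
config = ProcessingConfig(False, False, False, False, -1)
xml_args = XmlGeneratorArguments(False, None, None)
for page_png in sorted(glob.glob("page-*.png")):
    process_image(page_png, config, xml_args)
```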
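## Proofreading After Recognition
Open the recognition result in MuseScore and check:
1. Whether the meter (beats per measure) is correct
2. Whether any sharps or flats are missing
3. Whether chords are split correctly
4. The positions of rests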
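The meter check can be partly automated. In MusicXML a full measure should span `divisions × beats × 4 / beat-type` duration units, where `divisions` is the number of units per quarter note. The sketch below compares that value with the furthest time position a measure actually reaches. It assumes un-namespaced MusicXML with integer durations, and both function names are hypothetical. Pickup measures legitimately come out shorter.

```python
import xml.etree.ElementTree as ET


def measure_length(measure: ET.Element) -> int:
    """Return the furthest time position reached in a measure, in divisions."""
    position = furthest = 0
    for elem in measure:
        if elem.tag not in ("note", "backup", "forward"):
            continue
        if elem.tag == "note" and (
            elem.find("chord") is not None or elem.find("grace") is not None
        ):
            continue  # chord and grace notes do not advance time
        duration = int(elem.findtext("duration", default="0"))
        position += -duration if elem.tag == "backup" else duration
        furthest = max(furthest, position)
    return furthest


def expected_length(divisions: int, beats: int, beat_type: int) -> int:
    """Divisions in one full measure of beats/beat_type time."""
    return divisions * beats * 4 // beat_type
```

Measures whose `measure_length` differs from `expected_length` are good candidates for a closer look in MuseScore.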
@@ -0,0 +1,70 @@
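# Verovio Rendering Preview
## Installation
```bash
pip install verovio
```
## Rendering MusicXML to an Image
```python
import verovio

toolkit = verovio.toolkit()
# Load the MusicXML file
toolkit.loadFile("score.musicxml")
# Render the first page to SVG (the image format the toolkit renders natively)
svg = toolkit.renderToSVG()
# Save the SVG
with open("output.svg", "w", encoding="utf-8") as f:
    f.write(svg)

# The toolkit does not write PNG or PDF itself. This sketch assumes the
# third-party cairosvg package (pip install cairosvg) for the conversion.
import cairosvg

cairosvg.svg2png(bytestring=svg.encode("utf-8"), write_to="output.png", dpi=300)  # 300 DPI
cairosvg.svg2pdf(bytestring=svg.encode("utf-8"), write_to="output.pdf")
```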
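## Command-line Tool
```bash
# The command-line tool is a separate native binary; `pip install verovio`
# installs only the Python module, so the binary must be installed separately
# Convert to SVG
verovio -o output.svg input.musicxml
```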
## Previewing in Jupyter
```python
import verovio
from IPython.display import SVG, display

toolkit = verovio.toolkit()
toolkit.loadFile("score.musicxml")
svg = toolkit.renderToSVG()
display(SVG(svg))
```
## Adjusting the Page Layout
```python
toolkit.setOptions({
    "pageWidth": 2100,    # page width (tenths)
    "pageHeight": 2970,   # page height
    "scale": 50,          # scale factor, in percent
    "spacingSystem": 50,  # minimum spacing between systems
})
```
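## Getting MIDI
```python
import base64

toolkit.loadFile("score.musicxml")
# renderToMIDI() returns the MIDI data as a base64-encoded string
midi = toolkit.renderToMIDI()
with open("output.mid", "wb") as f:
    f.write(base64.b64decode(midi))
```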
@@ -0,0 +1,3 @@
# musicXML-ocr - dependencies
homr>=0.0.1
# xml.etree.ElementTree is part of the Python standard library and needs no install
pymupdf  # used by extract_pdf_images.py
@@ -0,0 +1,48 @@
#!/usr/bin/env python3
"""Add a brace group to a MusicXML piano score."""
import sys
import xml.etree.ElementTree as ET

if len(sys.argv) < 2:
    print('Usage: python add_brace.py input.musicxml [output.musicxml]')
    sys.exit(1)

input_file = sys.argv[1]
output_file = sys.argv[2] if len(sys.argv) > 2 else input_file.replace('.musicxml', '_braced.musicxml')

ET.register_namespace('m', 'http://www.musescore.org/ns/mscore')
ET.register_namespace('xlink', 'http://www.w3.org/1999/xlink')

tree = ET.parse(input_file)
root = tree.getroot()
part_list = root.find('part-list')
if part_list is None:
    print('Error: no <part-list> found in file')
    sys.exit(1)

# Open the brace group at the very start of part-list
group_start = ET.Element('part-group')
group_start.set('number', '1')
group_start.set('type', 'start')
ET.SubElement(group_start, 'group-symbol').text = 'brace'
# group-barline accepts yes / no / Mensurstrich; 'yes' draws barlines through the group
ET.SubElement(group_start, 'group-barline').text = 'yes'
ET.SubElement(group_start, 'group-time')
part_list.insert(0, group_start)

# Add <group>1</group> to every score-part
for sp in part_list.findall('score-part'):
    g = ET.SubElement(sp, 'group')
    g.text = '1'

# Close the group
part_list.append(ET.Element('part-group', number='1', type='stop'))

try:
    ET.indent(root)
except AttributeError:
    pass  # ET.indent requires Python 3.9+

tree.write(output_file, encoding='UTF-8', xml_declaration=True)
print('Added brace grouping')
print(f'Output: {output_file}')
@@ -0,0 +1,202 @@
#!/usr/bin/env python3
"""
audiveris_to_musescore.py

Complete pipeline for staff-notation recognition (OMR) with Audiveris.

What it does:
    1. Runs Audiveris on a sheet-music image or PDF
    2. Outputs a MusicXML file in .mxl format

Usage:
    python audiveris_to_musescore.py <input_image_or_pdf> [output_dir]

Examples:
    python audiveris_to_musescore.py "D:/scores/sheet.png" "D:/output"

    # Process a PDF
    python audiveris_to_musescore.py "D:/scores/piece.pdf" "D:/output"

Dependencies:
    - Audiveris (https://github.com/Audiveris/audiveris)
    - Python 3.8+

Installation:
    # Windows
    winget install --id Audiveris.Audiveris --accept-package-agreements --accept-source-agreements
    # Or download the MSI manually: https://github.com/Audiveris/audiveris/releases
"""
import sys
import os
import subprocess
import zipfile
import argparse

# Default Audiveris install path
AUDIVERIS_PATH = r"C:\Program Files\Audiveris\Audiveris.exe"


def find_audiveris() -> str:
    """Locate the Audiveris executable."""
    if os.path.exists(AUDIVERIS_PATH):
        return AUDIVERIS_PATH
    # Try the common install locations
    possible_paths = [
        r"C:\Program Files\Audiveris\Audiveris.exe",
        r"C:\Program Files (x86)\Audiveris\Audiveris.exe",
        os.path.expanduser(r"~\AppData\Local\Programs\Audiveris\Audiveris.exe"),
    ]
    for path in possible_paths:
        if os.path.exists(path):
            return path
    raise FileNotFoundError(
        "Audiveris not found. Please install from:\n"
        "  winget install --id Audiveris.Audiveris\n"
        "  or download from: https://github.com/Audiveris/audiveris/releases"
    )


def extract_musicxml_from_mxl(mxl_path: str, output_dir: str) -> "str | None":
    """Extract the MusicXML document from an .mxl file; return its path, or None."""
    with zipfile.ZipFile(mxl_path, 'r') as z:
        for name in z.namelist():
            # Skip META-INF/container.xml, which is container metadata, not the score
            if name.startswith('META-INF/'):
                continue
            if name.endswith('.xml'):
                xml_path = os.path.join(output_dir, os.path.basename(name))
                with z.open(name) as src:
                    content = src.read()
                with open(xml_path, 'wb') as dst:
                    dst.write(content)
                return xml_path
    return None


def process_with_audiveris(input_path: str, output_dir: str) -> dict:
    """
    Process the input file with Audiveris.

    Returns:
        dict with keys:
            - mxl_path: path to .mxl output
            - xml_path: path to extracted .xml (if available)
            - omr_path: path to .omr project file
    """
    if not os.path.exists(input_path):
        raise FileNotFoundError(f"Input file not found: {input_path}")
    os.makedirs(output_dir, exist_ok=True)

    audiveris_exe = find_audiveris()
    print(f"[Audiveris] Found at: {audiveris_exe}")

    # Build the command: -batch -export -output <dir> <input>
    cmd = [
        audiveris_exe,
        '-batch',
        '-export',
        '-output', output_dir,
        input_path
    ]
    print(f"[Audiveris] Processing: {input_path}")
    print(f"[Audiveris] Command: {' '.join(cmd)}")

    # Run Audiveris
    result = subprocess.run(
        cmd,
        capture_output=True,
        text=True,
        encoding='utf-8',
        errors='replace'
    )
    if result.returncode != 0:
        print(f"[Audiveris] STDERR:\n{result.stderr}")
        raise RuntimeError(f"Audiveris failed with code {result.returncode}")
    print("[Audiveris] Processing complete!")
    print(f"[Audiveris] Output directory: {output_dir}")

    # Locate the generated output files
    base_name = os.path.splitext(os.path.basename(input_path))[0]
    # Expected output files
    mxl_path = os.path.join(output_dir, f"{base_name}.mxl")
    omr_path = os.path.join(output_dir, f"{base_name}.omr")

    # List the contents of the output directory
    print("[Audiveris] Files in output directory:")
    for f in os.listdir(output_dir):
        print(f"  - {f}")

    return {
        'mxl_path': mxl_path if os.path.exists(mxl_path) else None,
        'xml_path': None,  # extracted later
        'omr_path': omr_path if os.path.exists(omr_path) else None,
        'output_dir': output_dir,
        'base_name': base_name
    }


def main():
    parser = argparse.ArgumentParser(
        description="OMR using Audiveris - Convert sheet music images to MusicXML",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  python audiveris_to_musescore.py "D:\\scores\\sheet.png" "D:\\output"
  python audiveris_to_musescore.py "D:\\scores\\piece.pdf"
"""
    )
    parser.add_argument('input', help="Input image (PNG/JPG) or PDF file")
    parser.add_argument('output_dir', nargs='?', default=None,
                        help="Output directory (default: audiveris_output next to the input)")
    args = parser.parse_args()

    input_path = os.path.abspath(args.input)
    if args.output_dir:
        output_dir = os.path.abspath(args.output_dir)
    else:
        # Default: an audiveris_output folder in the input file's directory
        output_dir = os.path.join(os.path.dirname(input_path), 'audiveris_output')

    try:
        result = process_with_audiveris(input_path, output_dir)
        print()
        print("=" * 50)
        print("Processing complete!")
        print("=" * 50)
        if result['mxl_path']:
            print(f"MusicXML (.mxl): {result['mxl_path']}")
            # Extract the XML so it can be inspected
            xml_path = extract_musicxml_from_mxl(
                result['mxl_path'],
                output_dir
            )
            if xml_path:
                print(f"MusicXML (.xml): {xml_path}")
        if result['omr_path']:
            print(f"Project (.omr): {result['omr_path']}")
        print()
        print("Open .mxl file in MuseScore to verify and edit.")
    except Exception as e:
        print(f"Error: {e}")
        sys.exit(1)


if __name__ == '__main__':
    main()
@@ -0,0 +1,4 @@
#!/usr/bin/env python3
"""Example script - delete if not needed."""
print("Hello from skill!")
@@ -0,0 +1,133 @@
#!/usr/bin/env python3
"""
extract_pdf_images.py

Extract staff-notation images from a PDF for later OMR processing.

What it does:
    1. Opens the PDF file
    2. Extracts the images on each page
    3. Saves them as PNG files

Usage:
    python extract_pdf_images.py <pdf_path> [output_dir]

Examples:
    python extract_pdf_images.py "D:/scores/sheet.pdf"

    # Extract into a specific directory
    python extract_pdf_images.py "D:/scores/sheet.pdf" "D:/output/sheets"

Dependencies:
    pip install pymupdf
"""
import os
import sys
import argparse

try:
    import fitz  # PyMuPDF
except ImportError:
    print("Error: PyMuPDF not installed.")
    print("Install with: pip install pymupdf")
    sys.exit(1)


def extract_images_from_pdf(pdf_path: str, output_dir: str = None) -> list:
    """
    Extract every embedded image from a PDF.

    Args:
        pdf_path: path to the PDF file
        output_dir: output directory; defaults to temp/pdf_sheets next to the PDF

    Returns:
        List of paths of the extracted images
    """
    if not os.path.exists(pdf_path):
        raise FileNotFoundError(f"PDF not found: {pdf_path}")
    if output_dir is None:
        output_dir = os.path.join(os.path.dirname(pdf_path), 'temp', 'pdf_sheets')
    os.makedirs(output_dir, exist_ok=True)

    doc = fitz.open(pdf_path)
    print(f"[PDF] Opened: {pdf_path}")
    print(f"[PDF] Pages: {len(doc)}")

    extracted = []
    for page_num in range(len(doc)):
        page = doc[page_num]
        images = page.get_images()
        print(f"[PDF] Page {page_num + 1}: {len(images)} image(s)")
        for img_idx, img in enumerate(images):
            xref = img[0]
            pix = fitz.Pixmap(doc, xref)
            # Handle the color mode: RGB and grayscale save directly,
            # CMYK must be converted to RGB first
            if pix.n - pix.alpha >= 4:
                pix = fitz.Pixmap(fitz.csRGB, pix)
            out_path = os.path.join(
                output_dir,
                f"page{page_num+1:03d}_img{img_idx+1:02d}.png"
            )
            pix.save(out_path)
            print(f"  -> {out_path}")
            extracted.append(out_path)
    doc.close()
    return extracted


def main():
    parser = argparse.ArgumentParser(
        description="Extract images from PDF for OMR processing",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  python extract_pdf_images.py "D:\\scores\\sheet.pdf"
  python extract_pdf_images.py "D:\\scores\\sheet.pdf" "D:\\output\\sheets"
"""
    )
    parser.add_argument('pdf', help="Input PDF file")
    parser.add_argument('output_dir', nargs='?', default=None,
                        help="Output directory (default: temp/pdf_sheets in PDF dir)")
    args = parser.parse_args()

    pdf_path = os.path.abspath(args.pdf)
    output_dir = os.path.abspath(args.output_dir) if args.output_dir else None

    try:
        images = extract_images_from_pdf(pdf_path, output_dir)
        print()
        print("=" * 50)
        print(f"Extracted {len(images)} image(s)")
        print("=" * 50)
        if images:
            print("\nNext steps:")
            print(f"  1. Audiveris: & 'C:\\Program Files\\Audiveris\\Audiveris.exe' -batch -export -output <dir> {' '.join(images)}")
            print("  2. Or use the audiveris_to_musescore.py script for each image")
    except Exception as e:
        print(f"Error: {e}")
        sys.exit(1)


if __name__ == '__main__':
    main()
@@ -0,0 +1,334 @@
#!/usr/bin/env python3
"""
grand_staff_merge.py

Post-process homr's dual-part MusicXML output into a proper piano grand staff.

What it does:
    1. Merges P1 (treble/G clef) and P2 (bass/F clef) into a single Part
    2. Adds <staves>2</staves> and a brace marker for the piano grand staff
    3. Treble goes on staff=1/voice=1 (upper), bass on staff=2/voice=2 (lower)
    4. Removes ALL <print> elements (they cause extra system breaks in MuseScore)
    5. Preserves note durations, pitches, and dynamics; only the structure changes

Usage:
    python grand_staff_merge.py input.musicxml [output.musicxml]
"""
import sys
import xml.etree.ElementTree as ET
# Register namespaces to preserve them in output
ET.register_namespace("m", "http://www.musescore.org/ns/mscore")
ET.register_namespace("xlink", "http://www.w3.org/1999/xlink")
ET.register_namespace(
"mx", "http://schemas.openxmlformats.org/markup-compatibility/2006"
)
def remove_print_elements(part_elem):
    """Remove all <print> elements from a part (they cause extra system breaks in MuseScore)."""
    removed = 0
    # Snapshot the tree first so removing children does not disturb the iteration
    for parent in list(part_elem.iter()):
        for print_elem in parent.findall("print"):
            parent.remove(print_elem)
            removed += 1
    return removed


def make_attributes_grand_staff(attrs_elem):
    """Modify an <attributes> element for a grand staff: add staves, brace, and both clefs."""
    # Remove existing clefs (both are re-added below)
    for existing_clef in attrs_elem.findall("clef"):
        attrs_elem.remove(existing_clef)

    # MusicXML orders <attributes> children as: divisions, key, time, staves,
    # part-symbol, ..., clef. Insert after the last of divisions/key/time.
    idx = 0
    for i, child in enumerate(list(attrs_elem)):
        if child.tag in ("divisions", "key", "time"):
            idx = i + 1
    staves_elem = ET.Element("staves")
    staves_elem.text = "2"
    attrs_elem.insert(idx, staves_elem)

    # <part-symbol>brace</part-symbol> marks the piano brace of a multi-staff part
    part_symbol = ET.Element("part-symbol")
    part_symbol.text = "brace"
    attrs_elem.insert(idx + 1, part_symbol)

    # MusicXML clef order = visual top-to-bottom order, and the staff is
    # selected with the clef's number attribute:
    #   number=1 = upper staff = treble (G clef on line 2)
    #   number=2 = lower staff = bass (F clef on line 4)
    for number, sign, line in (("1", "G", "2"), ("2", "F", "4")):
        clef = ET.SubElement(attrs_elem, "clef", number=number)
        ET.SubElement(clef, "sign").text = sign
        ET.SubElement(clef, "line").text = line


def make_attributes_no_print(attrs_elem):
    """Keep existing attributes but ensure no print element sibling."""
    pass  # print elements handled at part level


def process_part_measure(measure_elem, voice_map):
    """
    Process a measure from P1 or P2.

    voice_map: {'staff': 1 or 2, 'voice': 1 or 2}
    """
    # Remove <print> elements from this measure
    print_elem = measure_elem.find("print")
    if print_elem is not None:
        measure_elem.remove(print_elem)

    for note in measure_elem.findall("note"):
        # Update the <staff> element, if the note has one
        staff_elem = note.find("staff")
        if staff_elem is not None:
            staff_elem.text = str(voice_map["staff"])
        # Update the <voice> element, if the note has one
        voice_elem = note.find("voice")
        if voice_elem is not None:
            voice_elem.text = str(voice_map["voice"])


def merge_parts_to_grand_staff(xml_path, output_path=None):
    """Main function: merge homr dual-part output into grand staff."""
    if output_path is None:
        output_path = xml_path.replace(".musicxml", "_grandstaff.musicxml")

    tree = ET.parse(xml_path)
    root = tree.getroot()

    # Find part-list and part elements
    part_list = root.find("part-list")
    parts = root.findall("part")
    if len(parts) < 2:
        print(f"Warning: Expected 2 parts, found {len(parts)}. Nothing to merge.")
        return

    p1 = parts[0]  # usually treble (G clef)
    p2 = parts[1]  # usually bass (F clef)

    # Determine which is which by clef sign
    def get_clef(part):
        for m in part.findall("measure"):
            clef = m.find(".//clef")
            if clef is not None:
                sign = clef.find("sign")
                if sign is not None:
                    return sign.text
        return None

    p1_clef = get_clef(p1)
    p2_clef = get_clef(p2)
    print(f"P1 clef: {p1_clef}, P2 clef: {p2_clef}")

    # Identify treble and bass parts
    if p1_clef == "G" and p2_clef == "F":
        treble_part = p1
        bass_part = p2
        print("Detected: P1=treble, P2=bass")
    elif p1_clef == "F" and p2_clef == "G":
        treble_part = p2
        bass_part = p1
        print("Detected: P1=bass, P2=treble")
    else:
        # Default: P1=treble, P2=bass (homr's typical output)
        treble_part = p1
        bass_part = p2
        print("Could not determine clefs definitively, defaulting: P1=treble, P2=bass")

    # === Step 1: Update part-list to single Part ===
    # Replace with single score-part
    new_part_list = ET.Element("part-list")
    score_part = ET.SubElement(new_part_list, "score-part")
    score_part.set("id", "P1")
    part_name = ET.SubElement(score_part, "part-name")
    part_name.text = "Piano"
    part_name.set("print-object", "yes")

    # Add midi-instrument
    midi_inst = ET.SubElement(score_part, "midi-instrument")
    midi_inst.set("id", "P1-I1")
    midi_ch = ET.SubElement(midi_inst, "midi-channel")
    midi_ch.text = "1"
    midi_prog = ET.SubElement(midi_inst, "midi-program")
    midi_prog.text = "1"
    vol = ET.SubElement(midi_inst, "volume")
    vol.text = "100"
    pan = ET.SubElement(midi_inst, "pan")
    pan.text = "0"

    # Replace old part-list
    if part_list is not None:
        root.remove(part_list)
    root.insert(list(root).index(p1), new_part_list)

    # === Step 2: Update bass_part measure 1 attributes ===
    measures_bass = bass_part.findall("measure")
    if measures_bass:
        first_measure_bass = measures_bass[0]
        attrs = first_measure_bass.find("attributes")
        if attrs is not None:
            make_attributes_grand_staff(attrs)

    # Remove <print> from ALL measures in both parts
    for p in [bass_part, treble_part]:
        for m in p.findall("measure"):
            print_e = m.find("print")
            if print_e is not None:
                m.remove(print_e)

    # === Step 3: Change treble notes to staff=1 (upper), voice=1 ===
    for m in treble_part.findall("measure"):
        process_part_measure(m, {"staff": 1, "voice": 1})

    # === Step 4: Change bass notes to staff=2 (lower), voice=2 ===
    for m in bass_part.findall("measure"):
        process_part_measure(m, {"staff": 2, "voice": 2})

    # === Step 5: Interleave measures from both parts ===
    # Create a new combined part
    combined = ET.Element("part")
    combined.set("id", "P1")
    bass_measures = bass_part.findall("measure")
    treble_measures = treble_part.findall("measure")

    # Also remove print from treble first measure if it wasn't already removed
    if treble_measures:
        print_e = treble_measures[0].find("print")
        if print_e is not None:
            treble_measures[0].remove(print_e)

    # The grand staff attributes were already added to the bass first measure,
    # so the treble first measure's attributes are removed
    treble_attrs_removed = 0
    if treble_measures:
        treble_first_attrs = treble_measures[0].find("attributes")
        if treble_first_attrs is not None:
            treble_measures[0].remove(treble_first_attrs)
            treble_attrs_removed += 1

    # For measures that appear in both parts, combine their notes.
    # Strategy: for each measure number, write the treble content, rewind the
    # time cursor with <backup>, then write the bass content.
    # This assumes both parts use the same <divisions> value.
    def time_advance(elem):
        """Return how far elem moves the time cursor, in divisions."""
        if elem.tag == "note" and (
            elem.find("chord") is not None or elem.find("grace") is not None
        ):
            return 0  # chord and grace notes do not advance time
        duration = int(elem.findtext("duration", default="0"))
        return -duration if elem.tag == "backup" else duration

    bass_by_num = {m.get("number"): m for m in bass_measures}
    treble_by_num = {m.get("number"): m for m in treble_measures}
    all_measure_nums = sorted(
        set(list(bass_by_num.keys()) + list(treble_by_num.keys())),
        key=lambda x: int(x) if x is not None and x.isdigit() else 0,
    )

    for mnum in all_measure_nums:
        bass_m = bass_by_num.get(mnum)
        treble_m = treble_by_num.get(mnum)

        # Create new combined measure
        new_measure = ET.Element("measure")
        if mnum is not None:
            new_measure.set("number", mnum)

        # Copy attributes from bass (which has grand staff setup)
        if bass_m is not None:
            attrs = bass_m.find("attributes")
            if attrs is not None:
                new_measure.append(ET.fromstring(ET.tostring(attrs)))

        # Copy direction/direction-type from treble first measure (for tempo, etc.)
        if treble_m is not None and mnum == "1":
            for child in list(treble_m):
                if child.tag not in (
                    "note",
                    "backup",
                    "forward",
                    "attributes",
                    "barline",
                    "print",
                ):
                    new_measure.append(ET.fromstring(ET.tostring(child)))

        # Add treble content (staff=1, upper) first, in document order
        treble_length = 0
        if treble_m is not None:
            for child in treble_m:
                if child.tag in ("note", "backup", "forward"):
                    new_measure.append(ET.fromstring(ET.tostring(child)))
                    treble_length += time_advance(child)

        # Rewind to the start of the measure so the bass sounds with the treble
        if treble_length > 0 and bass_m is not None:
            backup = ET.SubElement(new_measure, "backup")
            ET.SubElement(backup, "duration").text = str(treble_length)

        # Add bass content (staff=2, lower) after the treble, then its barlines
        if bass_m is not None:
            for child in bass_m:
                if child.tag in ("note", "backup", "forward"):
                    new_measure.append(ET.fromstring(ET.tostring(child)))
            for barline in bass_m.findall("barline"):
                new_measure.append(ET.fromstring(ET.tostring(barline)))

        combined.append(new_measure)

    # === Step 6: Replace original parts with combined ===
    root.remove(p1)
    if p2 != p1:
        try:
            root.remove(p2)
        except ValueError:
            pass
    root.append(combined)

    # === Step 7: Write output ===
    # ET.indent for pretty printing (Python 3.9+)
    try:
        ET.indent(root)
    except AttributeError:
        pass  # Python < 3.9
    tree.write(output_path, encoding="UTF-8", xml_declaration=True)

    print("\nGrand staff conversion complete!")
    print(f"  Input: {xml_path}")
    print(f"  Output: {output_path}")
    print("  Removed <print> elements to prevent extra system breaks")
    print("  Added: <staves>2</staves> and a brace")
    print("  Treble: staff=1 (upper), voice=1 | Bass: staff=2 (lower), voice=2")
    print(f"  Measures: {len(all_measure_nums)}")
    return output_path


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print(__doc__)
        sys.exit(1)
    input_file = sys.argv[1]
    output_file = sys.argv[2] if len(sys.argv) > 2 else None
    merge_parts_to_grand_staff(input_file, output_file)
@@ -0,0 +1,183 @@
#!/usr/bin/env python3
"""
homr_to_musescore.py

Complete homr → MusicXML → MuseScore-ready pipeline.

What it does:
    1. Runs homr on a sheet-music image
    2. Removes <print> elements (fixes wrong system breaks)
    3. Adds a brace group (piano grand staff)

Usage:
    python homr_to_musescore.py image.png [output.musicxml]

Dependencies:
    - homr (pip install homr)
    - Python 3.8+ (standard-library xml.etree.ElementTree)

Environment:
    Relies on the base conda environment (homr already installed):
    conda activate base
"""
import sys
import os
import tempfile
import shutil
import xml.etree.ElementTree as ET

# Check whether homr is available
try:
    from homr.main import process_image, ProcessingConfig, XmlGeneratorArguments
    HAS_HOMR = True
except ImportError:
    HAS_HOMR = False


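def step1_homr(image_path: str) -> str:
    """Step 1: run homr on the image and return the path of the MusicXML output."""
    print(f"[Step 1] Running homr OCR on: {image_path}")
    if not HAS_HOMR:
        print("Error: homr not installed. Run: pip install homr")
        sys.exit(1)

    # homr has trouble with Chinese paths, so work on a copy with an
    # ASCII-only file name in a temporary directory
    tmp_dir = tempfile.mkdtemp(prefix='homr_')
    tmp_image = os.path.join(tmp_dir, 'input' + os.path.splitext(image_path)[1])
    shutil.copy(image_path, tmp_image)
    try:
        config = ProcessingConfig(False, False, False, False, -1)
        xml_args = XmlGeneratorArguments(False, None, None)
        process_image(tmp_image, config, xml_args)
        # homr writes the .musicxml next to the image it processed. Copy it next
        # to the original image before the temporary directory is deleted.
        tmp_xml = os.path.splitext(tmp_image)[0] + '.musicxml'
        musicxml_path = os.path.splitext(image_path)[0] + '.musicxml'
        shutil.copy(tmp_xml, musicxml_path)
        print(f"[Step 1] Done: {musicxml_path}")
        return musicxml_path
    finally:
        shutil.rmtree(tmp_dir, ignore_errors=True)

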
def step2_remove_print(input_path: str, output_path: str) -> None:
    """Step 2: remove all <print> elements."""
    print(f"[Step 2] Removing <print> elements from: {input_path}")
    ET.register_namespace('m', 'http://www.musescore.org/ns/mscore')
    ET.register_namespace('xlink', 'http://www.w3.org/1999/xlink')
    tree = ET.parse(input_path)
    root = tree.getroot()

    removed_count = 0
    for elem in root.iter():
        to_remove = [c for c in list(elem) if c.tag == 'print']
        for c in to_remove:
            elem.remove(c)
            removed_count += 1

    try:
        ET.indent(root)
    except AttributeError:
        pass  # ET.indent requires Python 3.9+
    tree.write(output_path, encoding='UTF-8', xml_declaration=True)
    print(f"[Step 2] Removed {removed_count} <print> element(s)")


def step3_add_brace(input_path: str, output_path: str) -> None:
    """Step 3: add a brace group."""
    print(f"[Step 3] Adding brace grouping to: {input_path}")
    ET.register_namespace('m', 'http://www.musescore.org/ns/mscore')
    ET.register_namespace('xlink', 'http://www.w3.org/1999/xlink')
    tree = ET.parse(input_path)
    root = tree.getroot()

    part_list = root.find('part-list')
    if part_list is None:
        print("[Step 3] Warning: no <part-list> found, skipping brace")
        shutil.copy(input_path, output_path)
        return

    group_start = ET.Element('part-group')
    group_start.set('number', '1')
    group_start.set('type', 'start')
    ET.SubElement(group_start, 'group-symbol').text = 'brace'
    # group-barline accepts yes / no / Mensurstrich
    ET.SubElement(group_start, 'group-barline').text = 'yes'
    ET.SubElement(group_start, 'group-time')
    part_list.insert(0, group_start)

    for sp in part_list.findall('score-part'):
        g = ET.SubElement(sp, 'group')
        g.text = '1'
    part_list.append(ET.Element('part-group', number='1', type='stop'))

    try:
        ET.indent(root)
    except AttributeError:
        pass  # ET.indent requires Python 3.9+
    tree.write(output_path, encoding='UTF-8', xml_declaration=True)
    print("[Step 3] Done")


def main():
    if len(sys.argv) < 2:
        print(__doc__)
        sys.exit(1)
    image_path = sys.argv[1]
    if not os.path.exists(image_path):
        print(f"Error: file not found: {image_path}")
        sys.exit(1)

    # Decide the output file name
    if len(sys.argv) > 2:
        output_path = sys.argv[2]
    else:
        base = os.path.splitext(os.path.basename(image_path))[0]
        output_path = os.path.join(os.path.dirname(image_path), f"{base}_final.musicxml")

    # Create a temporary directory for intermediate files
    tmp_dir = tempfile.mkdtemp(prefix='homr_pipeline_')
    try:
        # Step 1: homr recognition
        raw_xml = os.path.join(tmp_dir, 'raw.musicxml')
        step1_homr(image_path)
        # homr produces a .musicxml file in the image's directory
        expected = os.path.splitext(image_path)[0] + '.musicxml'
        if os.path.exists(expected):
            raw_xml = expected
        else:
            # Otherwise look in tmp_dir
            candidates = [f for f in os.listdir(tmp_dir) if f.endswith('.musicxml')]
            if candidates:
                raw_xml = os.path.join(tmp_dir, candidates[0])
        if not os.path.exists(raw_xml):
            print("Error: homr did not produce output file")
            sys.exit(1)

        # Step 2: remove <print>
        no_print_xml = os.path.join(tmp_dir, 'no_print.musicxml')
        step2_remove_print(raw_xml, no_print_xml)

        # Step 3: add the brace
        step3_add_brace(no_print_xml, output_path)

        print()
        print("Pipeline complete!")
        print(f"Final output: {output_path}")
        print("Open in MuseScore to verify: 5 measures per line, brace on grand staff")
    finally:
        shutil.rmtree(tmp_dir, ignore_errors=True)


if __name__ == '__main__':
    main()
@@ -0,0 +1,32 @@
#!/usr/bin/env python3
"""Remove all <print> elements from a MusicXML file."""
import sys
import xml.etree.ElementTree as ET

if len(sys.argv) < 2:
    print('Usage: python remove_print.py input.musicxml [output.musicxml]')
    sys.exit(1)

input_file = sys.argv[1]
output_file = sys.argv[2] if len(sys.argv) > 2 else input_file.replace('.musicxml', '_no_print.musicxml')

ET.register_namespace('m', 'http://www.musescore.org/ns/mscore')
ET.register_namespace('xlink', 'http://www.w3.org/1999/xlink')

tree = ET.parse(input_file)
root = tree.getroot()

removed_count = 0
for elem in root.iter():
    # Collect the <print> children first, then remove them
    to_remove = [c for c in list(elem) if c.tag == 'print']
    for c in to_remove:
        elem.remove(c)
        removed_count += 1

try:
    ET.indent(root)
except AttributeError:
    pass  # ET.indent requires Python 3.9+

tree.write(output_file, encoding='UTF-8', xml_declaration=True)
print(f'Removed {removed_count} <print> element(s)')
print(f'Output: {output_file}')