Initial commit: skills library

- 70 skills with code and documentation
- Add .gitignore (ignore __pycache__, output/, temp/, venv/)
- Clean up test intermediates and caches
This commit is contained in:
hmo
2026-04-26 19:27:40 +08:00
commit 04db423416
861 changed files with 210414 additions and 0 deletions
+91
View File
@@ -0,0 +1,91 @@
---
name: html-splitter
description: This skill splits HTML files containing tables into multiple smaller HTML files based on content logic, then converts each to high-resolution PNG. The skill maintains individual table integrity while organizing content into 4-6 logical sections, generating both white background and transparent background PNGs at 500% resolution.
---
# HTML Splitter Skill
This skill handles the workflow of splitting HTML files containing tables into multiple smaller HTML files based on content logic, then converting each to high-resolution PNG. The skill keeps individual tables intact while organizing content into 4-6 logical sections.
## Purpose
When provided with an HTML file containing multiple tables (especially long table collections like award lists, competition results, or data reports), this skill will:
1. Analyze the content structure and identify logical splitting points
2. Generate 4-6 separate HTML files based on content themes while keeping tables complete
3. Convert each HTML file to high-resolution PNG (500% zoom)
4. Generate both white background and transparent background versions for each PNG
## When to Use
This skill should be used when:
- You have a long HTML file with multiple tables that needs to be split into manageable sections
- You need to convert HTML tables to high-resolution PNG images for printing or digital use
- You want both white background and transparent background versions of the output
- You need to maintain the integrity of individual tables while organizing content logically
## How to Use
### Input Requirements
- An HTML file containing tables with consistent styling
- Tables should have semantic structure (headings, table elements, etc.) to identify logical groupings
### Processing Steps
1. **Analyze Structure**: Identify content sections based on headings, table types, and logical groupings
2. **Split Logic**: Divide content into 4-6 sections based on:
- Similar types of competitions/results
- Related content themes
- Balanced content amounts per section
- Keeping individual tables completely intact
3. **Maintain Styling**: Preserve original CSS styling in each split file
4. **Generate PNGs**: Convert each HTML section to 500% resolution PNG
5. **Create Variants**: Generate both white background and transparent background versions
### Output Structure
The skill creates organized output in this structure:
```
split-output/
├── section-1.html/png (first content section)
├── section-2.html/png (second content section)
├── section-3.html/png (third content section)
├── section-4.html/png (fourth content section)
├── section-5.html/png (fifth content section, if needed)
└── section-6.html/png (sixth content section, if needed)
```
Each section includes both:
- `{section}_500原图.png` - White background version
- `{section}_500透明.png` - Transparent background version
### Technical Specifications
- **Resolution**: 500% zoom for high-quality printing
- **Table Integrity**: Individual tables are never split across sections
- **Content Logic**: Grouping based on semantic meaning (competition types, years, categories)
- **Styling Preservation**: Original CSS and formatting maintained in each split file
- **Background Options**: Both white and transparent background PNG variants
### Best Practices
- Ensure original HTML has semantic structure (proper headings, table groupings) for optimal splitting
- Review content logic to confirm sections make sense for your use case
- Verify that individual tables remain intact in the output
- Check that table formatting is preserved in the PNG output
## References
This skill does not include any reference files in the `references/` directory.
## Scripts
This skill includes the following script:
- `html_splitter.py`: Main script that handles HTML splitting and PNG generation
The script can be run with:
```
python scripts/html_splitter.py <input_html_file> [-o output_dir]
```
This will generate split HTML files and corresponding PNGs (both white background and transparent) in the output directory.
## Assets
This skill does not include any asset files in the `assets/` directory.
View File
+34
View File
@@ -0,0 +1,34 @@
#!/usr/bin/env python3
"""
HTML Splitter Skill Entry Point
This is the main entry point for the HTML Splitter skill.
"""
import sys
from pathlib import Path
import subprocess
import os
def main():
"""Main entry point for HTML splitter skill."""
if len(sys.argv) < 2:
print("Usage: html-splitter <input-html-file> [-o output-dir]")
print("Example: html-splitter my_poster.html -o ./output")
return 1
# Get the script directory (where this file is located)
script_dir = Path(__file__).parent
splitter_script = script_dir / "html_splitter.py"
if not splitter_script.exists():
print(f"Error: Splitter script not found at {splitter_script}")
return 1
# Call the actual splitter script
cmd = [sys.executable, str(splitter_script)] + sys.argv[1:]
result = subprocess.run(cmd)
return result.returncode
if __name__ == "__main__":
sys.exit(main())
+3
View File
@@ -0,0 +1,3 @@
# Example Reference
This is an example reference file. Delete if not needed.
+4
View File
@@ -0,0 +1,4 @@
# html-splitter - dependencies
Pillow>=0.0.1
numpy>=0.0.1
playwright>=0.0.1
+4
View File
@@ -0,0 +1,4 @@
#!/usr/bin/env python3
"""Example script - delete if not needed."""
print("Hello from skill!")
+381
View File
@@ -0,0 +1,381 @@
#!/usr/bin/env python3
"""
HTML Splitter and PNG Generator
This script splits a long HTML file containing multiple tables into smaller HTML files
based on content logic, then converts each to high-resolution PNG images.
"""
import os
import argparse
import shutil
from pathlib import Path
from PIL import Image, ImageDraw
import numpy as np
import subprocess
import time
from urllib.parse import quote
def split_html_by_content(content):
"""Split HTML content into logical sections based on headings and table groups."""
# Split based on major headings
sections = []
# Define split points based on the typical structure we saw
split_patterns = [
"<h4>第八届李斯特国际钢琴公开赛</h4>",
"<h4>第四届微风·长隆国际钢琴声乐公开赛",
"<h4>2025第三届弘杨中国作品·深圳青少年钢琴邀请赛</h4>",
"<h4>2025年中国音协考级:</h4>",
"<h4>李佳茵荣誉",
]
# Find all split positions
split_positions = [0] # Start at beginning
for pattern in split_patterns:
pos = content.find(pattern)
if pos != -1:
# Try to find a good break point near the pattern (after closing div or table)
# Look for the next </div> or </table> tag after the pattern
end_pos = content.find("</body>", pos)
if end_pos != -1:
split_positions.append(end_pos + 7) # Length of '</body>'
# Add the end of content as final split point
if len(split_positions) == 1:
# If no patterns found, we'll try to split roughly equally
content_len = len(content)
if content_len > 10000: # If longer than ~10k chars
num_parts = min(
6, max(2, content_len // 15000)
) # Aim for ~15k chars per part
for i in range(1, num_parts):
split_pos = (content_len * i) // num_parts
# Find a good split point near the desired position
for j in range(split_pos, min(len(content), split_pos + 2000)):
if content[j : j + 6] in ["</div>", "</tab", "</bod"]:
split_positions.append(j + 6)
break
else:
split_positions.append(len(content))
# Make sure the end is included
if split_positions[-1] != len(content):
split_positions.append(len(content))
# Create sections based on split positions
for i in range(len(split_positions) - 1):
start = split_positions[i]
end = split_positions[i + 1]
section_content = content[start:end]
# If this isn't the last section, we need to close properly
if i < len(split_positions) - 2:
# Find where to truncate the content so it's valid HTML
last_table_end = section_content.rfind("</table>")
if last_table_end != -1:
section_content = (
section_content[: last_table_end + 8] + "\n</body>\n</html>"
) # </table> + closing tags
sections.append(section_content)
return sections
def create_complete_html_section(content_part, section_num):
"""Create a complete HTML document from a content part."""
# Extract the CSS from the original if it's contained in the part
css_start = content_part.find("<style>")
css_end = content_part.find("</style>")
if css_start != -1 and css_end != -1:
css = content_part[css_start : css_end + 8] # Include </style>
else:
# Fallback CSS
css = """<style>
body {
font-family: "Microsoft YaHei", "SimHei", "Arial", sans-serif;
margin: 20px;
line-height: 1.6;
background-color: #fff;
}
.table-container {
width: 34em;
margin: 0 auto 10px auto;
}
h4 {
text-align: center;
color: #333;
margin: 30px 0 8px 0;
font-size: 18px;
border-bottom: 2px solid #333;
padding-bottom: 8px;
width: 100%;
}
.table-3col {
width: 34em;
border-collapse: collapse;
margin: 0 auto;
table-layout: fixed;
}
.table-3col td {
border-bottom: 1px solid #333;
padding: 12px 8px 4px 8px;
text-align: center;
vertical-align: bottom;
font-size: 14px;
white-space: nowrap;
}
.table-3col td:nth-child(2) {
padding: 10px 4px 4px 4px;
}
.table-3col col:nth-child(1) { width: 20%; }
.table-3col col:nth-child(2) { width: 60%; }
.table-3col col:nth-child(3) { width: 20%; }
.table-2col {
width: 34em;
border-collapse: collapse;
margin: 0 auto;
table-layout: fixed;
}
.table-2col td {
border-bottom: 1px solid #333;
padding: 12px 8px 4px 8px;
text-align: center;
vertical-align: bottom;
font-size: 14px;
word-wrap: break-word;
word-break: break-all;
}
.table-2col col:nth-child(1) { width: 70%; }
.table-2col col:nth-child(2) { width: 30%; }
.teacher-award {
font-size: 12px;
margin: 2px auto;
text-align: center;
color: #666;
width: 34em;
padding: 0 5px;
}
.teacher-award-long {
width: 36em;
}
.teacher-award strong {
color: #333;
}
.subtitle {
text-align: center;
margin: 10px 0;
font-weight: bold;
color: #333;
width: 34em;
margin-left: auto;
margin-right: auto;
}
@media print {
body {
margin: 0;
}
table {
font-size: 12px;
}
}
</style>"""
# Extract the body content
body_start = content_part.find("<body>")
body_end = content_part.find("</body>")
if body_start != -1 and body_end != -1:
body_content = content_part[
body_start + 6 : body_end
] # +6 for length of '<body>'
else:
# If no body tags, assume all content is body content
body_content = content_part
# Add proper HTML structure
html = f"""<!DOCTYPE html>
<html lang="zh-CN">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>2025年光荣板汇总 - 第{section_num}页</title>
{css}
</head>
<body>
{body_content}
</body>
</html>"""
return html
def generate_png_from_html(html_file, output_dir):
"""Use Playwright to generate PNG from HTML."""
try:
from playwright.sync_api import sync_playwright
with sync_playwright() as p:
browser = p.chromium.launch(headless=True)
page = browser.new_page()
# Set large viewport to handle long content
page.set_viewport_size({"width": 4000, "height": 6000})
# Navigate to the HTML file
file_url = f"file://{os.path.abspath(html_file)}"
page.goto(file_url, wait_until="networkidle")
# Apply 500% zoom
page.evaluate("document.body.style.zoom = '500%'")
page.wait_for_timeout(2000) # Wait for zoom to apply
# Take full page screenshot
png_path = str(Path(output_dir) / f"{Path(html_file).stem}_500原图.png")
page.screenshot(path=png_path, full_page=True)
browser.close()
return png_path
except Exception as e:
print(f"Error generating PNG with Playwright: {e}")
return None
def make_transparent(png_path):
"""Create transparent version of PNG."""
try:
# Disable PIL decompression bomb warning for large images
from PIL import Image
Image.MAX_IMAGE_PIXELS = None
img = Image.open(png_path)
img = img.convert("RGBA")
# Create transparent version
data = np.array(img)
alpha = np.full((data.shape[0], data.shape[1]), 255, dtype=np.uint8)
white_mask = (
(data[:, :, 0] >= 249) & (data[:, :, 1] >= 249) & (data[:, :, 2] >= 249)
)
alpha[white_mask] = 0
data[:, :, 3] = alpha
result = Image.fromarray(data, "RGBA")
# Save transparent version
transparent_path = png_path.replace("_500原图.png", "_500透明.png")
result.save(transparent_path)
return transparent_path
except Exception as e:
print(f"Error creating transparent PNG: {e}")
return None
def main():
parser = argparse.ArgumentParser(description="Split HTML file and convert to PNGs")
parser.add_argument("input_html", help="Input HTML file path")
parser.add_argument(
"-o", "--output-dir", help="Output directory (default: split-output)"
)
args = parser.parse_args()
input_path = Path(args.input_html)
output_dir = Path(args.output_dir) if args.output_dir else Path("split-output")
if not input_path.exists():
print(f"Error: Input file {input_path} does not exist")
return 1
# Create output directory
output_dir.mkdir(exist_ok=True)
# Read input HTML
with open(input_path, "r", encoding="utf-8") as f:
content = f.read()
# Extract content between body tags
body_start = content.find("<body>")
body_end = content.find("</body>")
if body_start != -1 and body_end != -1:
body_content = content[
body_start + 6 : body_end + 7
] # Include both opening and closing tags
else:
body_content = content
# Prepare complete HTML content with head and body structure
complete_content = f"""<!DOCTYPE html>
<html lang="zh-CN">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>2025年光荣板汇总 - 全部内容</title>
{"<style>" + content[content.find("<style>") + 7 : content.find("</style>")] + "</style>" if "<style>" in content else ""}
</head>
{body_content}
</html>"""
# Split into sections
sections = split_html_by_content(complete_content)
if not sections:
print("Could not split content into meaningful sections")
return 1
print(f"Split content into {len(sections)} sections")
# Process each section
for i, section_content in enumerate(sections, 1):
print(f"Processing section {i}/{len(sections)}...")
# Create complete HTML for this section
section_html = create_complete_html_section(section_content, i)
# Write HTML file
html_filename = output_dir / f"2025年光荣板汇总_第{i}页.html"
with open(html_filename, "w", encoding="utf-8") as f:
f.write(section_html)
# Generate PNG from HTML
png_path = generate_png_from_html(str(html_filename), output_dir)
if png_path:
# Generate transparent version
transparent_path = make_transparent(png_path)
if transparent_path:
print(
f" ✓ Created {Path(png_path).name} and {Path(transparent_path).name}"
)
else:
print(f" ✓ Created {Path(png_path).name} (transparent version failed)")
else:
print(f" ✗ Failed to create PNG for section {i}")
print(f"\nCompleted! Output saved to: {output_dir}")
return 0
if __name__ == "__main__":
exit(main())