Initial commit: skills library
- 70 skills with code and documentation - Add .gitignore (ignore __pycache__, output/, temp/, venv/) - Clean up test intermediates and caches
This commit is contained in:
@@ -0,0 +1,91 @@
|
||||
---
|
||||
name: html-splitter
|
||||
description: This skill splits HTML files containing tables into multiple smaller HTML files based on content logic, then converts each to high-resolution PNG. The skill maintains individual table integrity while organizing content into 4-6 logical sections, generating both white background and transparent background PNGs at 500% resolution.
|
||||
---
|
||||
|
||||
# HTML Splitter Skill
|
||||
|
||||
This skill handles the workflow of splitting HTML files containing tables into multiple smaller HTML files based on content logic, then converting each to high-resolution PNG. The skill keeps individual tables intact while organizing content into 4-6 logical sections.
|
||||
|
||||
## Purpose
|
||||
|
||||
When provided with an HTML file containing multiple tables (especially long table collections like award lists, competition results, or data reports), this skill will:
|
||||
1. Analyze the content structure and identify logical splitting points
|
||||
2. Generate 4-6 separate HTML files based on content themes while keeping tables complete
|
||||
3. Convert each HTML file to high-resolution PNG (500% zoom)
|
||||
4. Generate both white background and transparent background versions for each PNG
|
||||
|
||||
## When to Use
|
||||
|
||||
This skill should be used when:
|
||||
- You have a long HTML file with multiple tables that needs to be split into manageable sections
|
||||
- You need to convert HTML tables to high-resolution PNG images for printing or digital use
|
||||
- You want both white background and transparent background versions of the output
|
||||
- You need to maintain the integrity of individual tables while organizing content logically
|
||||
|
||||
## How to Use
|
||||
|
||||
### Input Requirements
|
||||
- An HTML file containing tables with consistent styling
|
||||
- Tables should have semantic structure (headings, table elements, etc.) to identify logical groupings
|
||||
|
||||
### Processing Steps
|
||||
1. **Analyze Structure**: Identify content sections based on headings, table types, and logical groupings
|
||||
2. **Split Logic**: Divide content into 4-6 sections based on:
|
||||
- Similar types of competitions/results
|
||||
- Related content themes
|
||||
- Balanced content amounts per section
|
||||
- Keeping individual tables completely intact
|
||||
3. **Maintain Styling**: Preserve original CSS styling in each split file
|
||||
4. **Generate PNGs**: Convert each HTML section to 500% resolution PNG
|
||||
5. **Create Variants**: Generate both white background and transparent background versions
|
||||
|
||||
### Output Structure
|
||||
The skill creates organized output in this structure:
|
||||
```
|
||||
split-output/
|
||||
├── section-1.html/png (first content section)
|
||||
├── section-2.html/png (second content section)
|
||||
├── section-3.html/png (third content section)
|
||||
├── section-4.html/png (fourth content section)
|
||||
├── section-5.html/png (fifth content section, if needed)
|
||||
└── section-6.html/png (sixth content section, if needed)
|
||||
```
|
||||
|
||||
Each section includes both:
|
||||
- `{section}_500原图.png` - White background version
|
||||
- `{section}_500透明.png` - Transparent background version
|
||||
|
||||
### Technical Specifications
|
||||
- **Resolution**: 500% zoom for high-quality printing
|
||||
- **Table Integrity**: Individual tables are never split across sections
|
||||
- **Content Logic**: Grouping based on semantic meaning (competition types, years, categories)
|
||||
- **Styling Preservation**: Original CSS and formatting maintained in each split file
|
||||
- **Background Options**: Both white and transparent background PNG variants
|
||||
|
||||
### Best Practices
|
||||
- Ensure original HTML has semantic structure (proper headings, table groupings) for optimal splitting
|
||||
- Review content logic to confirm sections make sense for your use case
|
||||
- Verify that individual tables remain intact in the output
|
||||
- Check that table formatting is preserved in the PNG output
|
||||
|
||||
## References
|
||||
|
||||
This skill does not include any reference files in the `references/` directory.
|
||||
|
||||
## Scripts
|
||||
|
||||
This skill includes the following script:
|
||||
|
||||
- `html_splitter.py`: Main script that handles HTML splitting and PNG generation
|
||||
|
||||
The script can be run with:
|
||||
```
|
||||
python scripts/html_splitter.py <input_html_file> [-o output_dir]
|
||||
```
|
||||
|
||||
This will generate split HTML files and corresponding PNGs (both white background and transparent) in the output directory.
|
||||
|
||||
## Assets
|
||||
|
||||
This skill does not include any asset files in the `assets/` directory.
|
||||
@@ -0,0 +1,34 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
HTML Splitter Skill Entry Point
|
||||
|
||||
This is the main entry point for the HTML Splitter skill.
|
||||
"""
|
||||
import sys
|
||||
from pathlib import Path
|
||||
import subprocess
|
||||
import os
|
||||
|
||||
def main():
|
||||
"""Main entry point for HTML splitter skill."""
|
||||
if len(sys.argv) < 2:
|
||||
print("Usage: html-splitter <input-html-file> [-o output-dir]")
|
||||
print("Example: html-splitter my_poster.html -o ./output")
|
||||
return 1
|
||||
|
||||
# Get the script directory (where this file is located)
|
||||
script_dir = Path(__file__).parent
|
||||
splitter_script = script_dir / "html_splitter.py"
|
||||
|
||||
if not splitter_script.exists():
|
||||
print(f"Error: Splitter script not found at {splitter_script}")
|
||||
return 1
|
||||
|
||||
# Call the actual splitter script
|
||||
cmd = [sys.executable, str(splitter_script)] + sys.argv[1:]
|
||||
result = subprocess.run(cmd)
|
||||
|
||||
return result.returncode
|
||||
|
||||
if __name__ == "__main__":
|
||||
sys.exit(main())
|
||||
@@ -0,0 +1,3 @@
|
||||
# Example Reference
|
||||
|
||||
This is an example reference file. Delete if not needed.
|
||||
@@ -0,0 +1,4 @@
|
||||
# html-splitter - dependencies
|
||||
Pillow>=0.0.1
|
||||
numpy>=0.0.1
|
||||
playwright>=0.0.1
|
||||
@@ -0,0 +1,4 @@
|
||||
#!/usr/bin/env python3
|
||||
"""Example script - delete if not needed."""
|
||||
|
||||
print("Hello from skill!")
|
||||
@@ -0,0 +1,381 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
HTML Splitter and PNG Generator
|
||||
|
||||
This script splits a long HTML file containing multiple tables into smaller HTML files
|
||||
based on content logic, then converts each to high-resolution PNG images.
|
||||
"""
|
||||
|
||||
import os
|
||||
import argparse
|
||||
import shutil
|
||||
from pathlib import Path
|
||||
from PIL import Image, ImageDraw
|
||||
import numpy as np
|
||||
import subprocess
|
||||
import time
|
||||
from urllib.parse import quote
|
||||
|
||||
|
||||
def split_html_by_content(content):
|
||||
"""Split HTML content into logical sections based on headings and table groups."""
|
||||
# Split based on major headings
|
||||
sections = []
|
||||
|
||||
# Define split points based on the typical structure we saw
|
||||
split_patterns = [
|
||||
"<h4>第八届李斯特国际钢琴公开赛</h4>",
|
||||
"<h4>第四届微风·长隆国际钢琴声乐公开赛",
|
||||
"<h4>2025第三届弘杨中国作品·深圳青少年钢琴邀请赛</h4>",
|
||||
"<h4>2025年中国音协考级:</h4>",
|
||||
"<h4>李佳茵荣誉",
|
||||
]
|
||||
|
||||
# Find all split positions
|
||||
split_positions = [0] # Start at beginning
|
||||
|
||||
for pattern in split_patterns:
|
||||
pos = content.find(pattern)
|
||||
if pos != -1:
|
||||
# Try to find a good break point near the pattern (after closing div or table)
|
||||
# Look for the next </div> or </table> tag after the pattern
|
||||
end_pos = content.find("</body>", pos)
|
||||
if end_pos != -1:
|
||||
split_positions.append(end_pos + 7) # Length of '</body>'
|
||||
|
||||
# Add the end of content as final split point
|
||||
if len(split_positions) == 1:
|
||||
# If no patterns found, we'll try to split roughly equally
|
||||
content_len = len(content)
|
||||
if content_len > 10000: # If longer than ~10k chars
|
||||
num_parts = min(
|
||||
6, max(2, content_len // 15000)
|
||||
) # Aim for ~15k chars per part
|
||||
for i in range(1, num_parts):
|
||||
split_pos = (content_len * i) // num_parts
|
||||
# Find a good split point near the desired position
|
||||
for j in range(split_pos, min(len(content), split_pos + 2000)):
|
||||
if content[j : j + 6] in ["</div>", "</tab", "</bod"]:
|
||||
split_positions.append(j + 6)
|
||||
break
|
||||
else:
|
||||
split_positions.append(len(content))
|
||||
|
||||
# Make sure the end is included
|
||||
if split_positions[-1] != len(content):
|
||||
split_positions.append(len(content))
|
||||
|
||||
# Create sections based on split positions
|
||||
for i in range(len(split_positions) - 1):
|
||||
start = split_positions[i]
|
||||
end = split_positions[i + 1]
|
||||
section_content = content[start:end]
|
||||
|
||||
# If this isn't the last section, we need to close properly
|
||||
if i < len(split_positions) - 2:
|
||||
# Find where to truncate the content so it's valid HTML
|
||||
last_table_end = section_content.rfind("</table>")
|
||||
if last_table_end != -1:
|
||||
section_content = (
|
||||
section_content[: last_table_end + 8] + "\n</body>\n</html>"
|
||||
) # </table> + closing tags
|
||||
|
||||
sections.append(section_content)
|
||||
|
||||
return sections
|
||||
|
||||
|
||||
def create_complete_html_section(content_part, section_num):
|
||||
"""Create a complete HTML document from a content part."""
|
||||
# Extract the CSS from the original if it's contained in the part
|
||||
css_start = content_part.find("<style>")
|
||||
css_end = content_part.find("</style>")
|
||||
|
||||
if css_start != -1 and css_end != -1:
|
||||
css = content_part[css_start : css_end + 8] # Include </style>
|
||||
else:
|
||||
# Fallback CSS
|
||||
css = """<style>
|
||||
body {
|
||||
font-family: "Microsoft YaHei", "SimHei", "Arial", sans-serif;
|
||||
margin: 20px;
|
||||
line-height: 1.6;
|
||||
background-color: #fff;
|
||||
}
|
||||
|
||||
.table-container {
|
||||
width: 34em;
|
||||
margin: 0 auto 10px auto;
|
||||
}
|
||||
|
||||
h4 {
|
||||
text-align: center;
|
||||
color: #333;
|
||||
margin: 30px 0 8px 0;
|
||||
font-size: 18px;
|
||||
border-bottom: 2px solid #333;
|
||||
padding-bottom: 8px;
|
||||
width: 100%;
|
||||
}
|
||||
|
||||
.table-3col {
|
||||
width: 34em;
|
||||
border-collapse: collapse;
|
||||
margin: 0 auto;
|
||||
table-layout: fixed;
|
||||
}
|
||||
|
||||
.table-3col td {
|
||||
border-bottom: 1px solid #333;
|
||||
padding: 12px 8px 4px 8px;
|
||||
text-align: center;
|
||||
vertical-align: bottom;
|
||||
font-size: 14px;
|
||||
white-space: nowrap;
|
||||
}
|
||||
|
||||
.table-3col td:nth-child(2) {
|
||||
padding: 10px 4px 4px 4px;
|
||||
}
|
||||
|
||||
.table-3col col:nth-child(1) { width: 20%; }
|
||||
.table-3col col:nth-child(2) { width: 60%; }
|
||||
.table-3col col:nth-child(3) { width: 20%; }
|
||||
|
||||
.table-2col {
|
||||
width: 34em;
|
||||
border-collapse: collapse;
|
||||
margin: 0 auto;
|
||||
table-layout: fixed;
|
||||
}
|
||||
|
||||
.table-2col td {
|
||||
border-bottom: 1px solid #333;
|
||||
padding: 12px 8px 4px 8px;
|
||||
text-align: center;
|
||||
vertical-align: bottom;
|
||||
font-size: 14px;
|
||||
word-wrap: break-word;
|
||||
word-break: break-all;
|
||||
}
|
||||
|
||||
.table-2col col:nth-child(1) { width: 70%; }
|
||||
.table-2col col:nth-child(2) { width: 30%; }
|
||||
|
||||
.teacher-award {
|
||||
font-size: 12px;
|
||||
margin: 2px auto;
|
||||
text-align: center;
|
||||
color: #666;
|
||||
width: 34em;
|
||||
padding: 0 5px;
|
||||
}
|
||||
|
||||
.teacher-award-long {
|
||||
width: 36em;
|
||||
}
|
||||
|
||||
.teacher-award strong {
|
||||
color: #333;
|
||||
}
|
||||
|
||||
.subtitle {
|
||||
text-align: center;
|
||||
margin: 10px 0;
|
||||
font-weight: bold;
|
||||
color: #333;
|
||||
width: 34em;
|
||||
margin-left: auto;
|
||||
margin-right: auto;
|
||||
}
|
||||
|
||||
@media print {
|
||||
body {
|
||||
margin: 0;
|
||||
}
|
||||
table {
|
||||
font-size: 12px;
|
||||
}
|
||||
}
|
||||
</style>"""
|
||||
|
||||
# Extract the body content
|
||||
body_start = content_part.find("<body>")
|
||||
body_end = content_part.find("</body>")
|
||||
|
||||
if body_start != -1 and body_end != -1:
|
||||
body_content = content_part[
|
||||
body_start + 6 : body_end
|
||||
] # +6 for length of '<body>'
|
||||
else:
|
||||
# If no body tags, assume all content is body content
|
||||
body_content = content_part
|
||||
|
||||
# Add proper HTML structure
|
||||
html = f"""<!DOCTYPE html>
|
||||
<html lang="zh-CN">
|
||||
<head>
|
||||
<meta charset="UTF-8">
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0">
|
||||
<title>2025年光荣板汇总 - 第{section_num}页</title>
|
||||
{css}
|
||||
</head>
|
||||
<body>
|
||||
{body_content}
|
||||
</body>
|
||||
</html>"""
|
||||
|
||||
return html
|
||||
|
||||
|
||||
def generate_png_from_html(html_file, output_dir):
|
||||
"""Use Playwright to generate PNG from HTML."""
|
||||
try:
|
||||
from playwright.sync_api import sync_playwright
|
||||
|
||||
with sync_playwright() as p:
|
||||
browser = p.chromium.launch(headless=True)
|
||||
page = browser.new_page()
|
||||
|
||||
# Set large viewport to handle long content
|
||||
page.set_viewport_size({"width": 4000, "height": 6000})
|
||||
|
||||
# Navigate to the HTML file
|
||||
file_url = f"file://{os.path.abspath(html_file)}"
|
||||
page.goto(file_url, wait_until="networkidle")
|
||||
|
||||
# Apply 500% zoom
|
||||
page.evaluate("document.body.style.zoom = '500%'")
|
||||
page.wait_for_timeout(2000) # Wait for zoom to apply
|
||||
|
||||
# Take full page screenshot
|
||||
png_path = str(Path(output_dir) / f"{Path(html_file).stem}_500原图.png")
|
||||
page.screenshot(path=png_path, full_page=True)
|
||||
|
||||
browser.close()
|
||||
|
||||
return png_path
|
||||
except Exception as e:
|
||||
print(f"Error generating PNG with Playwright: {e}")
|
||||
return None
|
||||
|
||||
|
||||
def make_transparent(png_path):
|
||||
"""Create transparent version of PNG."""
|
||||
try:
|
||||
# Disable PIL decompression bomb warning for large images
|
||||
from PIL import Image
|
||||
|
||||
Image.MAX_IMAGE_PIXELS = None
|
||||
|
||||
img = Image.open(png_path)
|
||||
img = img.convert("RGBA")
|
||||
|
||||
# Create transparent version
|
||||
data = np.array(img)
|
||||
alpha = np.full((data.shape[0], data.shape[1]), 255, dtype=np.uint8)
|
||||
white_mask = (
|
||||
(data[:, :, 0] >= 249) & (data[:, :, 1] >= 249) & (data[:, :, 2] >= 249)
|
||||
)
|
||||
alpha[white_mask] = 0
|
||||
data[:, :, 3] = alpha
|
||||
result = Image.fromarray(data, "RGBA")
|
||||
|
||||
# Save transparent version
|
||||
transparent_path = png_path.replace("_500原图.png", "_500透明.png")
|
||||
result.save(transparent_path)
|
||||
|
||||
return transparent_path
|
||||
except Exception as e:
|
||||
print(f"Error creating transparent PNG: {e}")
|
||||
return None
|
||||
|
||||
|
||||
def main():
|
||||
parser = argparse.ArgumentParser(description="Split HTML file and convert to PNGs")
|
||||
parser.add_argument("input_html", help="Input HTML file path")
|
||||
parser.add_argument(
|
||||
"-o", "--output-dir", help="Output directory (default: split-output)"
|
||||
)
|
||||
|
||||
args = parser.parse_args()
|
||||
|
||||
input_path = Path(args.input_html)
|
||||
output_dir = Path(args.output_dir) if args.output_dir else Path("split-output")
|
||||
|
||||
if not input_path.exists():
|
||||
print(f"Error: Input file {input_path} does not exist")
|
||||
return 1
|
||||
|
||||
# Create output directory
|
||||
output_dir.mkdir(exist_ok=True)
|
||||
|
||||
# Read input HTML
|
||||
with open(input_path, "r", encoding="utf-8") as f:
|
||||
content = f.read()
|
||||
|
||||
# Extract content between body tags
|
||||
body_start = content.find("<body>")
|
||||
body_end = content.find("</body>")
|
||||
|
||||
if body_start != -1 and body_end != -1:
|
||||
body_content = content[
|
||||
body_start + 6 : body_end + 7
|
||||
] # Include both opening and closing tags
|
||||
else:
|
||||
body_content = content
|
||||
|
||||
# Prepare complete HTML content with head and body structure
|
||||
complete_content = f"""<!DOCTYPE html>
|
||||
<html lang="zh-CN">
|
||||
<head>
|
||||
<meta charset="UTF-8">
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0">
|
||||
<title>2025年光荣板汇总 - 全部内容</title>
|
||||
{"<style>" + content[content.find("<style>") + 7 : content.find("</style>")] + "</style>" if "<style>" in content else ""}
|
||||
</head>
|
||||
{body_content}
|
||||
</html>"""
|
||||
|
||||
# Split into sections
|
||||
sections = split_html_by_content(complete_content)
|
||||
|
||||
if not sections:
|
||||
print("Could not split content into meaningful sections")
|
||||
return 1
|
||||
|
||||
print(f"Split content into {len(sections)} sections")
|
||||
|
||||
# Process each section
|
||||
for i, section_content in enumerate(sections, 1):
|
||||
print(f"Processing section {i}/{len(sections)}...")
|
||||
|
||||
# Create complete HTML for this section
|
||||
section_html = create_complete_html_section(section_content, i)
|
||||
|
||||
# Write HTML file
|
||||
html_filename = output_dir / f"2025年光荣板汇总_第{i}页.html"
|
||||
with open(html_filename, "w", encoding="utf-8") as f:
|
||||
f.write(section_html)
|
||||
|
||||
# Generate PNG from HTML
|
||||
png_path = generate_png_from_html(str(html_filename), output_dir)
|
||||
|
||||
if png_path:
|
||||
# Generate transparent version
|
||||
transparent_path = make_transparent(png_path)
|
||||
if transparent_path:
|
||||
print(
|
||||
f" ✓ Created {Path(png_path).name} and {Path(transparent_path).name}"
|
||||
)
|
||||
else:
|
||||
print(f" ✓ Created {Path(png_path).name} (transparent version failed)")
|
||||
else:
|
||||
print(f" ✗ Failed to create PNG for section {i}")
|
||||
|
||||
print(f"\nCompleted! Output saved to: {output_dir}")
|
||||
return 0
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
exit(main())
|
||||
Reference in New Issue
Block a user