上传OCR切换:小果GLM-OCR-8bit优先,Tesseract降级
MoFin/server.py 的 _ocr_image 改为: 1. 优先调小果 gateway (192.168.1.122:18003) 的 GLM-OCR-8bit 2. 失败/无响应则自动降级到本地 Tesseract(预处理+chi_sim+eng) 3. fallback逻辑保留原预处理管道(放大/锐化/二值化) ocr_client.py 模块独立可调用,兼作CLI工具
This commit is contained in:
@@ -717,10 +717,22 @@ def upload_page():
|
||||
|
||||
|
||||
def _ocr_image(image_path):
|
||||
"""用Tesseract OCR提取图片中的文字(预处理优化中文表格识别)"""
|
||||
"""优先用小果GLM-OCR-8bit识别,失败则降级到pytesseract"""
|
||||
import sys
|
||||
from PIL import Image, ImageEnhance, ImageFilter
|
||||
import pytesseract
|
||||
|
||||
# 尝试小果OCR(GLM-OCR-8bit)
|
||||
try:
|
||||
sys.path.insert(0, os.path.join(os.path.dirname(__file__), "scripts"))
|
||||
from ocr_client import ocr_image as xg_ocr
|
||||
result = xg_ocr(image_path, "请识别这张图片中所有文字,包括股票名称、代码、价格、持股数、金额、百分比等。输出完整内容。")
|
||||
if result.get("success") and len(result.get("text", "")) > 20:
|
||||
return result["text"].strip()
|
||||
except Exception:
|
||||
pass # 降级到tesseract
|
||||
|
||||
# 降级:Tesseract(预处理优化中文表格识别)
|
||||
img = Image.open(image_path)
|
||||
|
||||
# 预处理:放大 + 锐化 + 二值化,提升小字识别率
|
||||
|
||||
Reference in New Issue
Block a user