GLM-OCR Python调用实例：批量处理文件夹内所有图片并导出JSON结构化结果

酸甜草莓二侠

235人浏览 · 2026-02-13 00:12:15

酸甜草莓二侠 · 2026-02-13 00:12:15 发布

GLM-OCR Python调用实例：批量处理文件夹内所有图片并导出JSON结构化结果

1. 项目概述与环境准备

GLM-OCR是一个基于先进多模态架构的OCR识别模型，专门针对复杂文档理解场景设计。与传统的OCR工具不同，它不仅能够识别文字，还能理解表格结构、数学公式等复杂内容，并将识别结果转换为结构化的JSON格式。

1.1 环境要求与安装

在开始批量处理之前，需要确保环境配置正确。GLM-OCR需要Python 3.10环境和相关依赖包：

# 创建conda环境（如果尚未创建）
conda create -n py310 python=3.10.19
conda activate py310

# 安装必要依赖
pip install gradio_client pillow tqdm

1.2 服务启动验证

确保GLM-OCR服务已经正常启动：

# 进入项目目录并启动服务
cd /root/GLM-OCR
./start_vllm.sh

# 验证服务状态
curl http://localhost:7860

服务正常启动后，会在7860端口提供API服务，这是我们后续批量处理的基础。

2. 单张图片处理基础

在开始批量处理前，我们先了解如何单张图片处理，这是批量处理的基础。

2.1 基本调用方法

from gradio_client import Client
import json

def process_single_image(image_path, output_json_path=None):
    """
    处理单张图片并返回结构化结果
    
    Args:
        image_path: 图片文件路径
        output_json_path: 可选，JSON结果保存路径
    
    Returns:
        dict: 结构化识别结果
    """
    # 连接本地服务
    client = Client("http://localhost:7860")
    
    try:
        # 调用OCR识别
        result = client.predict(
            image_path=image_path,
            prompt="Text Recognition:",  # 文本识别模式
            api_name="/predict"
        )
        
        # 解析结果
        if output_json_path:
            with open(output_json_path, 'w', encoding='utf-8') as f:
                json.dump(result, f, ensure_ascii=False, indent=2)
        
        return result
        
    except Exception as e:
        print(f"处理图片 {image_path} 时出错: {str(e)}")
        return None

# 使用示例
result = process_single_image("example.png", "result.json")
print(json.dumps(result, indent=2))

2.2 不同识别模式

GLM-OCR支持多种识别模式，可以根据图片内容选择最合适的模式：

def recognize_with_mode(image_path, mode="text"):
    """
    根据模式选择不同的识别方式
    
    Args:
        image_path: 图片路径
        mode: 识别模式 - "text", "table", "formula"
    """
    client = Client("http://localhost:7860")
    
    mode_prompts = {
        "text": "Text Recognition:",
        "table": "Table Recognition:", 
        "formula": "Formula Recognition:"
    }
    
    prompt = mode_prompts.get(mode, "Text Recognition:")
    
    result = client.predict(
        image_path=image_path,
        prompt=prompt,
        api_name="/predict"
    )
    
    return result

3. 批量处理实现方案

现在我们来实现核心的批量处理功能，能够自动遍历文件夹中的所有图片并进行处理。

3.1 文件夹遍历与图片发现

首先需要编写函数来发现指定文件夹中的所有图片文件：

import os
from pathlib import Path

def find_image_files(folder_path, extensions=None):
    """
    查找文件夹中的所有图片文件
    
    Args:
        folder_path: 文件夹路径
        extensions: 支持的图片扩展名列表
    
    Returns:
        list: 图片文件路径列表
    """
    if extensions is None:
        extensions = ['.png', '.jpg', '.jpeg', '.webp', '.bmp', '.tiff']
    
    folder_path = Path(folder_path)
    image_files = []
    
    for ext in extensions:
        image_files.extend(folder_path.glob(f"*{ext}"))
        image_files.extend(folder_path.glob(f"*{ext.upper()}"))
    
    return [str(file) for file in image_files]

3.2 批量处理核心逻辑

实现完整的批量处理流水线：

import json
from tqdm import tqdm
from datetime import datetime

def batch_process_images(input_folder, output_folder, mode="text"):
    """
    批量处理文件夹中的所有图片
    
    Args:
        input_folder: 输入图片文件夹
        output_folder: 输出JSON文件夹
        mode: 识别模式
    """
    # 创建输出文件夹
    output_path = Path(output_folder)
    output_path.mkdir(exist_ok=True)
    
    # 查找所有图片文件
    image_files = find_image_files(input_folder)
    print(f"找到 {len(image_files)} 个图片文件")
    
    # 初始化客户端
    client = Client("http://localhost:7860")
    
    results = []
    processed_count = 0
    failed_count = 0
    
    # 处理进度显示
    for image_path in tqdm(image_files, desc="处理图片"):
        try:
            # 处理单张图片
            result = client.predict(
                image_path=image_path,
                prompt=f"{mode.capitalize()} Recognition:",
                api_name="/predict"
            )
            
            # 准备输出文件名
            image_name = Path(image_path).stem
            output_file = output_path / f"{image_name}_{mode}.json"
            
            # 保存结果
            with open(output_file, 'w', encoding='utf-8') as f:
                json.dump(result, f, ensure_ascii=False, indent=2)
            
            # 记录处理信息
            results.append({
                "input_file": image_path,
                "output_file": str(output_file),
                "status": "success",
                "timestamp": datetime.now().isoformat()
            })
            
            processed_count += 1
            
        except Exception as e:
            print(f"\n处理失败: {image_path} - {str(e)}")
            
            results.append({
                "input_file": image_path,
                "status": "failed",
                "error": str(e),
                "timestamp": datetime.now().isoformat()
            })
            
            failed_count += 1
    
    # 生成处理报告
    generate_report(results, output_path, processed_count, failed_count)
    
    return results

def generate_report(results, output_path, processed_count, failed_count):
    """生成处理报告"""
    report = {
        "total_files": len(results),
        "processed_count": processed_count,
        "failed_count": failed_count,
        "success_rate": processed_count / len(results) if len(results) > 0 else 0,
        "process_date": datetime.now().isoformat(),
        "details": results
    }
    
    report_file = output_path / "processing_report.json"
    with open(report_file, 'w', encoding='utf-8') as f:
        json.dump(report, f, ensure_ascii=False, indent=2)
    
    print(f"\n处理完成! 成功: {processed_count}, 失败: {failed_count}")
    print(f"详细报告已保存至: {report_file}")

4. 高级功能与优化

4.1 多模式批量处理

支持同时使用多种识别模式处理同一批图片：

def multi_mode_batch_process(input_folder, output_folder, modes=None):
    """
    使用多种模式批量处理图片
    
    Args:
        input_folder: 输入文件夹
        output_folder: 输出文件夹  
        modes: 识别模式列表
    """
    if modes is None:
        modes = ["text", "table", "formula"]
    
    all_results = {}
    
    for mode in modes:
        print(f"\n开始 {mode} 模式处理...")
        mode_output = Path(output_folder) / mode
        results = batch_process_images(input_folder, mode_output, mode)
        all_results[mode] = results
    
    return all_results

4.2 结果合并与统计分析

将多个模式的识别结果合并，并提供统计分析：

def analyze_results(output_folder):
    """
    分析处理结果并提供统计信息
    """
    results_path = Path(output_folder)
    all_results = []
    
    # 收集所有JSON结果
    for json_file in results_path.glob("**/*.json"):
        if json_file.name == "processing_report.json":
            continue
            
        try:
            with open(json_file, 'r', encoding='utf-8') as f:
                result_data = json.load(f)
                all_results.append({
                    "file": str(json_file),
                    "data": result_data,
                    "type": json_file.parent.name
                })
        except:
            continue
    
    # 生成统计信息
    stats = {
        "total_results": len(all_results),
        "by_type": {},
        "text_lengths": [],
        "recognition_quality": []
    }
    
    for result in all_results:
        result_type = result["type"]
        if result_type not in stats["by_type"]:
            stats["by_type"][result_type] = 0
        stats["by_type"][result_type] += 1
        
        # 这里可以添加更多分析逻辑
        # 比如分析文本长度、识别置信度等
    
    return stats

4.3 错误处理与重试机制

增强的错误处理和自动重试功能：

def robust_batch_process(input_folder, output_folder, max_retries=3):
    """
    带重试机制的批量处理
    """
    image_files = find_image_files(input_folder)
    client = Client("http://localhost:7860")
    
    for image_path in tqdm(image_files, desc="稳健处理"):
        for attempt in range(max_retries):
            try:
                result = client.predict(
                    image_path=image_path,
                    prompt="Text Recognition:",
                    api_name="/predict"
                )
                
                # 保存结果...
                break  # 成功则跳出重试循环
                
            except Exception as e:
                if attempt == max_retries - 1:
                    print(f"图片 {image_path} 处理失败 after {max_retries} 次尝试")
                else:
                    print(f"第 {attempt + 1} 次尝试失败，重试...")
                    time.sleep(2)  # 等待后重试

5. 完整使用示例

5.1 基本批量处理示例

# 示例：批量处理文件夹中的所有图片
if __name__ == "__main__":
    input_folder = "/path/to/your/images"
    output_folder = "/path/to/output/results"
    
    # 执行批量处理
    results = batch_process_images(input_folder, output_folder)
    
    print("批量处理完成！")
    print(f"共处理 {len(results)} 个文件")

5.2 高级使用示例

# 高级示例：多模式处理与结果分析
def advanced_processing_pipeline():
    input_folder = "input_images"
    output_base = "processing_results"
    
    # 创建时间戳文件夹
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    output_folder = Path(output_base) / timestamp
    output_folder.mkdir(parents=True, exist_ok=True)
    
    # 多模式处理
    print("开始多模式批量处理...")
    all_results = multi_mode_batch_process(input_folder, output_folder)
    
    # 结果分析
    print("\n分析处理结果...")
    stats = analyze_results(output_folder)
    
    # 保存分析报告
    report_file = output_folder / "analysis_report.json"
    with open(report_file, 'w', encoding='utf-8') as f:
        json.dump(stats, f, ensure_ascii=False, indent=2)
    
    print(f"处理完成！结果保存在: {output_folder}")
    print(f"分析报告: {report_file}")

# 运行高级处理流程
advanced_processing_pipeline()

6. 实际应用建议

6.1 性能优化技巧

在处理大量图片时，可以考虑以下优化措施：

# 批量处理时的性能考虑
def optimized_batch_processing():
    # 1. 合理设置批次大小
    # 2. 使用多线程处理（注意服务承受能力）
    # 3. 监控系统资源使用情况
    # 4. 实现断点续处理功能
    pass

6.2 结果后处理建议

对识别结果进行后处理可以提高可用性：

def postprocess_results(json_data):
    """
    对识别结果进行后处理
    """
    processed_data = {
        "original": json_data,
        "extracted_text": extract_text_content(json_data),
        "confidence_scores": calculate_confidence(json_data),
        "structured_data": structure_content(json_data)
    }
    
    return processed_data

7. 总结

通过本文介绍的GLM-OCR批量处理方法，你可以轻松实现以下功能：

自动发现文件夹中的所有图片文件
批量调用GLM-OCR服务进行文字、表格、公式识别
结构化保存识别结果为JSON格式
生成详细的处理报告和统计分析
支持多种识别模式和高级处理选项

这种方法特别适合需要处理大量文档图片的场景，如档案数字化、批量发票处理、文档分析等任务。通过自动化流程，可以大幅提高工作效率和数据处理一致性。

在实际使用中，建议先用小批量图片测试处理效果，确认满意后再进行大规模处理。同时注意监控系统资源使用情况，确保服务的稳定性。

获取更多AI镜像

想探索更多AI镜像和应用场景？访问 CSDN星图镜像广场，提供丰富的预置镜像，覆盖大模型推理、图像生成、视频生成、模型微调等多个领域，支持一键部署。

AI Agent技术社区

Agent 垂直技术社区，欢迎活跃、内容共建。

更多推荐

【无标题】

学而习是一个基于 DeepSeek 大模型的智能试题平台，涵盖试题生成、智能判题、逐题讲解三大 AI 能力，支持多学科（语文、数学、英语、物理、化学）题型体系。用户可在线答题，系统自动评分并展示详细解析。

AI Agent技术社区

联想搞砸了：豪掷重金押注世界杯，AI亮相反成破相

36氪产业分析指出，联想天禧AI所谓全栈智能能力，核心逻辑推理依托DeepSeek-R1开源模型，语音交互、图文识别、多模态分析等全部关键能力均外购第三方接口，企业内部仅负责页面封装、功能串联与界面美化，全程不参与底层算法迭代与模型训练，属于典型的组装式创新，依靠简单技术拼接叠加营销话术，包装出自研全栈AI的假象。纵观整个联想的发展史，不难发现，联想长期坚守“贸工技”发展路线，优先看重市场规模与渠