GLM-4-9B-Chat-1M Getting-Started Guide: JSON Schema Validation and Error Retry for Function Call Results

宁南山

宁南山 · Published 2026-02-14 00:50:01

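1. Introduction: Meet the Long-Context Specialist That Can Read 2 Million Characters

If you are looking for an AI model that can handle very long documents and still run on a single GPU, GLM-4-9B-Chat-1M deserves a look. Its headline feature is a context window of up to 1 million tokens, roughly 2 million Chinese characters, in a single pass.

In practice, you can hand the model a 300-page novel, a complete financial report, or a full set of technical documentation and have it read the whole thing at once. Just as usefully, it has built-in Function Call support, so you can have the model trigger specific tasks much as you would call an API.

This article focuses on using the Function Call feature of GLM-4-9B-Chat-1M, and in particular on two problems that come up constantly in real applications: JSON Schema validation and error retry.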

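2. Environment Setup and Quick Deployment

2.1 Hardware Requirements and Model Variants

GLM-4-9B-Chat-1M is fairly undemanding on hardware and comes in several variants (the memory figures are approximate and grow with context length):

FP16: about 18 GB of VRAM, suitable for high-end cards such as the RTX 4090
INT4 quantized: about 9 GB of VRAM, runs smoothly on an RTX 3090
CPU inference: also runs on CPU through llama.cpp, slower but with no GPU required

For most developers, the INT4 quantized variant is a sensible starting point: it balances performance against resource needs.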
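
The memory figures above can be sanity-checked with simple arithmetic: weight memory is roughly the parameter count times the bytes per parameter. The sketch below assumes about 9 billion parameters (a rounding of the "9B" in the model name) and decimal gigabytes; the function name is made up for illustration. It covers weights only, so the KV cache and activations come on top.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Estimate memory for model weights alone, in decimal gigabytes."""
    bytes_total = num_params * bits_per_param / 8
    return bytes_total / 1e9

PARAMS = 9e9  # assumption: "9B" rounded to 9 billion parameters

fp16_gb = weight_memory_gb(PARAMS, 16)
int4_gb = weight_memory_gb(PARAMS, 4)
print(f"FP16 weights: ~{fp16_gb:.1f} GB")
print(f"INT4 weights: ~{int4_gb:.1f} GB")
```

The FP16 estimate lands on the 18 GB quoted above. The INT4 estimate is lower than the quoted 9 GB, presumably because a running server also needs room for buffers and the KV cache.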

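2.2 One-Command Deployment

The simplest way to start is with a Docker image that serves the model through vLLM (substitute the image name and model path used in your environment):

# Serve the model with vLLM for faster inference
docker run -d --gpus all -p 8000:8000 \
  -v /path/to/models:/models \
  glm4-9b-chat-1m-vllm:latest \
  --model /models/glm-4-9b-chat-1m-int4 \
  --served-model-name glm-4-9b-chat-1m \
  --enable-chunked-prefill \
  --max-num-batched-tokens 8192

This command starts an OpenAI-compatible API service on port 8000 with support for streaming output and batched requests. The --served-model-name flag makes the server answer to the model name glm-4-9b-chat-1m that the client code in this article uses.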

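3. Function Call Basics: Starting from Scratch

3.1 What Is Function Call?

Put simply, Function Call lets the model request specific tasks the way a programmer calls a function. You define each function's name, parameters, and description, and the model picks the appropriate function and fills in its arguments based on the user's request.

For example, you can define a "get weather" function. When a user asks "What's the weather like in Beijing today?", the model responds with a call to that function; your code executes it and passes the result back.

3.2 Basic Usage Example

Let's start with a simple example that defines a function for fetching a stock price: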

from openai import OpenAI

# Initialize the client; the API key is a placeholder for a local server
client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")

# Define the function tool
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_stock_price",
            "description": "Get the current price of the given stock",
            "parameters": {
                "type": "object",
                "properties": {
                    "symbol": {
                        "type": "string",
                        "description": "Stock ticker symbol, e.g. AAPL or MSFT"
                    }
                },
                "required": ["symbol"]
            }
        }
    }
]

# Call the model
response = client.chat.completions.create(
    model="glm-4-9b-chat-1m",
    messages=[{"role": "user", "content": "What is Apple's current stock price?"}],
    tools=tools,
    tool_choice="auto"
)

# Tool calls, if any, are attached to the first choice's message
print(response.choices[0].message.tool_calls)

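The model recognizes that get_stock_price should be called and returns a message containing something like this (found under choices[0].message in the response):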

{
  "tool_calls": [
    {
      "id": "call_123",
      "type": "function",
      "function": {
        "name": "get_stock_price",
        "arguments": "{\"symbol\": \"AAPL\"}"
      }
    }
  ]
}
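
The response above only names the function and its arguments; running it is up to your code. A minimal sketch of that dispatch step follows. It works on the plain-dict shape shown above (the openai SDK exposes the same fields as attributes instead), and both the registry and the stub implementation of get_stock_price are hypothetical, returning placeholder data rather than a real quote.

```python
import json

def get_stock_price(symbol: str) -> dict:
    """Hypothetical local implementation; returns placeholder data, not a real quote."""
    return {"symbol": symbol, "price": None, "note": "placeholder"}

# Map tool names (as declared in `tools`) to local callables.
TOOL_REGISTRY = {"get_stock_price": get_stock_price}

def dispatch_tool_call(tool_call: dict) -> dict:
    """Execute one tool call given in the dict shape shown above."""
    function = tool_call["function"]
    name = function["name"]
    if name not in TOOL_REGISTRY:
        raise KeyError(f"unknown tool: {name}")
    # `arguments` arrives as a JSON-encoded string, not as an object.
    arguments = json.loads(function["arguments"])
    return TOOL_REGISTRY[name](**arguments)

example_call = {
    "id": "call_123",
    "type": "function",
    "function": {"name": "get_stock_price", "arguments": "{\"symbol\": \"AAPL\"}"},
}
result = dispatch_tool_call(example_call)
print(result)
```

Keeping an explicit registry means the model can only trigger functions you have deliberately exposed.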

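4. JSON Schema Validation: Making Sure the Data Has the Right Shape

4.1 Why Validate Against a Schema?

In real applications, the JSON the model returns is often consumed by other systems. If the format is wrong, downstream processing fails. A JSON Schema acts as a data contract that spells out exactly what the data should look like.

GLM-4-9B-Chat-1M uses the parameter schema you supply as guidance when generating arguments, but that alone does not guarantee conforming output, so validate the returned data on your side before using it.

4.2 Defining a More Complex Schema

Suppose we want a user-information extraction function that must return data in a specific format: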

user_info_schema = {
    "type": "object",
    "properties": {
        "name": {
            "type": "string",
            "description": "User's full name"
        },
        "age": {
            "type": "integer",
            "description": "User's age",
            "minimum": 0,
            "maximum": 150
        },
        "email": {
            "type": "string",
            # "format" is only enforced if the validator is given a format checker
            "format": "email",
            "description": "Email address"
        },
        "interests": {
            "type": "array",
            "items": {
                "type": "string"
            },
            "description": "List of hobbies and interests",
            "minItems": 1,
            "maxItems": 10
        }
    },
    "required": ["name", "email"],
    "additionalProperties": False
}

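This schema specifies that:

name and email are required fields
age must be an integer between 0 and 150
interests is an array of strings with 1 to 10 items
no additional, undeclared fields are allowed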
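
To make these rules concrete, here is a hand-written, simplified mirror of them using only the standard library. It is an illustration of what the schema accepts and rejects, not a replacement for a real JSON Schema validator; the function name and error wording are made up for this sketch, and the email format is deliberately not checked.

```python
from typing import List

ALLOWED_FIELDS = {"name", "age", "email", "interests"}

def check_user_info(data: dict) -> List[str]:
    """Return a list of violations; an empty list means the data passes."""
    errors = []
    for field in ("name", "email"):
        if field not in data:
            errors.append(f"missing required field: {field}")
        elif not isinstance(data[field], str):
            errors.append(f"{field} must be a string")
    for field in sorted(set(data) - ALLOWED_FIELDS):
        errors.append(f"unexpected field: {field}")
    if "age" in data:
        age = data["age"]
        # bool is a subclass of int in Python, but not a JSON integer
        if isinstance(age, bool) or not isinstance(age, int):
            errors.append("age must be an integer")
        elif not 0 <= age <= 150:
            errors.append("age must be between 0 and 150")
    if "interests" in data:
        items = data["interests"]
        if not isinstance(items, list) or not all(isinstance(i, str) for i in items):
            errors.append("interests must be a list of strings")
        elif not 1 <= len(items) <= 10:
            errors.append("interests must contain 1 to 10 items")
    return errors

print(check_user_info({"name": "Ada", "email": "ada@example.com"}))
print(check_user_info({"name": "Ada", "age": 200, "nickname": "x"}))
```

Collecting every violation, rather than stopping at the first, gives you a complete message to feed back to the model on a retry.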

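4.3 Handling Validation Errors

When the data returned by the model does not match the schema, you need an error-handling path:

import jsonschema
from typing import Dict, Any

def validate_with_retry(response_data: Dict[str, Any], schema: Dict) -> Dict[str, Any]:
    """
    Validate data against the schema. Returns the data unchanged on success;
    logs the details and re-raises on failure so the caller can repair or retry.
    """
    try:
        jsonschema.validate(
            instance=response_data,
            schema=schema,
            # Without a format checker, "format": "email" is not enforced
            format_checker=jsonschema.FormatChecker(),
        )
        return response_data
    except jsonschema.ValidationError as e:
        print(f"Schema validation failed: {e.message}")
        print(f"Error path: {e.json_path}")
        print(f"Data received: {response_data}")

        # Hook for automatic repair or a retry request
        raise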

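5. Error Retry: Making Function Call More Reliable

5.1 Common Function Call Error Types

In practice you are likely to run into these errors:

Format errors: the returned JSON cannot be parsed
Schema mismatches: the data does not conform to the predefined schema
Logic errors: parameter values make no sense (for example, a nonexistent ticker symbol)
Timeouts: an external API takes too long to respond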
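
Some format errors can be repaired locally before spending a retry. The sketch below assumes the failure comes from the JSON object being wrapped in a Markdown code fence or surrounded by extra text, which is a common failure pattern but not the only one. The function name is made up for illustration.

```python
import json
import re

FENCE = "`" * 3  # Markdown code-fence marker, built this way to keep the listing readable

def parse_arguments_leniently(raw: str) -> dict:
    """Parse tool-call arguments, tolerating code fences and surrounding text."""
    candidates = [raw]
    # Strip fence markers, then keep the span from the first '{' to the last '}'
    cleaned = re.sub(re.escape(FENCE) + r"(?:json)?", "", raw).strip()
    start, end = cleaned.find("{"), cleaned.rfind("}")
    if start != -1 and end > start:
        candidates.append(cleaned[start:end + 1])
    for candidate in candidates:
        try:
            parsed = json.loads(candidate)
        except json.JSONDecodeError:
            continue
        if isinstance(parsed, dict):
            return parsed
    raise ValueError("no JSON object found in model output")

wrapped = "Sure! " + FENCE + 'json\n{"symbol": "AAPL"}\n' + FENCE
print(parse_arguments_leniently(wrapped))
```

When nothing can be recovered the function raises ValueError, which a retry policy can treat as a signal to ask the model again.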

5.2 Implementing a Smart Retry Strategy

Below is a complete implementation of the retry mechanism:

import json
import time
from typing import List, Dict, Any, Optional

import jsonschema
from openai import OpenAI
from tenacity import retry, stop_after_attempt, wait_exponential, retry_if_exception_type

class FunctionCallClient:
    def __init__(self, base_url: str = "http://localhost:8000/v1"):
        self.client = OpenAI(base_url=base_url, api_key="none")
        self.max_retries = 3

    @retry(
        stop=stop_after_attempt(3),
        wait=wait_exponential(multiplier=1, min=1, max=10),
        # json.JSONDecodeError is a subclass of ValueError; both are listed for clarity
        retry=retry_if_exception_type((json.JSONDecodeError, ValueError)),
        # Re-raise the last exception instead of wrapping it in tenacity.RetryError
        reraise=True
    )
    def call_function_with_retry(
        self,
        messages: List[Dict],
        tools: List[Dict],
        schema: Optional[Dict] = None
    ) -> Dict[str, Any]:
        """
        Function Call invocation with retry
        """
        try:
            response = self.client.chat.completions.create(
                model="glm-4-9b-chat-1m",
                messages=messages,
                tools=tools,
                tool_choice="auto"
            )

            # Extract the function call; the model may answer in plain text instead
            tool_calls = response.choices[0].message.tool_calls
            if not tool_calls:
                raise ValueError("Model returned no tool call")
            arguments = json.loads(tool_calls[0].function.arguments)

            # Validate if a schema was provided
            if schema:
                jsonschema.validate(instance=arguments, schema=schema)

            return arguments

        except json.JSONDecodeError as e:
            print(f"JSON parsing failed: {e}")
            # Repair logic or a rebuilt request could go here
            raise
        except jsonschema.ValidationError as e:
            print(f"Schema validation failed: {e}")
            # Give the model more specific error information for the next attempt
            self._add_validation_feedback(messages, e)
            raise ValueError(f"Schema validation failed: {e.message}") from e
        except Exception as e:
            print(f"Other error: {e}")
            raise

    def _add_validation_feedback(self, messages: List[Dict], error: jsonschema.ValidationError):
        """Append validation feedback to the messages (mutates the caller's list) so the next attempt can correct itself"""
        feedback_msg = {
            "role": "system",
            "content": f"The previous function call returned invalid arguments: {error.message}. Please correct the arguments and try again."
        }
        messages.append(feedback_msg)
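
The feedback above is delivered as an extra system message. An alternative is to replay the failed call and report the error as a tool result, using the OpenAI-style assistant and tool message shapes. The sketch below shows that variant; whether your serving stack and the model's chat template accept these messages is an assumption to verify, and the helper name is made up for illustration.

```python
import json
from typing import Dict, List

def build_tool_feedback(tool_call_id: str, name: str,
                        raw_arguments: str, error_text: str) -> List[Dict]:
    """Return two messages: the failed call replayed, and the reason it was rejected."""
    assistant_msg = {
        "role": "assistant",
        "content": None,
        "tool_calls": [{
            "id": tool_call_id,
            "type": "function",
            "function": {"name": name, "arguments": raw_arguments},
        }],
    }
    tool_msg = {
        "role": "tool",
        "tool_call_id": tool_call_id,  # must match the id of the replayed call
        "content": json.dumps({"error": error_text}, ensure_ascii=False),
    }
    return [assistant_msg, tool_msg]

feedback = build_tool_feedback(
    "call_123", "get_stock_price", '{"symbol": 1}', "symbol must be a string"
)
print(json.dumps(feedback, indent=2))
```

To use it, extend the conversation with the returned pair before the next attempt, for example messages.extend(feedback). The model then sees exactly which arguments it produced and why they were rejected.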

5.3 Advanced Retry Strategy: Exponential Backoff Plus a Circuit Breaker

For production, a more robust retry setup is advisable:

from circuitbreaker import circuit

class RobustFunctionCaller(FunctionCallClient):
    # The decorator does the failure bookkeeping: after 5 consecutive failures
    # the circuit opens and calls fail fast for 60 seconds, so no hand-written
    # counter or timestamp is needed.
    @circuit(failure_threshold=5, recovery_timeout=60)
    def robust_function_call(self, messages, tools, schema=None):
        """
        Function Call protected by a circuit breaker.
        Exponential backoff comes from call_function_with_retry.
        """
        return self.call_function_with_retry(messages, tools, schema)
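
To show what a circuit breaker does internally, here is a minimal standard-library sketch of the pattern: closed, open, then a single trial call after the cool-down. It is an illustration under simplifying assumptions (single-threaded, consecutive-failure counting), not the implementation of the circuitbreaker package; the class name and the injectable clock are choices made for this example.

```python
import time
from typing import Callable, Optional

class SimpleCircuitBreaker:
    """Opens after N consecutive failures; allows a trial call after a cool-down."""

    def __init__(self, failure_threshold: int = 5, recovery_timeout: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.clock = clock  # injectable so behaviour can be tested without sleeping
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, func: Callable, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.recovery_timeout:
                raise RuntimeError("circuit open, call rejected")
            # Cool-down elapsed: fall through and let one trial call run (half-open).
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold or self.opened_at is not None:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            raise
        # A success closes the circuit and resets the counter.
        self.failures = 0
        self.opened_at = None
        return result

breaker = SimpleCircuitBreaker(failure_threshold=2, recovery_timeout=10)
print(breaker.call(lambda: "ok"))
```

While the circuit is open, calls are rejected immediately instead of being sent to a backend that is already failing.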

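6. Worked Example: Building an Intelligent Information Extraction System

Let's walk through a complete example that combines JSON Schema validation with error retry.

6.1 Defining a Complex Extraction Schema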

extraction_schema = {
    "type": "object",
    "properties": {
        "entities": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "type": {"type": "string", "enum": ["PERSON", "ORG", "LOCATION", "OTHER"]},
                    "confidence": {"type": "number", "minimum": 0, "maximum": 1}
                },
                "required": ["name", "type"]
            }
        },
        "sentiment": {
            "type": "object",
            "properties": {
                "score": {"type": "number", "minimum": -1, "maximum": 1},
                "label": {"type": "string", "enum": ["POSITIVE", "NEUTRAL", "NEGATIVE"]}
            },
            "required": ["score", "label"]
        },
        "key_phrases": {
            "type": "array",
            "items": {"type": "string"},
            "minItems": 1,
            "maxItems": 10
        }
    },
    "required": ["entities", "sentiment", "key_phrases"],
    "additionalProperties": False
}

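6.2 Implementing the Full Pipeline

from typing import Any, Dict

def process_long_document(text_content: str) -> Dict[str, Any]:
    """
    Process a long document and extract structured information
    """
    tools = [
        {
            "type": "function",
            "function": {
                "name": "extract_document_info",
                "description": "Extract entities, sentiment, and key phrases from the document",
                "parameters": extraction_schema
            }
        }
    ]

    messages = [
        {
            "role": "system",
            "content": "You are a professional information extraction assistant. Analyze the following document carefully and extract the key information."
        },
        {
            "role": "user",
            "content": f"Please analyze the following document: {text_content}"
        }
    ]

    caller = RobustFunctionCaller()

    try:
        # First attempt
        result = caller.robust_function_call(messages, tools, extraction_schema)
        return result

    except Exception as e:
        print(f"First attempt failed: {e}")

        # Add more detailed guidance
        guidance_msg = {
            "role": "system",
            "content": """Make sure the returned data strictly meets these requirements:
            1. Every entity in the entities array must include the name and type fields
            2. sentiment must include score (-1 to 1) and label
            3. key_phrases must be an array of 1 to 10 strings
            4. Do not include any extra fields"""
        }
        messages.append(guidance_msg)

        # Second attempt
        try:
            result = caller.robust_function_call(messages, tools, extraction_schema)
            return result
        except Exception as e:
            print(f"Second attempt failed: {e}")
            # Fallback: return a neutral placeholder. The empty key_phrases list
            # does not satisfy the schema's minItems, so treat it as "no result".
            return {"entities": [], "sentiment": {"score": 0, "label": "NEUTRAL"}, "key_phrases": []}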

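7. Performance Tuning and Best Practices

7.1 Making Full Use of the 1M Context

The defining feature of GLM-4-9B-Chat-1M is its very long context, which means you can:

Process whole documents in one pass: no chunking needed, so context stays intact
Keep conversation history: retain a longer history across multi-turn dialogue
Process in batches: handle several related documents together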
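
Before sending a whole document in one request, it helps to estimate whether it fits. The sketch below uses the ratio given at the start of this article (1 million tokens for roughly 2 million Chinese characters, so about 2 characters per token). That ratio is a rough heuristic for Chinese text; other languages tokenize differently, and an exact count requires the model's tokenizer. The function names and the output reserve are assumptions for illustration.

```python
def estimate_tokens(text: str, chars_per_token: float = 2.0) -> int:
    """Rough token estimate from character count (heuristic, not a tokenizer)."""
    return int(len(text) / chars_per_token) + 1

def fits_in_context(text: str, context_limit: int = 1_000_000,
                    reserved_for_output: int = 4_096) -> bool:
    """Check the estimate against the window, leaving room for the reply."""
    return estimate_tokens(text) <= context_limit - reserved_for_output

short_doc = "字" * 100_000
long_doc = "字" * 3_000_000
print(fits_in_context(short_doc), fits_in_context(long_doc))
```

If the check fails, fall back to chunking or summarizing rather than letting the request be rejected or silently truncated.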

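7.2 Optimizing Function Call Performance

# Process multiple Function Call requests in one pass
def batch_process_requests(requests_list):
    """
    Run a list of (messages, tools) pairs one after another.
    A failure in one request is recorded and does not abort the rest.
    """
    caller = FunctionCallClient()
    results = []

    for messages, tools in requests_list:
        try:
            result = caller.call_function_with_retry(messages, tools)
            results.append({"status": "success", "data": result})
        except Exception as e:
            results.append({"status": "error", "message": str(e)})

    return results

# Use async processing to raise throughput
import asyncio

async def async_call_function(caller, messages, tools):
    """Run the synchronous client call in a worker thread (requires Python 3.9+)."""
    return await asyncio.to_thread(caller.call_function_with_retry, messages, tools)

async def async_process_requests(requests_list):
    """
    Process multiple requests concurrently.
    Each request is a dict with 'messages' and 'tools' keys.
    """
    caller = FunctionCallClient()
    tasks = []
    for request in requests_list:
        task = asyncio.create_task(
            async_call_function(caller, request['messages'], request['tools'])
        )
        tasks.append(task)

    # Exceptions are returned in place of results rather than raised
    return await asyncio.gather(*tasks, return_exceptions=True)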
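
Launching every request at once can overwhelm a single-GPU server, so it is common to cap the number of requests in flight. The sketch below does this with asyncio.Semaphore. The helper name, the concurrency limit, and the fake_call stand-in (used so the example runs without a server) are all assumptions for illustration.

```python
import asyncio
from typing import Awaitable, Callable, List

async def gather_bounded(jobs: List[Callable[[], Awaitable]],
                         max_concurrency: int = 4) -> list:
    """Run coroutine factories with a cap on concurrency; results keep input order."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run(job):
        async with semaphore:
            return await job()

    # Exceptions are returned in place of results rather than raised
    return await asyncio.gather(*(run(job) for job in jobs), return_exceptions=True)

async def fake_call(i: int) -> int:
    """Stand-in for a model call; request 2 fails on purpose."""
    await asyncio.sleep(0.01)
    if i == 2:
        raise ValueError("simulated failure")
    return i * 10

jobs = [lambda i=i: fake_call(i) for i in range(5)]
results = asyncio.run(gather_bounded(jobs, max_concurrency=2))
print(results)
```

Passing factories rather than ready-made coroutines means no call starts until a semaphore slot is free.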

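7.3 Monitoring and Logging

Set up thorough monitoring:

import logging
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

import jsonschema

@dataclass
class FunctionCallMetrics:
    call_count: int = 0
    success_count: int = 0
    validation_errors: int = 0
    last_call_time: Optional[datetime] = None

def _caused_by_validation(exc):
    """Walk the exception chain looking for a jsonschema.ValidationError."""
    seen = set()
    while exc is not None and id(exc) not in seen:
        if isinstance(exc, jsonschema.ValidationError):
            return True
        seen.add(id(exc))
        exc = exc.__cause__ or exc.__context__
    return False

class MonitoredFunctionCaller(FunctionCallClient):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.metrics = FunctionCallMetrics()
        self.logger = logging.getLogger("function_caller")

    def call_with_monitoring(self, messages, tools, schema=None):
        self.metrics.call_count += 1
        self.metrics.last_call_time = datetime.now()

        try:
            result = self.call_function_with_retry(messages, tools, schema)
            self.metrics.success_count += 1
            self.logger.info("Function call successful")
            return result
        except Exception as e:
            # call_function_with_retry wraps schema failures in another exception,
            # so check the exception chain rather than the outer type
            if _caused_by_validation(e):
                self.metrics.validation_errors += 1
                self.logger.warning(f"Schema validation failed: {e}")
            else:
                self.logger.error(f"Function call failed: {e}")
            raise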
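
Raw counters become more useful once turned into rates. The sketch below derives a success rate and a validation-error rate from counters like the ones tracked above. The CallStats class is a simplified stand-in defined here so the example is self-contained, and the function name is made up for illustration.

```python
from dataclasses import dataclass

@dataclass
class CallStats:
    """Simplified stand-in for the metrics counters."""
    call_count: int = 0
    success_count: int = 0
    validation_errors: int = 0

def summarize(stats: CallStats) -> dict:
    """Derive rates from raw counters, guarding against division by zero."""
    if stats.call_count == 0:
        return {"success_rate": None, "validation_error_rate": None}
    return {
        "success_rate": stats.success_count / stats.call_count,
        "validation_error_rate": stats.validation_errors / stats.call_count,
    }

summary = summarize(CallStats(call_count=10, success_count=8, validation_errors=1))
print(summary)
```

A rising validation-error rate is a cue to revisit the schema or the function description before tuning the retry settings.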