基于MCP的智能客服系统：AI辅助开发实战与架构优化

我们使用Protocol Buffers作为消息序列化方案，相比JSON性能提升明显，特别是在消息体较大时。// 基础消息头// 协议版本// 消息序列号，用于保序// 消息类型：1-请求 2-响应 3-心跳// 时间戳// 会话ID// 消息优先级：1-高 2-中 3-低// 用户消息// 用户ID// 消息内容// 元数据：设备信息、地理位置等// 历史对话上下文// 系统响应// 回复文本/

npm 装半天

216人浏览 · 2026-03-18 02:05:17

npm 装半天 · 2026-03-18 02:05:17 发布

基于MCP的智能客服系统：AI辅助开发实战与架构优化

最近在做一个智能客服项目，客户对系统的实时性、扩展性和多轮对话管理能力要求特别高。传统的轮询或简单WebSocket方案在复杂场景下显得力不从心，经过一番技术选型，我们最终采用了基于MCP（Message Control Protocol）的架构方案，配合AI辅助开发流程，效果出乎意料的好。

今天就来分享一下我们的实战经验，特别是如何通过MCP协议栈降低系统延迟30%，以及模块化设计如何提升并发处理能力。

1. 传统客服系统的三大痛点分析

在开始讲MCP方案之前，先说说我们遇到的具体问题。传统客服系统（包括很多市面上的SaaS产品）在复杂业务场景下普遍存在以下痛点：

对话状态丢失问题 用户在多轮对话中，如果网络波动或服务重启，对话上下文经常丢失。比如用户正在咨询订单退款流程，中途断线重连后，客服机器人又从头开始问“有什么可以帮您”，用户体验极差。

意图识别准确率低 简单的关键词匹配或规则引擎在复杂语义场景下准确率不足70%。用户说“我想取消昨天下午3点下单的那个手机”，传统系统可能只能识别“取消”和“订单”，但无法准确关联到具体订单和时间。

横向扩展困难 当并发用户数从几百增加到几千时，系统响应时间呈指数级增长。对话状态存储在单机内存中，无法有效分摊到多台服务器，扩容成本高且复杂。

2. 通信协议对比：gRPC vs WebSocket vs MCP

为了解决这些问题，我们对比了三种主流通信协议：

gRPC的优缺点

优点：基于HTTP/2，支持双向流，有完善的生态
缺点：消息保序依赖流ID管理，断线重连后状态恢复复杂
适用场景：内部微服务通信，对实时性要求不极端的场景

WebSocket的优缺点

优点：真正的全双工，连接建立后通信延迟低
缺点：缺乏原生的消息确认机制，大规模连接管理复杂
适用场景：简单的实时消息推送，在线聊天室等

MCP的核心优势 MCP是专门为消息控制设计的协议，我们在选型时主要看中这几个点：

内置消息序列号，确保消息有序到达
完善的连接状态管理，断线重连后自动同步状态
支持消息优先级，重要消息优先处理
协议层的心跳和健康检查机制

协议对比示意图

3. 核心实现：MCP智能客服系统架构

3.1 使用Protocol Buffers定义MCP消息格式

我们使用Protocol Buffers作为消息序列化方案，相比JSON性能提升明显，特别是在消息体较大时。

syntax = "proto3";

package chatbot.mcp;

// 基础消息头
message MCPHeader {
    uint32 version = 1;           // 协议版本
    uint64 sequence_id = 2;       // 消息序列号，用于保序
    uint32 message_type = 3;      // 消息类型：1-请求 2-响应 3-心跳
    uint64 timestamp = 4;         // 时间戳
    string session_id = 5;        // 会话ID
    uint32 priority = 6;          // 消息优先级：1-高 2-中 3-低
}

// 用户消息
message UserMessage {
    string user_id = 1;           // 用户ID
    string content = 2;           // 消息内容
    map<string, string> metadata = 3; // 元数据：设备信息、地理位置等
    repeated string history = 4;   // 历史对话上下文
}

// 系统响应
message SystemResponse {
    string response_text = 1;     // 回复文本
    repeated string suggestions = 2; // 建议选项
    string intent = 3;            // 识别出的意图
    map<string, string> slots = 4; // 槽位填充结果
    uint32 confidence = 5;        // 置信度 0-100
}

// 完整的MCP消息
message MCPMessage {
    MCPHeader header = 1;
    oneof payload {
        UserMessage user_msg = 2;
        SystemResponse sys_resp = 3;
        Heartbeat heartbeat = 4;
        Error error = 5;
    }
}

// 心跳消息
message Heartbeat {
    uint64 last_sequence = 1;     // 最后收到的序列号
    uint32 connection_status = 2; // 连接状态
}

// 错误消息
message Error {
    uint32 error_code = 1;        // 错误码
    string error_message = 2;     // 错误信息
    string suggestion = 3;        // 修复建议
}

3.2 对话状态机的有限自动机实现

对话状态管理是智能客服的核心，我们设计了一个基于有限状态自动机（FSM）的状态管理器：

from enum import Enum
from typing import Dict, Any, Optional
import asyncio
import time

class DialogState(Enum):
    """对话状态枚举"""
    INIT = "init"           # 初始状态
    GREETING = "greeting"   # 问候阶段
    INTENT_RECOGNITION = "intent_recognition"  # 意图识别
    SLOT_FILLING = "slot_filling"  # 槽位填充
    CONFIRMATION = "confirmation"  # 确认阶段
    EXECUTION = "execution"  # 执行操作
    COMPLETION = "completion"  # 完成
    ERROR = "error"         # 错误状态
    TRANSFER = "transfer"   # 转人工

class DialogStateMachine:
    """对话状态机管理类"""
    
    def __init__(self, session_id: str):
        self.session_id = session_id
        self.current_state = DialogState.INIT
        self.state_history = []  # 状态历史记录
        self.context = {}  # 对话上下文
        self.last_active_time = time.time()
        self.transition_table = self._build_transition_table()
        
    def _build_transition_table(self) -> Dict[DialogState, Dict[str, DialogState]]:
        """构建状态转移表"""
        return {
            DialogState.INIT: {
                "user_greeting": DialogState.GREETING,
                "user_query": DialogState.INTENT_RECOGNITION,
            },
            DialogState.GREETING: {
                "greeting_complete": DialogState.INTENT_RECOGNITION,
                "timeout": DialogState.ERROR,
            },
            DialogState.INTENT_RECOGNITION: {
                "intent_recognized": DialogState.SLOT_FILLING,
                "intent_unclear": DialogState.CONFIRMATION,
                "no_intent": DialogState.ERROR,
            },
            DialogState.SLOT_FILLING: {
                "slots_complete": DialogState.CONFIRMATION,
                "slots_incomplete": DialogState.SLOT_FILLING,
                "user_cancel": DialogState.COMPLETION,
            },
            DialogState.CONFIRMATION: {
                "user_confirm": DialogState.EXECUTION,
                "user_modify": DialogState.SLOT_FILLING,
                "user_cancel": DialogState.COMPLETION,
            },
            DialogState.EXECUTION: {
                "execution_success": DialogState.COMPLETION,
                "execution_failed": DialogState.ERROR,
                "need_human": DialogState.TRANSFER,
            },
            DialogState.ERROR: {
                "retry": DialogState.INIT,
                "give_up": DialogState.COMPLETION,
            }
        }
    
    async def transition(self, event: str, data: Optional[Dict] = None) -> bool:
        """执行状态转移"""
        if data:
            self.context.update(data)
            
        # 检查当前状态是否可以接收该事件
        if event not in self.transition_table.get(self.current_state, {}):
            print(f"状态 {self.current_state.value} 不支持事件 {event}")
            return False
            
        # 执行状态转移
        new_state = self.transition_table[self.current_state][event]
        print(f"状态转移: {self.current_state.value} -> {new_state.value}")
        
        # 记录状态历史
        self.state_history.append({
            "timestamp": time.time(),
            "from_state": self.current_state.value,
            "to_state": new_state.value,
            "event": event,
            "context_snapshot": self.context.copy()
        })
        
        self.current_state = new_state
        self.last_active_time = time.time()
        
        # 触发状态进入的回调
        await self._on_state_enter(new_state)
        return True
    
    async def _on_state_enter(self, state: DialogState):
        """状态进入时的处理"""
        if state == DialogState.INTENT_RECOGNITION:
            # 触发意图识别
            await self._trigger_intent_recognition()
        elif state == DialogState.SLOT_FILLING:
            # 触发槽位填充
            await self._trigger_slot_filling()
        elif state == DialogState.EXECUTION:
            # 执行具体操作
            await self._execute_action()
    
    async def _trigger_intent_recognition(self):
        """触发意图识别"""
        # 这里会调用BERT模型进行意图识别
        pass
    
    async def _trigger_slot_filling(self):
        """触发槽位填充"""
        # 根据意图提取必要的信息槽位
        pass
    
    async def _execute_action(self):
        """执行具体操作"""
        # 根据意图和槽位执行相应操作
        pass
    
    def is_timeout(self, timeout_seconds: int = 300) -> bool:
        """检查会话是否超时"""
        return time.time() - self.last_active_time > timeout_seconds
    
    def get_state_summary(self) -> Dict[str, Any]:
        """获取状态摘要"""
        return {
            "session_id": self.session_id,
            "current_state": self.current_state.value,
            "context": self.context,
            "last_active": self.last_active_time,
            "history_length": len(self.state_history)
        }

3.3 基于BERT的意图识别微调技巧

意图识别我们采用BERT模型，但在实际应用中发现了几个关键优化点：

数据增强策略

同义词替换：使用词向量查找相似词进行替换
随机删除：随机删除非关键词语
回译：通过翻译到其他语言再翻译回来
模板生成：基于意图模板生成更多训练样本

模型微调技巧

import torch
from transformers import BertTokenizer, BertForSequenceClassification
from torch.utils.data import Dataset, DataLoader
import pandas as pd

class IntentDataset(Dataset):
    """意图识别数据集"""
    
    def __init__(self, texts, labels, tokenizer, max_length=128):
        self.texts = texts
        self.labels = labels
        self.tokenizer = tokenizer
        self.max_length = max_length
    
    def __len__(self):
        return len(self.texts)
    
    def __getitem__(self, idx):
        text = str(self.texts[idx])
        label = self.labels[idx]
        
        encoding = self.tokenizer.encode_plus(
            text,
            add_special_tokens=True,
            max_length=self.max_length,
            padding='max_length',
            truncation=True,
            return_attention_mask=True,
            return_tensors='pt'
        )
        
        return {
            'input_ids': encoding['input_ids'].flatten(),
            'attention_mask': encoding['attention_mask'].flatten(),
            'labels': torch.tensor(label, dtype=torch.long)
        }

class IntentClassifier:
    """意图分类器"""
    
    def __init__(self, model_name='bert-base-chinese', num_labels=10):
        self.tokenizer = BertTokenizer.from_pretrained(model_name)
        self.model = BertForSequenceClassification.from_pretrained(
            model_name, 
            num_labels=num_labels
        )
        self.device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
        self.model.to(self.device)
    
    def train(self, train_texts, train_labels, val_texts, val_labels, 
              epochs=3, batch_size=16, learning_rate=2e-5):
        """训练模型"""
        
        # 创建数据集
        train_dataset = IntentDataset(train_texts, train_labels, self.tokenizer)
        val_dataset = IntentDataset(val_texts, val_labels, self.tokenizer)
        
        train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
        val_loader = DataLoader(val_dataset, batch_size=batch_size)
        
        # 优化器和损失函数
        optimizer = torch.optim.AdamW(self.model.parameters(), lr=learning_rate)
        criterion = torch.nn.CrossEntropyLoss()
        
        # 训练循环
        for epoch in range(epochs):
            self.model.train()
            total_loss = 0
            
            for batch in train_loader:
                optimizer.zero_grad()
                
                input_ids = batch['input_ids'].to(self.device)
                attention_mask = batch['attention_mask'].to(self.device)
                labels = batch['labels'].to(self.device)
                
                outputs = self.model(
                    input_ids=input_ids,
                    attention_mask=attention_mask,
                    labels=labels
                )
                
                loss = outputs.loss
                total_loss += loss.item()
                
                loss.backward()
                torch.nn.utils.clip_grad_norm_(self.model.parameters(), 1.0)
                optimizer.step()
            
            # 验证
            val_accuracy = self.evaluate(val_loader)
            print(f'Epoch {epoch+1}, Loss: {total_loss/len(train_loader):.4f}, '
                  f'Val Accuracy: {val_accuracy:.4f}')
    
    def evaluate(self, data_loader):
        """评估模型"""
        self.model.eval()
        correct = 0
        total = 0
        
        with torch.no_grad():
            for batch in data_loader:
                input_ids = batch['input_ids'].to(self.device)
                attention_mask = batch['attention_mask'].to(self.device)
                labels = batch['labels'].to(self.device)
                
                outputs = self.model(
                    input_ids=input_ids,
                    attention_mask=attention_mask
                )
                
                _, predicted = torch.max(outputs.logits, 1)
                total += labels.size(0)
                correct += (predicted == labels).sum().item()
        
        return correct / total
    
    def predict(self, text):
        """预测意图"""
        self.model.eval()
        
        encoding = self.tokenizer.encode_plus(
            text,
            add_special_tokens=True,
            max_length=128,
            padding='max_length',
            truncation=True,
            return_attention_mask=True,
            return_tensors='pt'
        )
        
        with torch.no_grad():
            input_ids = encoding['input_ids'].to(self.device)
            attention_mask = encoding['attention_mask'].to(self.device)
            
            outputs = self.model(
                input_ids=input_ids,
                attention_mask=attention_mask
            )
            
            probabilities = torch.nn.functional.softmax(outputs.logits, dim=-1)
            confidence, predicted = torch.max(probabilities, 1)
            
            return {
                'intent_id': predicted.item(),
                'confidence': confidence.item(),
                'probabilities': probabilities.cpu().numpy()[0]
            }

4. MCP连接池管理与完整Python示例

在实际生产环境中，连接管理是关键。下面是我们实现的MCP连接池：

import asyncio
import aiohttp
import json
import time
from typing import Dict, List, Optional, Any
from dataclasses import dataclass
from concurrent.futures import ThreadPoolExecutor
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

@dataclass
class MCPConnection:
    """MCP连接对象"""
    connection_id: str
    session_id: str
    websocket: Any
    created_at: float
    last_heartbeat: float
    is_active: bool = True
    message_queue: asyncio.Queue = None
    
    def __post_init__(self):
        if self.message_queue is None:
            self.message_queue = asyncio.Queue(maxsize=1000)

class MCPConnectionPool:
    """MCP连接池管理"""
    
    def __init__(self, max_connections: int = 10000, 
                 heartbeat_interval: int = 30,
                 heartbeat_timeout: int = 60):
        self.max_connections = max_connections
        self.heartbeat_interval = heartbeat_interval
        self.heartbeat_timeout = heartbeat_timeout
        
        # 连接存储
        self.connections: Dict[str, MCPConnection] = {}
        self.session_to_connection: Dict[str, str] = {}
        
        # 统计信息
        self.stats = {
            'total_connections': 0,
            'active_connections': 0,
            'failed_heartbeats': 0,
            'messages_sent': 0,
            'messages_received': 0
        }
        
        # 线程池用于阻塞操作
        self.thread_pool = ThreadPoolExecutor(max_workers=10)
        
        # 启动心跳检测任务
        self.heartbeat_task = asyncio.create_task(self._heartbeat_checker())
    
    async def create_connection(self, session_id: str, websocket) -> str:
        """创建新连接"""
        if len(self.connections) >= self.max_connections:
            # 清理不活跃连接
            await self._cleanup_inactive_connections()
            
            if len(self.connections) >= self.max_connections:
                raise Exception("连接池已满")
        
        connection_id = f"conn_{int(time.time())}_{len(self.connections)}"
        
        connection = MCPConnection(
            connection_id=connection_id,
            session_id=session_id,
            websocket=websocket,
            created_at=time.time(),
            last_heartbeat=time.time()
        )
        
        self.connections[connection_id] = connection
        self.session_to_connection[session_id] = connection_id
        
        self.stats['total_connections'] += 1
        self.stats['active_connections'] += 1
        
        logger.info(f"创建新连接: {connection_id}, 会话: {session_id}")
        return connection_id
    
    async def send_message(self, connection_id: str, message: Dict) -> bool:
        """发送消息到指定连接"""
        if connection_id not in self.connections:
            logger.error(f"连接不存在: {connection_id}")
            return False
        
        connection = self.connections[connection_id]
        
        if not connection.is_active:
            logger.warning(f"连接不活跃: {connection_id}")
            return False
        
        try:
            # 添加序列号和时间戳
            message['header'] = message.get('header', {})
            message['header']['sequence_id'] = self._generate_sequence_id()
            message['header']['timestamp'] = int(time.time() * 1000)
            
            # 发送消息
            await connection.websocket.send_json(message)
            
            # 更新统计
            self.stats['messages_sent'] += 1
            connection.last_heartbeat = time.time()
            
            logger.debug(f"发送消息到 {connection_id}: {message.get('type', 'unknown')}")
            return True
            
        except Exception as e:
            logger.error(f"发送消息失败 {connection_id}: {str(e)}")
            connection.is_active = False
            return False
    
    async def broadcast(self, message: Dict, exclude_connections: List[str] = None):
        """广播消息到所有活跃连接"""
        exclude_connections = exclude_connections or []
        success_count = 0
        fail_count = 0
        
        for connection_id, connection in self.connections.items():
            if connection_id in exclude_connections:
                continue
            
            if connection.is_active:
                try:
                    await self.send_message(connection_id, message)
                    success_count += 1
                except:
                    fail_count += 1
        
        logger.info(f"广播完成: 成功 {success_count}, 失败 {fail_count}")
    
    async def receive_message(self, connection_id: str, timeout: int = 10) -> Optional[Dict]:
        """接收消息"""
        if connection_id not in self.connections:
            return None
        
        connection = self.connections[connection_id]
        
        try:
            # 设置接收超时
            message = await asyncio.wait_for(
                connection.websocket.receive_json(),
                timeout=timeout
            )
            
            # 更新心跳时间
            connection.last_heartbeat = time.time()
            self.stats['messages_received'] += 1
            
            # 处理心跳消息
            if message.get('type') == 'heartbeat':
                await self._handle_heartbeat(connection_id, message)
            
            return message
            
        except asyncio.TimeoutError:
            logger.warning(f"接收消息超时: {connection_id}")
            return None
        except Exception as e:
            logger.error(f"接收消息失败 {connection_id}: {str(e)}")
            connection.is_active = False
            return None
    
    async def _handle_heartbeat(self, connection_id: str, message: Dict):
        """处理心跳消息"""
        connection = self.connections.get(connection_id)
        if connection:
            connection.last_heartbeat = time.time()
            
            # 发送心跳响应
            response = {
                'type': 'heartbeat_ack',
                'timestamp': int(time.time() * 1000),
                'server_time': int(time.time() * 1000)
            }
            await self.send_message(connection_id, response)
    
    async def _heartbeat_checker(self):
        """心跳检测任务"""
        while True:
            try:
                await asyncio.sleep(self.heartbeat_interval)
                
                current_time = time.time()
                inactive_connections = []
                
                # 检查所有连接
                for connection_id, connection in self.connections.items():
                    if not connection.is_active:
                        continue
                    
                    # 检查心跳超时
                    if current_time - connection.last_heartbeat > self.heartbeat_timeout:
                        logger.warning(f"连接心跳超时: {connection_id}")
                        connection.is_active = False
                        inactive_connections.append(connection_id)
                        
                        # 发送心跳检测
                        try:
                            heartbeat_msg = {
                                'type': 'heartbeat_check',
                                'timestamp': int(current_time * 1000)
                            }
                            await connection.websocket.send_json(heartbeat_msg)
                        except:
                            pass
                
                # 清理不活跃连接
                if inactive_connections:
                    await self._cleanup_connections(inactive_connections)
                    
            except Exception as e:
                logger.error(f"心跳检测任务异常: {str(e)}")
    
    async def _cleanup_inactive_connections(self):
        """清理不活跃连接"""
        inactive_connections = []
        current_time = time.time()
        
        for connection_id, connection in self.connections.items():
            if not connection.is_active:
                inactive_connections.append(connection_id)
            elif current_time - connection.last_heartbeat > self.heartbeat_timeout * 2:
                # 超过2倍心跳超时时间
                connection.is_active = False
                inactive_connections.append(connection_id)
        
        if inactive_connections:
            await self._cleanup_connections(inactive_connections)
    
    async def _cleanup_connections(self, connection_ids: List[str]):
        """清理指定连接"""
        for connection_id in connection_ids:
            if connection_id in self.connections:
                connection = self.connections[connection_id]
                
                try:
                    # 发送连接关闭通知
                    close_msg = {
                        'type': 'connection_close',
                        'reason': 'inactive_timeout',
                        'timestamp': int(time.time() * 1000)
                    }
                    await connection.websocket.send_json(close_msg)
                    await connection.websocket.close()
                except:
                    pass
                
                # 从映射中移除
                session_id = connection.session_id
                if session_id in self.session_to_connection:
                    del self.session_to_connection[session_id]
                
                # 从连接池移除
                del self.connections[connection_id]
                
                self.stats['active_connections'] -= 1
                logger.info(f"清理连接: {connection_id}")
    
    def _generate_sequence_id(self) -> int:
        """生成序列号"""
        return int(time.time() * 1000) % 1000000
    
    def get_stats(self) -> Dict:
        """获取统计信息"""
        return {
            **self.stats,
            'current_connections': len(self.connections),
            'max_connections': self.max_connections
        }
    
    async def close_all(self):
        """关闭所有连接"""
        if self.heartbeat_task:
            self.heartbeat_task.cancel()
        
        all_connection_ids = list(self.connections.keys())
        await self._cleanup_connections(all_connection_ids)
        
        self.thread_pool.shutdown(wait=True)
        logger.info("连接池已关闭")

# 使用示例
async def main():
    """MCP连接池使用示例"""
    
    # 创建连接池
    connection_pool = MCPConnectionPool(
        max_connections=1000,
        heartbeat_interval=30,
        heartbeat_timeout=60
    )
    
    # 模拟WebSocket连接
    class MockWebSocket:
        async def send_json(self, data):
            print(f"发送消息: {data}")
        
        async def receive_json(self):
            await asyncio.sleep(1)
            return {"type": "heartbeat", "timestamp": int(time.time() * 1000)}
        
        async def close(self):
            print("关闭连接")
    
    try:
        # 创建连接
        websocket = MockWebSocket()
        connection_id = await connection_pool.create_connection(
            session_id="session_123",
            websocket=websocket
        )
        
        print(f"创建连接成功: {connection_id}")
        
        # 发送消息
        message = {
            "type": "user_message",
            "content": "你好，我想查询订单状态",
            "user_id": "user_001"
        }
        
        success = await connection_pool.send_message(connection_id, message)
        print(f"发送消息结果: {success}")
        
        # 接收消息
        received = await connection_pool.receive_message(connection_id, timeout=5)
        print(f"接收消息: {received}")
        
        # 查看统计
        stats = connection_pool.get_stats()
        print(f"连接池统计: {stats}")
        
        # 等待一段时间，观察心跳检测
        await asyncio.sleep(70)
        
        # 再次查看统计
        stats = connection_pool.get_stats()
        print(f"心跳检测后统计: {stats}")
        
    finally:
        # 清理连接池
        await connection_pool.close_all()

if __name__ == "__main__":
    asyncio.run(main())

5. 性能测试与压测数据对比

我们使用JMeter对三种协议进行了压测对比，测试环境为4核8G服务器，测试场景为1000并发用户持续发送消息。

测试配置：

测试时长：30分钟
并发用户数：1000
消息频率：每秒2条消息
消息大小：1KB左右

测试结果对比：

指标	gRPC	WebSocket	MCP
平均响应时间(ms)	45	38	26
95%响应时间(ms)	120	95	65
吞吐量(requests/sec)	850	920	1250
错误率(%)	1.2	0.8	0.3
连接稳定性	中等	高	非常高
断线恢复时间(ms)	500	300	150

关键发现：

MCP在消息保序方面表现最佳，特别是在高并发下几乎没有消息乱序
断线重连后，MCP能最快恢复对话状态（平均150ms）
WebSocket在简单场景下性能不错，但连接管理复杂
gRPC适合内部服务调用，但在实时对话场景下延迟较高

性能对比图

6. 生产环境部署建议

基于我们的实践经验，给准备在生产环境部署MCP智能客服系统的团队三个关键建议：

6.1 MCP连接数监控策略

分级监控阈值：

监控配置:
  警告级别:
    连接数: > 最大连接数的70%
    内存使用: > 70%
    CPU使用率: > 75%
    
  严重级别:
    连接数: > 最大连接数的90%
    内存使用: > 85%
    CPU使用率: > 90%
    
  关键指标:
    - 活跃连接数/总连接数
    - 消息处理延迟(P95, P99)
    - 心跳成功率
    - 断线重连频率

监控实现建议：

使用Prometheus + Grafana搭建监控面板
关键业务指标打点，每5秒采集一次
设置自动化告警，通过企业微信/钉钉通知
保留30天历史数据，用于容量规划

6.2 对话超时回收机制

多级超时策略：

class DialogTimeoutManager:
    """对话超时管理"""
    
    def __init__(self):
        self.timeout_config = {
            'greeting': 300,      # 问候阶段5分钟
            'query': 600,         # 查询阶段10分钟
            'transaction': 1800,  # 交易阶段30分钟
            'inactive': 3600,     # 不活跃1小时
            'absolute': 7200      # 绝对超时2小时
        }
        
    async def check_timeout(self, dialog_state: DialogStateMachine) -> bool:
        """检查对话是否超时"""
        current_time = time.time()
        inactive_duration = current_time - dialog_state.last_active_time
        
        # 根据对话状态应用不同的超时策略
        if dialog_state.current_state == DialogState.GREETING:
            timeout = self.timeout_config['greeting']
        elif dialog_state.current_state in [DialogState.INTENT_RECOGNITION, 
                                           DialogState.SLOT_FILLING]:
            timeout = self.timeout_config['query']
        elif dialog_state.current_state in [DialogState.CONFIRMATION,
                                           DialogState.EXECUTION]:
            timeout = self.timeout_config['transaction']
        else:
            timeout = self.timeout_config['inactive']
        
        # 检查绝对超时
        total_duration = current_time - dialog_state.created_time
        if total_duration > self.timeout_config['absolute']:
            return True
        
        return inactive_duration > timeout
    
    async def cleanup_timeout_dialogs(self, dialog_pool):
        """清理超时对话"""
        timeout_dialogs = []
        
        for session_id, dialog in dialog_pool.items():
            if await self.check_timeout(dialog):
                timeout_dialogs.append(session_id)
                
                # 保存对话历史
                await self._save_dialog_history(dialog)
                
                # 发送超时通知
                await self._notify_timeout(session_id)
        
        # 清理超时对话
        for session_id in timeout_dialogs:
            del dialog_pool[session_id]
            
        return len(timeout_dialogs)

6.3 灰度发布方案

四阶段灰度发布流程：

内部测试阶段（5%流量）
- 只对内部员工开放
- 重点测试核心功能
- 收集性能基线数据
小流量灰度（10%流量）
- 选择友好用户群体
- 监控错误率和性能指标
- A/B测试关键功能
中流量灰度（50%流量）
- 扩大用户范围
- 验证系统稳定性
- 调整负载均衡策略
全量发布（100%流量）
- 监控所有业务指标
- 准备回滚方案
- 24小时值班监控

灰度发布配置示例：

deployment:
  strategy: canary
  stages:
    - name: internal
      traffic_percentage: 5
      duration: 2h
      metrics:
        - error_rate < 1%
        - p95_latency < 100ms
      
    - name: early_adopters
      traffic_percentage: 10
      duration: 6h
      user_segment: vip_users
      metrics:
        - error_rate < 0.5%
        - user_satisfaction > 4.5
      
    - name: broad_release
      traffic_percentage: 50
      duration: 12h
      metrics:
        - error_rate < 0.3%
        - system_load < 70%
      
    - name: full_release
      traffic_percentage: 100
      rollback_on:
        - error_rate > 1% for 5min
        - p99_latency > 500ms for 10min