Spring AI Customer Service in Practice: Building a Highly Available Conversational System from Scratch

键盘侠 er · Published 2026-03-24 03:33:42

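I recently took over the customer service module upgrade of a legacy project, and it was a real headache. The old system was built on a rules engine: every business change meant touching a pile of if-else branches, responses were slow, and the service often went down at peak hours. The two most common user complaints were "the bot doesn't understand plain language" and "the queue takes forever." That convinced me to rebuild the whole customer service system on current AI technology.

After some research I chose Spring AI as the technology stack. You may have heard of Rasa or Dialogflow; both are solid, but for a team that mostly works in Java, Spring AI has several clear advantages. First, integration is seamless: a Spring Boot project can adopt it at almost no cost. Second, the ecosystem is unified: configuration and exception handling work the familiar Spring way. Third, it is flexible: the underlying AI model provider can be swapped at any time, so we are not locked into a single vendor.

Below I walk through the whole build, from architecture design to implementation to performance tuning, in the hope that it saves you some detours.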

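1. High-Concurrency Architecture: Reactive Programming with WebFlux

The biggest bottleneck in a traditional customer service system is its capacity for concurrent requests. When many users ask questions at once, blocked threads cause response times to degrade sharply. I chose Spring WebFlux as the web layer; it is built on Reactor's reactive programming model and can serve a large number of concurrent connections with a small number of threads.

The architecture I ended up with looks like this:

Gateway layer: Spring Cloud Gateway is the single entry point and handles request routing, rate limiting, and initial authentication. The rate limit is set to 1,000 requests per second so that a traffic spike cannot overwhelm the backend services.
Business layer: the core Spring AI customer service application, which handles requests with WebFlux. The key point is that calls to the AI API must also be non-blocking (or at least kept off the event-loop threads); otherwise the benefits of reactive programming are lost.
Cache layer: a Redis cluster stores dialogue context and user session state. One detail to watch: dialogue context needs a sensible TTL. I usually use 30 minutes, which keeps multi-turn conversations coherent without letting memory grow without bound.
Storage layer: MySQL stores the knowledge base and conversation logs, and Elasticsearch handles knowledge retrieval. In a customer service scenario, quickly retrieving the relevant knowledge entries is critical.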
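
The gateway configuration itself is not shown in this article. To make the "1,000 requests per second" limit concrete, here is a minimal plain-Java sketch of the token-bucket idea that rate limiters of this kind are commonly built on. It is an illustration under stated assumptions, not the gateway's implementation: the class name TokenBucket, its parameters, and the injected clock are all hypothetical, and it works on a single node only.

```java
import java.util.function.LongSupplier;

/** Minimal single-node token bucket: refills continuously, allows bursts up to capacity. */
final class TokenBucket {
    private final double capacity;       // maximum burst size, in tokens
    private final double refillPerNano;  // tokens added per nanosecond
    private final LongSupplier clock;    // injected nanosecond clock, so behaviour is testable
    private double tokens;
    private long lastRefill;

    TokenBucket(int ratePerSecond, int capacity, LongSupplier nanoClock) {
        this.capacity = capacity;
        this.refillPerNano = ratePerSecond / 1_000_000_000.0;
        this.clock = nanoClock;
        this.tokens = capacity;          // start full
        this.lastRefill = nanoClock.getAsLong();
    }

    /** Returns true if the request may pass, false if it should be rejected (e.g. HTTP 429). */
    synchronized boolean tryAcquire() {
        long now = clock.getAsLong();
        // Add the tokens earned since the last call, never exceeding the capacity
        tokens = Math.min(capacity, tokens + (now - lastRefill) * refillPerNano);
        lastRefill = now;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }
}
```

In production the clock would be System::nanoTime, for example new TokenBucket(1000, 1000, System::nanoTime). A limiter shared by several gateway instances needs its state in a shared store such as Redis rather than in process memory.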

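2. Core Implementation: Three Pillars of Intelligent Dialogue

2.1 Handling High-Concurrency Requests with WebFlux

Let's start with the controller layer. I created a ChatController annotated with @RestController, but every handler method returns a Mono or a Flux: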

@RestController
@RequestMapping("/api/v1/chat")
@Slf4j
public class ChatController {
    
    private final ChatService chatService;
    private final AuthService authService;
    
    // Constructor injection
    public ChatController(ChatService chatService, AuthService authService) {
        this.chatService = chatService;
        this.authService = authService;
    }
    
    @PostMapping("/message")
    public Mono<ApiResponse<ChatResponse>> handleMessage(
            @RequestHeader("Authorization") String token,
            @RequestBody ChatRequest request) {
        
        return authService.validateToken(token)
                .flatMap(userId -> {
                    // Log the request
                    log.info("User {} sent message: {}", userId, request.getMessage());
                    
                    // Process the message and return the response
                    return chatService.processMessage(userId, request)
                            .map(response -> ApiResponse.success(response))
                            .onErrorResume(e -> {
                                log.error("Failed to process message", e);
                                return Mono.just(ApiResponse.error("The system is busy, please try again later"));
                            });
                })
                .switchIfEmpty(Mono.just(ApiResponse.error("Authentication failed")));
    }
    
    // Streaming (server-sent events) endpoint, suited to long conversations
    @GetMapping(value = "/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
    public Flux<ServerSentEvent<String>> streamChat(
            @RequestHeader("Authorization") String token,
            @RequestParam String message) {
        
        return authService.validateToken(token)
                .flatMapMany(userId -> 
                    chatService.streamProcess(userId, message)
                            .map(content -> ServerSentEvent.builder(content).build())
                );
    }
}

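2.2 Maintaining Context with a Dialogue State Machine

The core of multi-turn dialogue is maintaining context. I designed a simple state machine to manage dialogue state: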

@Component
@Slf4j
@RequiredArgsConstructor
public class DialogueStateManager {
    
    private final RedisTemplate<String, DialogueContext> redisTemplate;
    
    // Dialogue context
    @Data
    @AllArgsConstructor
    @NoArgsConstructor
    public static class DialogueContext {
        private String sessionId;
        private String userId;
        private List<Message> history;
        private DialogueState state;
        private LocalDateTime lastActiveTime;
        private Map<String, Object> slots; // extracted entity values
        
        public enum DialogueState {
            GREETING,      // greeting
            IDENTIFYING,   // identifying the intent
            COLLECTING,    // collecting information
            PROCESSING,    // processing
            COMPLETED,     // done
            TRANSFER       // handed over to a human agent
        }
    }
    
    @Data
    @AllArgsConstructor
    @NoArgsConstructor
    public static class Message {
        private String role; // "user" or "assistant"
        private String content;
        private LocalDateTime timestamp;
    }
    
    // Get the dialogue context, creating it if it does not exist yet
    public Mono<DialogueContext> getOrCreateContext(String userId, String sessionId) {
        String key = buildKey(userId, sessionId);
        
        // Mono.fromCallable completes empty when the callable returns null, so a missing
        // key is handled with switchIfEmpty rather than with a null check inside flatMap
        return Mono.fromCallable(() -> redisTemplate.opsForValue().get(key))
                .subscribeOn(Schedulers.boundedElastic())
                .flatMap(context -> {
                    // Existing session: refresh the last-active time (saving also renews the TTL)
                    context.setLastActiveTime(LocalDateTime.now());
                    return saveContext(context).thenReturn(context);
                })
                .switchIfEmpty(Mono.defer(() -> {
                    // No stored context: create a new one
                    DialogueContext newContext = new DialogueContext();
                    newContext.setSessionId(sessionId);
                    newContext.setUserId(userId);
                    newContext.setHistory(new ArrayList<>());
                    newContext.setState(DialogueContext.DialogueState.GREETING);
                    newContext.setLastActiveTime(LocalDateTime.now());
                    newContext.setSlots(new HashMap<>());
                    
                    return saveContext(newContext).thenReturn(newContext);
                }));
    }
    
    // Append a message to the history
    public Mono<Void> addMessageToHistory(DialogueContext context, Message message) {
        context.getHistory().add(message);
        
        // Cap the history length to stay within the model's token limit:
        // once it exceeds 20 messages, keep only the 10 most recent
        if (context.getHistory().size() > 20) {
            // subList returns a view of the old list, so copy it into a fresh ArrayList
            context.setHistory(new ArrayList<>(context.getHistory().subList(
                context.getHistory().size() - 10, 
                context.getHistory().size()
            )));
        }
        
        return saveContext(context);
    }
    
    // Update the dialogue state
    public Mono<Void> updateState(DialogueContext context, DialogueContext.DialogueState newState) {
        context.setState(newState);
        return saveContext(context);
    }
    
    // Save the context to Redis with a 30-minute TTL
    // (RedisTemplate is blocking, so the call is moved to the boundedElastic scheduler)
    private Mono<Void> saveContext(DialogueContext context) {
        return Mono.fromRunnable(() -> {
            String key = buildKey(context.getUserId(), context.getSessionId());
            redisTemplate.opsForValue().set(key, context, 30, TimeUnit.MINUTES);
        }).subscribeOn(Schedulers.boundedElastic()).then();
    }
    
    private String buildKey(String userId, String sessionId) {
        return String.format("dialogue:ctx:%s:%s", userId, sessionId);
    }
}
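
The DialogueState enum names the states, but the article does not say which moves between them are legal. One way to make that explicit is a transition table that updateState could consult before saving. The sketch below is only an illustration: the class DialogueTransitions and, in particular, the transition rules themselves are assumptions, not the rules of the production system.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

final class DialogueTransitions {
    enum State { GREETING, IDENTIFYING, COLLECTING, PROCESSING, COMPLETED, TRANSFER }

    // Hypothetical transition table: for each state, the states that may follow it
    private static final Map<State, Set<State>> ALLOWED = new EnumMap<>(State.class);
    static {
        ALLOWED.put(State.GREETING,    EnumSet.of(State.IDENTIFYING, State.TRANSFER));
        ALLOWED.put(State.IDENTIFYING, EnumSet.of(State.COLLECTING, State.PROCESSING, State.TRANSFER));
        ALLOWED.put(State.COLLECTING,  EnumSet.of(State.COLLECTING, State.PROCESSING, State.TRANSFER));
        ALLOWED.put(State.PROCESSING,  EnumSet.of(State.COMPLETED, State.TRANSFER));
        ALLOWED.put(State.COMPLETED,   EnumSet.of(State.IDENTIFYING));   // user asks a new question
        ALLOWED.put(State.TRANSFER,    EnumSet.noneOf(State.class));     // a human agent takes over
    }

    static boolean canTransition(State from, State to) {
        return ALLOWED.get(from).contains(to);
    }

    /** Returns the new state, or throws if the move is not in the table. */
    static State transition(State from, State to) {
        if (!canTransition(from, to)) {
            throw new IllegalStateException("Illegal transition: " + from + " -> " + to);
        }
        return to;
    }
}
```

Keeping the rules in one table means an unexpected jump (for example from TRANSFER back to GREETING) fails loudly instead of silently corrupting the stored context.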

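2.3 AI API Integration and Fallback Strategy

When integrating the OpenAI API, stability is the key concern. I implemented a multi-level fallback strategy: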

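@Service
@Slf4j
public class AIServiceImpl implements AIService {
    
    // OpenAiChatClient is the class name used by early Spring AI releases; newer
    // releases rename it to OpenAiChatModel, so adjust the names to your version.
    private final OpenAiChatClient openAiClient;
    private final OpenAiChatClient backupClient; // backup API endpoint
    private final RuleBasedFallbackService fallbackService;
    
    // Two beans share the same type, so they are selected by qualifier
    // (the bean names used here are assumptions)
    public AIServiceImpl(@Qualifier("primaryChatClient") OpenAiChatClient openAiClient,
                         @Qualifier("backupChatClient") OpenAiChatClient backupClient,
                         RuleBasedFallbackService fallbackService) {
        this.openAiClient = openAiClient;
        this.backupClient = backupClient;
        this.fallbackService = fallbackService;
    }
    
    // Main entry point. Message is DialogueStateManager.Message, not Spring AI's Message.
    @Override
    public Mono<String> generateResponse(String prompt, List<Message> history) {
        // Build the full prompt
        String fullPrompt = buildPromptWithHistory(prompt, history);
        
        // Try the primary service with a 10-second timeout
        return callModel(openAiClient, fullPrompt)
                .timeout(Duration.ofSeconds(10))
                .onErrorResume(e -> {
                    // timeout() signals java.util.concurrent.TimeoutException;
                    // API errors from the primary service end up here as well
                    log.warn("Primary AI service failed or timed out, trying the backup", e);
                    return callModel(backupClient, fullPrompt)
                            .timeout(Duration.ofSeconds(10));
                })
                .onErrorResume(e -> {
                    log.warn("Backup AI service also failed, falling back to the rules engine", e);
                    return fallbackService.getResponse(prompt);
                })
                .doOnError(e -> log.error("All AI services failed", e));
    }
    
    // Call one AI endpoint. client.call() blocks, so it runs on the boundedElastic scheduler.
    private Mono<String> callModel(OpenAiChatClient client, String prompt) {
        return Mono.fromCallable(() -> {
            Prompt aiPrompt = new Prompt(new UserMessage(prompt));
            ChatResponse response = client.call(aiPrompt); // Spring AI's ChatResponse
            return response.getResult().getOutput().getContent();
        }).subscribeOn(Schedulers.boundedElastic());
    }
    
    // Build a prompt that includes the conversation history
    private String buildPromptWithHistory(String currentPrompt, List<Message> history) {
        StringBuilder builder = new StringBuilder();
        
        // System instruction
        builder.append("You are a professional customer service assistant. Answer the user's question based on the conversation history.\n\n");
        
        // Conversation history
        if (history != null && !history.isEmpty()) {
            builder.append("Conversation history:\n");
            for (Message msg : history) {
                builder.append(msg.getRole()).append(": ").append(msg.getContent()).append("\n");
            }
            builder.append("\n");
        }
        
        // Current question
        builder.append("Latest user question: ").append(currentPrompt);
        builder.append("\n\nPlease give a professional, friendly answer:");
        
        return builder.toString();
    }
}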
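
To see the shape of the primary, backup, rules-engine chain without any Spring or Reactor dependency, here is a JDK-only sketch using CompletableFuture (it needs Java 12 or later for exceptionallyCompose). The class FallbackChain and its parameters are hypothetical; it mirrors the idea of the service above rather than reproducing it.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

final class FallbackChain {
    /**
     * Tries the primary call, then the backup call, then a local rule-based answer.
     * Each remote attempt is limited to timeoutMillis.
     */
    static CompletableFuture<String> answer(Supplier<CompletableFuture<String>> primary,
                                            Supplier<CompletableFuture<String>> backup,
                                            Supplier<String> ruleBased,
                                            long timeoutMillis) {
        return primary.get()
                .orTimeout(timeoutMillis, TimeUnit.MILLISECONDS)
                // Any failure of the primary (timeout or error) switches to the backup
                .exceptionallyCompose(e -> backup.get()
                        .orTimeout(timeoutMillis, TimeUnit.MILLISECONDS))
                // Any failure of the backup switches to the local rules
                .exceptionally(e -> ruleBased.get());
    }
}
```

The order of the stages is what matters: each fallback is attached after the stage it protects, so a timeout on the primary never skips the backup and goes straight to the rules engine.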

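3. Security and Observability: JWT Authentication and Log Instrumentation

3.1 JWT Authentication

Security is the lifeline of a production system. I use JWT for stateless authentication: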

@Component
@Slf4j
public class JwtTokenProvider {
    
    @Value("${jwt.secret}")
    private String jwtSecret;
    
    @Value("${jwt.expiration}")
    private long jwtExpiration; // token lifetime in milliseconds
    
    // Generate a token (jjwt 0.9.x-style API: the String secret is treated as Base64-encoded key bytes)
    public String generateToken(String userId, String username) {
        Date now = new Date();
        Date expiryDate = new Date(now.getTime() + jwtExpiration);
        
        return Jwts.builder()
                .setSubject(userId)
                .claim("username", username)
                .setIssuedAt(now)
                .setExpiration(expiryDate)
                .signWith(SignatureAlgorithm.HS512, jwtSecret)
                .compact();
    }
    
    // Validate a token and emit the user ID it carries
    public Mono<String> validateToken(String token) {
        return Mono.fromCallable(() -> {
            try {
                // parseClaimsJws verifies the signature and rejects expired tokens with
                // ExpiredJwtException, so no separate expiry check is needed
                Claims claims = Jwts.parser()
                        .setSigningKey(jwtSecret)
                        .parseClaimsJws(token)
                        .getBody();
                
                return claims.getSubject();
            } catch (ExpiredJwtException ex) {
                log.warn("Token expired: {}", ex.getMessage());
                throw new AuthenticationException("Token has expired, please log in again");
            } catch (JwtException | IllegalArgumentException ex) {
                log.warn("Invalid token: {}", ex.getMessage());
                throw new AuthenticationException("Invalid token");
            }
        }).subscribeOn(Schedulers.boundedElastic());
    }
}

// Global filter
@Component
public class JwtAuthenticationFilter implements WebFilter {
    
    private final JwtTokenProvider tokenProvider;
    private final List<String> excludedPaths = Arrays.asList("/api/auth/login", "/api/auth/register");
    
    public JwtAuthenticationFilter(JwtTokenProvider tokenProvider) {
        this.tokenProvider = tokenProvider;
    }
    
    @Override
    public Mono<Void> filter(ServerWebExchange exchange, WebFilterChain chain) {
        String path = exchange.getRequest().getPath().value();
        
        // Skip the authentication endpoints themselves
        if (excludedPaths.stream().anyMatch(path::startsWith)) {
            return chain.filter(exchange);
        }
        
        // Extract the token
        String token = resolveToken(exchange.getRequest());
        
        if (token == null) {
            exchange.getResponse().setStatusCode(HttpStatus.UNAUTHORIZED);
            return exchange.getResponse().setComplete();
        }
        
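        // Validate the token
        return tokenProvider.validateToken(token)
                .flatMap(userId -> {
                    // Expose the user ID to downstream handlers as an exchange attribute
                    exchange.getAttributes().put("userId", userId);
                    return chain.filter(exchange);
                })
                // Only authentication failures become 401; other downstream errors propagate unchanged
                .onErrorResume(AuthenticationException.class, e -> {
                    exchange.getResponse().setStatusCode(HttpStatus.UNAUTHORIZED);
                    return exchange.getResponse().writeWith(
                        Mono.just(exchange.getResponse()
                            .bufferFactory()
                            .wrap("Authentication failed".getBytes(StandardCharsets.UTF_8)))
                    );
                });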
    }
    
    private String resolveToken(ServerHttpRequest request) {
        String bearerToken = request.getHeaders().getFirst("Authorization");
        if (bearerToken != null && bearerToken.startsWith("Bearer ")) {
            return bearerToken.substring(7);
        }
        return null;
    }
}
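
It can help to see what a JWT library does when it signs and verifies an HS512 token. The following JDK-only sketch shows just the signature part; it is an illustration, not a replacement for the library. The class name Hs512Sketch is hypothetical, and a real validator must additionally check the exp claim and the alg header, which the library handles for you.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

final class Hs512Sketch {
    private static final Base64.Encoder B64 = Base64.getUrlEncoder().withoutPadding();

    /** Builds header.payload.signature for the given JSON payload. */
    static String sign(String payloadJson, byte[] secret) {
        String header = B64.encodeToString(
                "{\"alg\":\"HS512\",\"typ\":\"JWT\"}".getBytes(StandardCharsets.UTF_8));
        String payload = B64.encodeToString(payloadJson.getBytes(StandardCharsets.UTF_8));
        String signingInput = header + "." + payload;
        return signingInput + "." + B64.encodeToString(hmac(signingInput, secret));
    }

    /** Checks only the signature over header.payload. */
    static boolean verify(String token, byte[] secret) {
        String[] parts = token.split("\\.");
        if (parts.length != 3) return false;
        byte[] expected = hmac(parts[0] + "." + parts[1], secret);
        byte[] actual;
        try {
            actual = Base64.getUrlDecoder().decode(parts[2]);
        } catch (IllegalArgumentException e) {
            return false; // signature part is not valid Base64URL
        }
        return MessageDigest.isEqual(expected, actual); // constant-time comparison
    }

    private static byte[] hmac(String data, byte[] secret) {
        try {
            Mac mac = Mac.getInstance("HmacSHA512");
            mac.init(new SecretKeySpec(secret, "HmacSHA512"));
            return mac.doFinal(data.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA512 is unavailable", e);
        }
    }
}
```

Because the signature covers both the header and the payload, changing either one (for example the user ID in the sub claim) invalidates the token unless the attacker knows the secret.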

3.2 Conversation Log Instrumentation

Thorough log instrumentation is essential for later analysis and optimization:

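@Aspect
@Component
@Slf4j
public class ChatLogAspect {
    
    @Around("@annotation(com.example.annotation.ChatLog)")
    public Object logChat(ProceedingJoinPoint joinPoint) throws Throwable {
        long startTime = System.currentTimeMillis();
        String methodName = joinPoint.getSignature().getName();
        
        // Extract the request parameters
        Object[] args = joinPoint.getArgs();
        String userId = extractUserId(args);
        String message = extractMessage(args);
        
        // Log the request
        log.info("Chat request started | user: {} | method: {} | message: {}", 
                 userId, methodName, maskSensitiveInfo(message));
        
        Object result;
        try {
            // Invoke the original method
            result = joinPoint.proceed();
        } catch (Throwable t) {
            // Failure thrown synchronously while invoking the method
            logFailure(userId, methodName, message, startTime, t);
            throw t;
        }
        
        // A reactive method only assembles its pipeline inside proceed(); the work runs on
        // subscription, so timing and error logging are attached to the Mono itself.
        // (A Flux result, such as the streaming endpoint, would need the same treatment.)
        if (result instanceof Mono<?> mono) {
            return mono
                    .doOnSuccess(value -> logSuccess(userId, methodName, message, value, startTime))
                    .doOnError(e -> logFailure(userId, methodName, message, startTime, e));
        }
        
        logSuccess(userId, methodName, message, result, startTime);
        return result;
    }
    
    private void logSuccess(String userId, String methodName, String message,
                            Object response, long startTime) {
        long duration = System.currentTimeMillis() - startTime;
        log.info("Chat request completed | user: {} | method: {} | duration: {}ms | status: success",
                 userId, methodName, duration);
        saveChatLog(userId, message, response, duration, true);
    }
    
    private void logFailure(String userId, String methodName, String message,
                            long startTime, Throwable error) {
        long duration = System.currentTimeMillis() - startTime;
        log.error("Chat request failed | user: {} | method: {} | duration: {}ms | error: {}",
                  userId, methodName, duration, error.getMessage());
        saveChatLog(userId, message, null, duration, false);
    }
    
    // Save the detailed log entry to the database.
    // Note: putting @Async on this method would have no effect, because calls from within
    // the same bean bypass the Spring proxy. To write asynchronously (and keep blocking
    // database calls off the event loop), move it to a separate bean and annotate it there.
    public void saveChatLog(String userId, String request, Object response, 
                            long duration, boolean success) {
        try {
            ChatLogEntity logEntity = new ChatLogEntity();
            logEntity.setUserId(userId);
            logEntity.setRequest(request);
            logEntity.setResponse(response != null ? response.toString() : null);
            logEntity.setDuration(duration);
            logEntity.setSuccess(success);
            logEntity.setCreateTime(LocalDateTime.now());
            
            // Call the repository here to persist the entry
            // chatLogRepository.save(logEntity);
            
        } catch (Exception e) {
            log.error("Failed to save chat log", e);
        }
    }
    
    // Mask sensitive data
    private String maskSensitiveInfo(String text) {
        if (text == null) return "";
        
        // Simple masking of mobile numbers and e-mail addresses
        String masked = text
            .replaceAll("1[3-9]\\d{9}", "****")
            .replaceAll("\\w+@\\w+\\.\\w+", "***@***.***");
        
        return masked.length() > 100 ? masked.substring(0, 100) + "..." : masked;
    }
    
    private String extractUserId(Object[] args) {
        // Extract the user ID according to the actual parameter structure
        return "unknown";
    }
    
    private String extractMessage(Object[] args) {
        // Extract the message according to the actual parameter structure
        return Arrays.toString(args);
    }
}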
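
The masking step deserves a closer look, because the simple patterns in maskSensitiveInfo also match digit runs inside longer numbers and miss e-mail addresses that contain dots or hyphens. Below is a slightly stricter sketch that keeps the first three and last four digits of a mobile number so that logs remain useful for support. The class LogMasker and both patterns are assumptions for illustration; the e-mail pattern in particular is simplified and does not cover every valid address.

```java
import java.util.regex.Pattern;

final class LogMasker {
    // Mainland-China mobile number: 11 digits starting with 1[3-9], not part of a longer digit run
    private static final Pattern PHONE =
            Pattern.compile("(?<!\\d)(1[3-9]\\d)\\d{4}(\\d{4})(?!\\d)");
    // Simplified e-mail pattern (assumption): local part, "@", and a dotted domain
    private static final Pattern EMAIL =
            Pattern.compile("[\\w.+-]+@[\\w-]+(\\.[\\w-]+)+");

    /** Masks phone numbers and e-mail addresses, then truncates to maxLength characters. */
    static String mask(String text, int maxLength) {
        if (text == null) return "";
        String masked = PHONE.matcher(text).replaceAll("$1****$2");
        masked = EMAIL.matcher(masked).replaceAll("***@***");
        return masked.length() > maxLength ? masked.substring(0, maxLength) + "..." : masked;
    }
}
```

For example, LogMasker.mask("call 13812345678 now", 100) yields "call 138****5678 now", while a 14-digit order number is left untouched because of the look-around guards.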

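4. Performance Testing and Optimization

4.1 JMeter Load-Test Configuration

Thorough load testing is a must before going live. I use JMeter to simulate high-concurrency scenarios: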

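// Suggested JMeter test plan configuration:
/*
1. Thread Group:
   - Number of threads: 500
   - Ramp-up period: 60 seconds
   - Loop count: Infinite

2. HTTP Request Defaults:
   - Protocol: https
   - Server name: your-api-server.com
   - Port: 443

3. HTTP Request:
   - Path: /api/v1/chat/message
   - Method: POST
   - Body Data:
     {
       "message": "I'd like to check my order status",
       "sessionId": "${__RandomString(10,abcdefghijklmnopqrstuvwxyz)}"
     }

4. HTTP Header Manager:
   - Authorization: Bearer ${token}
   - Content-Type: application/json

5. Assertions:
   - Response code: 200
   - Response time: under 2000 ms

6. Listeners:
   - Aggregate Report
   - Response Time Graph
   - Transactions per Second
*/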
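
As a rough sanity check on these numbers, Little's law relates the number of concurrent virtual users N, the throughput X, and the response time R in a closed system: N = X · R. The estimate below assumes that the 500 threads send requests back to back (no think time or timers) and that the roughly 800 ms average response time quoted in the summary still holds under load; both are assumptions, so treat the result as an order-of-magnitude figure only.

```latex
X_{\text{test}} = \frac{N}{R} = \frac{500}{0.8\ \text{s}} = 625\ \text{requests/s}
\qquad
\bar{X}_{\text{prod}} = \frac{100\,000\ \text{conversations}}{86\,400\ \text{s}} \approx 1.16\ \text{conversations/s}
```

Under these assumptions the plan pushes about 625 requests per second, which stays below the gateway's limit of 1,000 requests per second and far above the daily average load. Real traffic is bursty, so peak load can be many times the daily average.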

4.2 Timeouts and Retries

Every network call needs a timeout and a retry policy:

@Configuration
public class ResilienceConfig {
    
    @Bean
    public CircuitBreakerConfig circuitBreakerConfig() {
        return CircuitBreakerConfig.custom()
                .failureRateThreshold(50) // failure rate threshold, in percent
                .waitDurationInOpenState(Duration.ofSeconds(30)) // time spent OPEN before moving to HALF_OPEN
                .permittedNumberOfCallsInHalfOpenState(10) // trial calls permitted while HALF_OPEN
                .slidingWindowType(CircuitBreakerConfig.SlidingWindowType.COUNT_BASED)
                .slidingWindowSize(100) // sliding window size, in calls
                .build();
    }
    
    @Bean
    public RetryConfig retryConfig() {
        return RetryConfig.custom()
                .maxAttempts(3) // maximum attempts, including the first call (so at most 2 retries)
                .waitDuration(Duration.ofMillis(500)) // wait between attempts
                .retryOnException(e -> e instanceof TimeoutException || 
                                      e instanceof IOException)
                .build();
    }
    
    @Bean
    public BulkheadConfig bulkheadConfig() {
        return BulkheadConfig.custom()
                .maxConcurrentCalls(100) // maximum number of concurrent calls
                .maxWaitDuration(Duration.ofMillis(500)) // maximum time to wait for a free slot
                .build();
    }
}

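// Wrap AI calls with Resilience4j
@Service
@Slf4j
public class ResilientAIService {
    
    private final AIService aiService;
    private final CircuitBreaker circuitBreaker;
    private final Retry retry;
    private final Bulkhead bulkhead;
    
    public ResilientAIService(AIService aiService, 
                             CircuitBreakerRegistry circuitBreakerRegistry,
                             RetryRegistry retryRegistry,
                             BulkheadRegistry bulkheadRegistry,
                             CircuitBreakerConfig circuitBreakerConfig,
                             RetryConfig retryConfig,
                             BulkheadConfig bulkheadConfig) {
        this.aiService = aiService;
        // Pass the config beans explicitly; otherwise the registries use their own defaults
        this.circuitBreaker = circuitBreakerRegistry.circuitBreaker("aiService", circuitBreakerConfig);
        this.retry = retryRegistry.retry("aiService", retryConfig);
        this.bulkhead = bulkheadRegistry.bulkhead("aiService", bulkheadConfig);
    }
    
    public Mono<String> callAIWithResilience(String prompt, List<Message> history) {
        // Decorators.ofSupplier would only guard the (instant) assembly of the Mono, so the
        // reactive operators from the resilience4j-reactor module are used instead.
        // Mono.defer makes every retry start a fresh call. Operators applied first sit
        // innermost: circuit breaker, then retry, then bulkhead.
        return Mono.defer(() -> aiService.generateResponse(prompt, history))
                .transformDeferred(CircuitBreakerOperator.of(circuitBreaker))
                .transformDeferred(RetryOperator.of(retry))
                .transformDeferred(BulkheadOperator.of(bulkhead))
                .timeout(Duration.ofSeconds(15))
                .onErrorResume(e -> {
                    log.error("AI service call failed, using the fallback reply", e);
                    return Mono.just("Sorry, the system is busy right now. Please try again later or contact a human agent.");
                });
    }
}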
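
To make failureRateThreshold and slidingWindowSize more tangible, here is a deliberately reduced, JDK-only sketch of a count-based breaker. It models only the CLOSED to OPEN decision and evaluates the failure rate once the window is full; the HALF_OPEN recovery phase, wait durations, and everything else Resilience4j provides are left out. The class SimpleCircuitBreaker is hypothetical and is not how the library is implemented.

```java
final class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN }

    private final boolean[] window;             // ring buffer of recent calls: true = failed
    private final double failureRateThreshold;  // in percent, e.g. 50
    private int recorded;                       // calls recorded so far, capped at the window size
    private int next;                           // slot that the next call overwrites
    private int failures;                       // failed calls currently inside the window
    private State state = State.CLOSED;

    SimpleCircuitBreaker(int slidingWindowSize, double failureRateThreshold) {
        this.window = new boolean[slidingWindowSize];
        this.failureRateThreshold = failureRateThreshold;
    }

    synchronized boolean allowsCalls() {
        return state == State.CLOSED;
    }

    synchronized State state() {
        return state;
    }

    /** Records the outcome of one call and opens the breaker if the failure rate is too high. */
    synchronized void record(boolean failed) {
        if (window[next]) failures--;           // the oldest entry leaves the window
        window[next] = failed;
        if (failed) failures++;
        next = (next + 1) % window.length;
        if (recorded < window.length) recorded++;
        // Evaluate the rate only once the window is full
        if (recorded == window.length
                && 100.0 * failures / window.length >= failureRateThreshold) {
            state = State.OPEN;
        }
    }
}
```

With a window of 4 and a threshold of 50, two failures among the last four calls are enough to open the breaker, after which allowsCalls() returns false and callers should go straight to the fallback reply.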

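5. Pitfalls to Avoid: Production Considerations

5.1 Sensitive-Word Filtering

AI-generated content is not fully controllable, so sensitive-word filtering is a must: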

@Component
@Slf4j
public class ContentFilter {
    
    private final Set<String> sensitiveWords;
    private final AhoCorasickDoubleArrayTrie<String> trie;
    
    public ContentFilter() {
        // Load the sensitive-word list
        sensitiveWords = loadSensitiveWords();
        
        // Build the Aho-Corasick automaton
        trie = new AhoCorasickDoubleArrayTrie<>();
        Map<String, String> map = sensitiveWords.stream()
                .collect(Collectors.toMap(word -> word, word -> "***"));
        trie.build(map);
    }
    
    // Mask sensitive words
    public String filter(String text) {
        if (text == null || text.isEmpty()) {
            return text;
        }
        
        List<AhoCorasickDoubleArrayTrie.Hit<String>> hits = trie.parseText(text);
        
        if (hits.isEmpty()) {
            return text;
        }
        
        // Overwrite every matched character with '*'
        StringBuilder result = new StringBuilder(text);
        for (AhoCorasickDoubleArrayTrie.Hit<String> hit : hits) {
            for (int i = hit.begin; i < hit.end; i++) {
                result.setCharAt(i, '*');
            }
        }
        
        return result.toString();
    }
    
    // Check whether the text contains any sensitive word
    public boolean containsSensitiveWord(String text) {
        if (text == null || text.isEmpty()) {
            return false;
        }
        return !trie.parseText(text).isEmpty();
    }
    
    // Safety check for AI replies
    public Mono<String> safeAIResponse(String response) {
        return Mono.fromCallable(() -> {
            if (containsSensitiveWord(response)) {
                log.warn("AI reply contained sensitive words and was filtered");
                return filter(response);
            }
            return response;
        }).subscribeOn(Schedulers.boundedElastic());
    }
    
    private Set<String> loadSensitiveWords() {
        // Load sensitive words from a file or the database
        Set<String> words = new HashSet<>();
        // Read a configuration file or query the database here
        words.add("sensitive-word-1");
        words.add("sensitive-word-2");
        // ...
        return words;
    }
}
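
For readers who want to see the matching idea without the third-party library, here is a small dependency-free sketch that masks words using a plain trie. Note that this is not the Aho-Corasick automaton used above: it has no failure links and restarts the search at every position, so it runs in O(n · L) for a text of length n and a longest word of length L, which is fine for short chat replies. The class SimpleWordFilter is hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class SimpleWordFilter {
    private static final class Node {
        final Map<Character, Node> next = new HashMap<>();
        boolean terminal; // true if a sensitive word ends at this node
    }

    private final Node root = new Node();

    SimpleWordFilter(List<String> words) {
        for (String word : words) {
            if (word.isEmpty()) continue;
            Node node = root;
            for (char c : word.toCharArray()) {
                node = node.next.computeIfAbsent(c, k -> new Node());
            }
            node.terminal = true;
        }
    }

    /** Replaces every character that belongs to any matched word with '*'. */
    String mask(String text) {
        char[] out = text.toCharArray();
        for (int start = 0; start < text.length(); start++) {
            Node node = root;
            int longestEnd = -1;
            // Walk the trie from this start position and remember the longest match
            for (int j = start; j < text.length(); j++) {
                node = node.next.get(text.charAt(j));
                if (node == null) break;
                if (node.terminal) longestEnd = j;
            }
            for (int k = start; k <= longestEnd; k++) {
                out[k] = '*';
            }
        }
        return new String(out);
    }
}
```

Matching always runs against the original text rather than the partly masked output, so overlapping words (for example "abc" and "cd" in "abcd") are both masked.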

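5.2 Session Timeouts and Memory-Leak Prevention

Long-lived sessions can pile up and eat memory, so they need periodic cleanup. Because saveContext writes every context with a 30-minute TTL, Redis already expires idle sessions on its own; the scheduled job here is a safety net for keys that somehow ended up without a TTL: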

@Component
@Slf4j
@RequiredArgsConstructor
public class SessionCleanupScheduler {
    
    private final RedisTemplate<String, Object> redisTemplate;
    
    @Scheduled(fixedDelay = 300000) // runs 5 minutes after the previous run finishes
    public void cleanupExpiredSessions() {
        log.info("Starting expired-session cleanup...");
        
        long startTime = System.currentTimeMillis();
        int cleanedCount = 0;
        
        // KEYS blocks Redis while it walks the whole keyspace, so iterate with SCAN instead.
        // RedisTemplate.scan requires Spring Data Redis 2.7 or later.
        ScanOptions options = ScanOptions.scanOptions()
                .match("dialogue:ctx:*")
                .count(500)
                .build();
        
        try (Cursor<String> cursor = redisTemplate.scan(options)) {
            while (cursor.hasNext()) {
                String key = cursor.next();
                Object value = redisTemplate.opsForValue().get(key);
                
                if (value instanceof DialogueStateManager.DialogueContext context) {
                    // Sessions inactive for more than 1 hour. With the 30-minute TTL set on
                    // save, Redis normally removes idle contexts before this check fires.
                    Duration idle = Duration.between(
                        context.getLastActiveTime(), 
                        LocalDateTime.now()
                    );
                    
                    if (idle.toMinutes() > 60) {
                        redisTemplate.delete(key);
                        cleanedCount++;
                        
                        log.debug("Removed expired session: {}, user: {}, last active: {}", 
                                 context.getSessionId(), 
                                 context.getUserId(),
                                 context.getLastActiveTime());
                    }
                }
            }
            
            long endTime = System.currentTimeMillis();
            log.info("Session cleanup finished, removed {} sessions in {}ms", 
                     cleanedCount, endTime - startTime);
            
        } catch (Exception e) {
            log.error("Error while cleaning up sessions", e);
        }
    }
    
    // Monitor JVM heap usage
    @Scheduled(fixedDelay = 60000) // runs 1 minute after the previous run finishes
    public void monitorMemoryUsage() {
        Runtime runtime = Runtime.getRuntime();
        long totalMemory = runtime.totalMemory();
        long freeMemory = runtime.freeMemory();
        long usedMemory = totalMemory - freeMemory;
        long maxMemory = runtime.maxMemory();
        
        double usagePercentage = (double) usedMemory / maxMemory * 100;
        // SLF4J placeholders are plain "{}", so the number is formatted beforehand
        String usage = String.format("%.2f", usagePercentage);
        
        log.info("Memory usage - used: {}MB, free: {}MB, total: {}MB, max: {}MB, usage: {}%",
                 usedMemory / 1024 / 1024,
                 freeMemory / 1024 / 1024,
                 totalMemory / 1024 / 1024,
                 maxMemory / 1024 / 1024,
                 usage);
        
        // Raise an alert when usage exceeds 80%
        if (usagePercentage > 80) {
            log.warn("Memory usage is too high, current usage: {}%", usage);
            // Send an alert notification here
        }
    }
}
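
The 30-minute TTL behaves as a sliding expiration: every time a context is read and saved again, its lifetime starts over, and it disappears only after 30 minutes without activity. The in-memory sketch below illustrates that semantics in plain Java with an injected clock. The class SlidingTtlStore is hypothetical and only models the behaviour; in the real system Redis does this work.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

final class SlidingTtlStore<V> {
    private record Entry<T>(T value, long expiresAtMillis) {}

    private final Map<String, Entry<V>> entries = new HashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock; // injected millisecond clock, so behaviour is testable

    SlidingTtlStore(long ttlMillis, LongSupplier millisClock) {
        this.ttlMillis = ttlMillis;
        this.clock = millisClock;
    }

    /** Stores the value and (re)starts its time to live. */
    synchronized void put(String key, V value) {
        entries.put(key, new Entry<>(value, clock.getAsLong() + ttlMillis));
    }

    /** Returns the value and renews its TTL, or null if it is absent or has expired. */
    synchronized V touch(String key) {
        Entry<V> entry = entries.get(key);
        if (entry == null) return null;
        if (clock.getAsLong() >= entry.expiresAtMillis()) {
            entries.remove(key); // expired: drop it lazily on access
            return null;
        }
        put(key, entry.value()); // active session: the TTL starts over
        return entry.value();
    }

    synchronized int size() {
        return entries.size();
    }
}
```

With a 30-minute TTL, a session touched at minute 20 survives until minute 50, which is why an active conversation never loses its context while an abandoned one is reclaimed automatically.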

6. Deployment and Monitoring

6.1 Docker Containerized Deployment

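# Dockerfile
FROM openjdk:17-jdk-slim

WORKDIR /app

# The slim base image does not ship curl, which the health check needs
RUN apt-get update \
    && apt-get install -y --no-install-recommends curl \
    && rm -rf /var/lib/apt/lists/*

# Copy the build artifact
COPY target/springai-chatbot-*.jar app.jar

# JVM options
ENV JAVA_OPTS="-Xms512m -Xmx1024m -XX:+UseG1GC -XX:MaxGCPauseMillis=200"

# Health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=60s --retries=3 \
  CMD curl -f http://localhost:8080/actuator/health || exit 1

EXPOSE 8080

# exec replaces the shell, so the JVM receives SIGTERM directly and can shut down gracefully
ENTRYPOINT ["sh", "-c", "exec java $JAVA_OPTS -jar app.jar"]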

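6.2 Prometheus Monitoring Configuration

# application.yml monitoring configuration
# (Spring Boot 2.x property names; Spring Boot 3 moved the export switch to
#  management.prometheus.metrics.export.enabled)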
management:
  endpoints:
    web:
      exposure:
        include: health,info,metrics,prometheus
  metrics:
    export:
      prometheus:
        enabled: true
    distribution:
      percentiles-histogram:
        http.server.requests: true
  endpoint:
    health:
      show-details: always

// Custom metrics: common tags
@Configuration
public class MetricsConfig {
    
    @Bean
    public MeterRegistryCustomizer<MeterRegistry> metricsCommonTags() {
        return registry -> registry.config().commonTags(
            "application", "springai-chatbot",
            "environment", System.getenv().getOrDefault("ENV", "dev")
        );
    }
}

// Business metrics collection
@Component
public class ChatMetrics {
    
    private final MeterRegistry meterRegistry;
    private final Counter requestCounter;
    private final Timer responseTimer;
    private final DistributionSummary responseSize;
    
    public ChatMetrics(MeterRegistry meterRegistry) {
        this.meterRegistry = meterRegistry;
        
        // Request counter
        this.requestCounter = Counter.builder("chat.requests.total")
                .description("Total number of chat requests")
                .tag("type", "chat")
                .register(meterRegistry);
        
        // Response-time timer
        this.responseTimer = Timer.builder("chat.response.time")
                .description("Time taken to process chat requests")
                .register(meterRegistry);
        
        // Response-size distribution
        this.responseSize = DistributionSummary.builder("chat.response.size")
                .description("Size of chat responses in characters")
                .baseUnit("chars")
                .register(meterRegistry);
    }
    
    public void recordRequest() {
        requestCounter.increment();
    }
    
    public void recordResponseTime(long durationMillis) {
        responseTimer.record(durationMillis, TimeUnit.MILLISECONDS);
    }
    
    public void recordResponseSize(int size) {
        responseSize.record(size);
    }
}

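Summary and Outlook

After a little over a month of development, testing, and tuning, this Spring AI based customer service system finally went live. It currently handles more than 100,000 conversations a day, with an average response time under 800 ms and an error rate below 0.1%. What pleases me most is that user satisfaction rose from 65% to 85%.

Looking back on the project, a few points are worth sharing. First, design the architecture ahead of demand: plan for high concurrency and scalability from day one. Second, have a complete fallback strategy: AI services being unstable is the norm, so a backup plan is mandatory. Third, monitor everything, from application performance to business metrics.

The system is running stably, but I am already thinking about the next round of improvements. With the rise of multimodal AI, a text-only customer service system is no longer enough. Users may want to upload images, voice, or even video when asking for help. A user might photograph a faulty product and have the system identify the problem and suggest a fix, or describe the issue by voice and have the system understand and reply.

That leads to an open question: how should a customer service system that supports multimodal input be designed? Should we keep extending Spring AI with multimodal capabilities, or bring in dedicated vision and speech services? And how should multimodal data be stored, retrieved, and managed as conversation context? These are questions the next generation of intelligent customer service systems will have to answer.

Technology keeps moving forward. As developers, our job is to keep learning, keep optimizing, and use better technology to solve real problems. I hope this experience is useful to you, and I welcome further discussion about what intelligent customer service systems can become.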
