Integrating GLM-4.7-Flash with Spring Boot in Practice: Building an Intelligent Q&A System
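1. Introduction
Intelligent Q&A systems have become an important tool for enterprises that want to improve customer-service efficiency and user experience. Human support is expensive and slow to respond, whereas a system built on a large language model can answer instantly around the clock.
GLM-4.7-Flash is a lightweight model in the 30B class that balances capability and efficiency well. It supports a 200K context length and performs strongly at code generation and logical reasoning, which makes it a good fit for enterprise Q&A systems.
This article walks step by step through integrating GLM-4.7-Flash into a Spring Boot project: environment setup, API wrapping, performance tuning, and exception handling.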
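2. Environment Setup and Model Deployment
2.1 System Requirements
Before starting, make sure your system meets the following requirements:
- Memory: at least 32GB RAM (64GB recommended)
- GPU: NVIDIA GPU with 24GB+ VRAM (e.g., RTX 4090)
- Operating system: Linux/Windows/macOS
- Java: JDK 17 or later (the Spring Boot 3.2.0 project created in section 3.1 requires Java 17, so JDK 11 is not sufficient)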
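2.2 Deploying GLM-4.7-Flash with Ollama
Ollama is one of the simplest ways to run GLM-4.7-Flash locally. First install Ollama:
# Install on Linux/macOS
curl -fsSL https://ollama.ai/install.sh | sh
# Install on Windows
# Download and run the installer from the official website
Then pull and run the model:
# Pull the model
ollama pull glm-4.7-flash
# Run the model (opens an interactive chat; the Ollama server listens on port 11434)
ollama run glm-4.7-flash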
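2.3 Verifying the Model
Once the model is running, test it with a simple chat request:
curl http://localhost:11434/api/chat \
  -d '{
    "model": "glm-4.7-flash",
    "messages": [{"role": "user", "content": "Hello, please introduce yourself"}]
  }'
If a normal response comes back, the model is running. Because this request does not set "stream": false, Ollama returns the answer as a stream of JSON objects, one per line.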
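The same check can be made from Java before any Spring code exists. The following is a minimal sketch that uses only the JDK's built-in HTTP client; the class name `OllamaSmokeTest`, the helper names, and the 60-second request timeout are illustrative assumptions, not part of the project. It sets "stream": false so the reply arrives as a single JSON object.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class OllamaSmokeTest {

    // Escapes the characters that must not appear raw inside a JSON string.
    public static String jsonEscape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    // Builds a non-streaming /api/chat body with a single user message.
    public static String chatBody(String model, String prompt) {
        return "{\"model\":\"" + jsonEscape(model) + "\","
                + "\"stream\":false,"
                + "\"messages\":[{\"role\":\"user\",\"content\":\""
                + jsonEscape(prompt) + "\"}]}";
    }

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(5))
                .build();
        HttpRequest request = HttpRequest
                .newBuilder(URI.create("http://localhost:11434/api/chat"))
                .header("Content-Type", "application/json")
                .timeout(Duration.ofSeconds(60)) // assumed value; large models can be slow
                .POST(HttpRequest.BodyPublishers.ofString(
                        chatBody("glm-4.7-flash", "Hello, please introduce yourself")))
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```

In the Spring Boot project itself, Jackson takes care of JSON serialization, so the hand-written escaping here is only needed for this dependency-free check.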
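3. Spring Boot Project Integration
3.1 Creating the Spring Boot Project
Use Spring Initializr to create a new Spring Boot project:
# Create the project with curl
# (if start.spring.io no longer offers 3.2.0, substitute a currently supported 3.x version)
curl https://start.spring.io/starter.zip \
  -d dependencies=web,webflux \
  -d type=maven-project \
  -d language=java \
  -d bootVersion=3.2.0 \
  -d baseDir=glm-qa-system \
  -d groupId=com.example \
  -d artifactId=glm-qa-system \
  -o glm-qa-system.zip
# Unzip and enter the project directory
unzip glm-qa-system.zip
cd glm-qa-system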
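3.2 Adding the Required Dependencies
Make sure pom.xml contains the WebClient and JSON dependencies. The later sections also use Lombok annotations, RedisTemplate, and HealthIndicator, so Lombok, the Redis starter, and the Actuator starter are listed as well:
<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-webflux</artifactId>
    </dependency>
    <dependency>
        <groupId>com.fasterxml.jackson.core</groupId>
        <artifactId>jackson-databind</artifactId>
    </dependency>
    <!-- Used later: Lombok (@Data, @Slf4j), Redis cache (5.3), health indicator (6.3) -->
    <dependency>
        <groupId>org.projectlombok</groupId>
        <artifactId>lombok</artifactId>
        <optional>true</optional>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-data-redis</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-actuator</artifactId>
    </dependency>
</dependencies>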
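3.3 Configuring the Ollama Connection
Create a configuration class to manage the Ollama connection:
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.http.HttpHeaders;
import org.springframework.http.MediaType;
import org.springframework.web.reactive.function.client.WebClient;

@Configuration
public class OllamaConfig {
    // Falls back to the local default when ollama.url is not set
    @Value("${ollama.url:http://localhost:11434}")
    private String ollamaUrl;

    @Bean
    public WebClient ollamaWebClient() {
        return WebClient.builder()
                .baseUrl(ollamaUrl)
                .defaultHeader(HttpHeaders.CONTENT_TYPE, MediaType.APPLICATION_JSON_VALUE)
                .build();
    }
}
Add the configuration to application.properties:
# Ollama settings
ollama.url=http://localhost:11434
ollama.model=glm-4.7-flash
# Timeouts (milliseconds)
ollama.timeout.connect=5000
ollama.timeout.read=30000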
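4. Wrapping the Core API
4.1 Defining the Request and Response DTOs
Create the data objects for model requests and responses. Ollama uses snake_case field names such as created_at and total_duration, so those fields need an explicit mapping:
// Each class goes in its own file; the imports are shown once here
import com.fasterxml.jackson.annotation.JsonIgnoreProperties;
import com.fasterxml.jackson.annotation.JsonInclude;
import com.fasterxml.jackson.annotation.JsonProperty;
import java.time.Instant;
import java.util.List;
import java.util.Map;
import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;

@Data
@AllArgsConstructor
@NoArgsConstructor
@JsonIgnoreProperties(ignoreUnknown = true)
public class ChatMessage {
    private String role;
    private String content;
}

@Data
@AllArgsConstructor
@NoArgsConstructor
@JsonInclude(JsonInclude.Include.NON_NULL) // omit "options" when it is null
public class OllamaRequest {
    private String model;
    private List<ChatMessage> messages;
    private Boolean stream = false;
    private Map<String, Object> options;
}

@Data
@AllArgsConstructor
@NoArgsConstructor
@JsonIgnoreProperties(ignoreUnknown = true) // the response carries more fields than are mapped here
public class OllamaResponse {
    private String model;
    @JsonProperty("created_at")
    private Instant createdAt;
    private ChatMessage message;
    private Boolean done;
    @JsonProperty("total_duration")
    private Long totalDuration;
}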
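4.2 Implementing the Q&A Service
Create the core Q&A service class. In streaming mode Ollama sends one JSON object per line, so the stream is decoded into OllamaResponse chunks and only the text content is passed on:
import java.time.Duration;
import java.util.List;
import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Service;
import org.springframework.web.reactive.function.client.WebClient;
import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;

@Service
@Slf4j
public class QAService {
    private final WebClient webClient;
    private final String modelName;

    public QAService(WebClient ollamaWebClient,
                     @Value("${ollama.model}") String modelName) {
        this.webClient = ollamaWebClient;
        this.modelName = modelName;
    }

    // Sends one question and returns the complete answer
    public Mono<String> askQuestion(String question) {
        List<ChatMessage> messages = List.of(new ChatMessage("user", question));
        OllamaRequest request = new OllamaRequest(modelName, messages, false, null);
        return webClient.post()
                .uri("/api/chat")
                .bodyValue(request)
                .retrieve()
                .bodyToMono(OllamaResponse.class)
                .map(response -> response.getMessage().getContent())
                .timeout(Duration.ofSeconds(30))
                .doOnError(error -> log.error("Q&A service error", error));
    }

    // Sends one question and emits the answer piece by piece
    public Flux<String> askQuestionStream(String question) {
        List<ChatMessage> messages = List.of(new ChatMessage("user", question));
        OllamaRequest request = new OllamaRequest(modelName, messages, true, null);
        return webClient.post()
                .uri("/api/chat")
                .bodyValue(request)
                .retrieve()
                .bodyToFlux(OllamaResponse.class)
                .filter(chunk -> chunk.getMessage() != null
                        && chunk.getMessage().getContent() != null)
                .map(chunk -> chunk.getMessage().getContent())
                // For a Flux, this is the maximum gap between two chunks
                .timeout(Duration.ofSeconds(30))
                .doOnError(error -> log.error("Streaming Q&A error", error));
    }
}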
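4.3 Creating the REST Controller
Expose the public API endpoints:
import java.util.Map;
import org.springframework.http.MediaType;
import org.springframework.http.ResponseEntity;
import org.springframework.util.StringUtils;
import org.springframework.web.bind.annotation.*;
import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;

@RestController
@RequestMapping("/api/qa")
public class QAController {
    private final QAService qaService;

    public QAController(QAService qaService) {
        this.qaService = qaService;
    }

    // Map<String, String> matches what Map.of(...) infers inside the lambdas below
    @PostMapping("/ask")
    public Mono<ResponseEntity<Map<String, String>>> askQuestion(
            @RequestBody Map<String, String> request) {
        String question = request.get("question");
        // hasText also rejects whitespace-only input (StringUtils.isEmpty is deprecated)
        if (!StringUtils.hasText(question)) {
            return Mono.just(ResponseEntity.badRequest()
                    .body(Map.of("error", "The question must not be empty")));
        }
        return qaService.askQuestion(question)
                .map(answer -> ResponseEntity.ok()
                        .body(Map.of("question", question, "answer", answer)))
                .onErrorResume(error -> Mono.just(ResponseEntity.internalServerError()
                        .body(Map.of("error", "The system is busy, please try again later"))));
    }

    @GetMapping(value = "/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
    public Flux<String> askQuestionStream(@RequestParam String question) {
        if (!StringUtils.hasText(question)) {
            return Flux.just("Error: the question must not be empty");
        }
        return qaService.askQuestionStream(question)
                .onErrorResume(error -> Flux.just("The system is busy, please try again later"));
    }
}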
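To see what the /stream endpoint looks like from the caller's side, here is a small client sketch using only the JDK HTTP client. It assumes the application listens on http://localhost:8080 (the Spring Boot default port); the class and method names are made up for illustration. Each server-sent event arrives as a line starting with "data:", so the client strips that prefix and prints the pieces as they arrive.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.stream.Stream;

public class StreamClient {

    // Returns the payload of an SSE "data:" line, or empty for blank lines,
    // comment lines, and other SSE fields.
    public static Optional<String> sseData(String line) {
        if (!line.startsWith("data:")) {
            return Optional.empty();
        }
        String payload = line.substring("data:".length());
        // The SSE format allows one optional space after the colon.
        if (payload.startsWith(" ")) {
            payload = payload.substring(1);
        }
        return Optional.of(payload);
    }

    // Builds the GET request for /api/qa/stream with a URL-encoded question.
    public static HttpRequest streamRequest(String baseUrl, String question) {
        String query = "question=" + URLEncoder.encode(question, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/qa/stream?" + query))
                .header("Accept", "text/event-stream")
                .GET()
                .build();
    }

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        // Assumed base URL: Spring Boot's default port on the local machine.
        HttpResponse<Stream<String>> response = client.send(
                streamRequest("http://localhost:8080", "What is Ollama?"),
                HttpResponse.BodyHandlers.ofLines());
        response.body()
                .map(StreamClient::sseData)
                .flatMap(Optional::stream)
                .forEach(System.out::print);
        System.out.println();
    }
}
```

A browser client would typically use EventSource instead; the line handling above mirrors what it does internally for simple single-line events.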
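5. Performance Optimization and Tuning
5.1 Connection Pool Configuration
Tune the connection pool and timeouts of the WebClient. This bean has the same name as the one in section 3.3, so it replaces that definition; keep only one of the two classes, otherwise the application fails to start with a duplicate bean error:
// HttpClient is reactor.netty.http.client.HttpClient (not java.net.http.HttpClient);
// ConnectionProvider is reactor.netty.resources.ConnectionProvider
@Configuration
public class WebClientConfig {
    @Bean
    public WebClient ollamaWebClient(WebClient.Builder builder,
            @Value("${ollama.url:http://localhost:11434}") String ollamaUrl,
            @Value("${ollama.timeout.connect:5000}") int connectTimeoutMs,
            @Value("${ollama.timeout.read:30000}") int readTimeoutMs) {
        // Bounded pool; 50 connections and a 10s acquire timeout are assumed starting values
        ConnectionProvider provider = ConnectionProvider.builder("ollama")
                .maxConnections(50)
                .pendingAcquireTimeout(Duration.ofSeconds(10))
                .build();
        HttpClient httpClient = HttpClient.create(provider)
                .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, connectTimeoutMs)
                .doOnConnected(conn ->
                        conn.addHandlerLast(new ReadTimeoutHandler(readTimeoutMs, TimeUnit.MILLISECONDS))
                            .addHandlerLast(new WriteTimeoutHandler(readTimeoutMs, TimeUnit.MILLISECONDS)));
        return builder.baseUrl(ollamaUrl)
                .clientConnector(new ReactorClientHttpConnector(httpClient))
                .build();
    }
}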
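5.2 Request Batching
For high-concurrency scenarios, process a batch of questions in parallel. A failed question yields an error entry instead of failing the whole batch:
import java.util.List;
import java.util.Map;
import org.springframework.stereotype.Service;
import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;
import reactor.core.scheduler.Scheduler;
import reactor.core.scheduler.Schedulers;

@Service
public class BatchQAService {
    private final QAService qaService;
    private final Scheduler scheduler;

    public BatchQAService(QAService qaService) {
        this.qaService = qaService;
        this.scheduler = Schedulers.boundedElastic();
    }

    // Results arrive in completion order, not necessarily in input order
    public Flux<Map<String, String>> batchAskQuestions(List<String> questions) {
        return Flux.fromIterable(questions)
                .parallel()
                .runOn(scheduler)
                .flatMap(question -> qaService.askQuestion(question)
                        .map(answer -> Map.of("question", question, "answer", answer))
                        .onErrorResume(error -> Mono.just(Map.of(
                                "question", question,
                                "error", "Processing failed: " + error.getMessage()
                        )))
                )
                .sequential();
    }
}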
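A single local model can only serve a few requests at once, so it usually helps to cap how many questions are in flight. In Reactor that cap is the concurrency argument of `flatMap(mapper, concurrency)`. The sketch below shows the same idea with plain JDK executors so it runs without any dependencies; `BoundedBatch`, `askAll`, and the `ask` function standing in for the model call are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class BoundedBatch {

    // Runs `ask` for every question with at most `maxConcurrent` calls in flight.
    // Results are returned in input order; a failing question becomes an error entry.
    public static List<Map<String, String>> askAll(List<String> questions,
                                                   Function<String, String> ask,
                                                   int maxConcurrent) {
        ExecutorService pool = Executors.newFixedThreadPool(maxConcurrent);
        try {
            List<Future<Map<String, String>>> futures = new ArrayList<>();
            for (String question : questions) {
                Callable<Map<String, String>> task = () -> {
                    try {
                        return Map.of("question", question, "answer", ask.apply(question));
                    } catch (RuntimeException e) {
                        return Map.of("question", question,
                                "error", "Processing failed: " + e.getMessage());
                    }
                };
                futures.add(pool.submit(task));
            }
            List<Map<String, String>> results = new ArrayList<>();
            for (Future<Map<String, String>> future : futures) {
                try {
                    results.add(future.get());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted while waiting", e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException(e.getCause());
                }
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // A stand-in for the model call: upper-cases the question.
        List<Map<String, String>> results =
                askAll(List.of("first", "second", "third"), String::toUpperCase, 2);
        results.forEach(System.out::println);
    }
}
```

The right limit depends on the hardware and on how Ollama is configured to handle parallel requests, so treat the value 2 in the usage example as a placeholder to tune.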
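5.3 Caching Strategy
Add a Redis cache to avoid sending repeated questions to the model. RedisTemplate is a blocking client, so the cache reads and writes run on the boundedElastic scheduler instead of blocking the reactive pipeline:
import java.nio.charset.StandardCharsets;
import java.util.concurrent.TimeUnit;
import lombok.extern.slf4j.Slf4j;
import org.springframework.data.redis.core.RedisTemplate;
import org.springframework.stereotype.Service;
import org.springframework.util.DigestUtils;
import reactor.core.publisher.Mono;
import reactor.core.scheduler.Schedulers;

@Service
@Slf4j
public class CachedQAService {
    private final QAService qaService;
    private final RedisTemplate<String, String> redisTemplate;

    public CachedQAService(QAService qaService,
                           RedisTemplate<String, String> redisTemplate) {
        this.qaService = qaService;
        this.redisTemplate = redisTemplate;
    }

    public Mono<String> askQuestionWithCache(String question) {
        // Hash with an explicit charset so the key does not depend on the platform default
        String cacheKey = "qa:" + DigestUtils.md5DigestAsHex(
                question.getBytes(StandardCharsets.UTF_8));
        // Try the cache first; a null result (cache miss) completes the Mono empty
        return Mono.fromCallable(() -> redisTemplate.opsForValue().get(cacheKey))
                .subscribeOn(Schedulers.boundedElastic())
                .doOnNext(cached -> log.info("Answer served from cache"))
                // Cache miss: call the model, then cache the result for 1 hour
                .switchIfEmpty(Mono.defer(() -> qaService.askQuestion(question)
                        .flatMap(answer -> Mono.fromRunnable(() ->
                                        redisTemplate.opsForValue().set(cacheKey, answer, 1, TimeUnit.HOURS))
                                .subscribeOn(Schedulers.boundedElastic())
                                .thenReturn(answer))));
    }
}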
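The cache only helps when identical questions map to identical keys. A minimal, dependency-free sketch of the key derivation follows; the `normalize` step (trimming, collapsing whitespace, lower-casing) is an added assumption meant to raise the hit rate and is not part of the service above.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Locale;

public class CacheKeys {

    // Assumed normalization: questions that differ only in case or spacing share a key.
    public static String normalize(String question) {
        return question.trim().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);
    }

    // Hex-encoded MD5 of the UTF-8 bytes, the same digest that
    // Spring's DigestUtils.md5DigestAsHex produces.
    public static String md5Hex(String text) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5.
            throw new IllegalStateException(e);
        }
    }

    public static String cacheKey(String question) {
        return "qa:" + md5Hex(normalize(question));
    }

    public static void main(String[] args) {
        System.out.println(cacheKey("What are your opening hours?"));
        System.out.println(cacheKey("  what are  your opening hours? "));
    }
}
```

MD5 is used here only to get a short, fixed-length key, not for security. Lower-casing is a judgment call: skip it if case changes the meaning of questions in your domain.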
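6. Exception Handling and Fault Tolerance
6.1 Global Exception Handling
Create a global exception handler. WebClientException covers both error responses from Ollama and connection failures, and an upstream timeout is reported as 504 Gateway Timeout (408 Request Timeout would blame the client):
import java.util.Map;
import java.util.concurrent.TimeoutException;
import lombok.extern.slf4j.Slf4j;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.ExceptionHandler;
import org.springframework.web.bind.annotation.RestControllerAdvice;
import org.springframework.web.reactive.function.client.WebClientException;

@RestControllerAdvice
@Slf4j
public class GlobalExceptionHandler {
    @ExceptionHandler(WebClientException.class)
    public ResponseEntity<Map<String, Object>> handleWebClientException(
            WebClientException ex) {
        log.error("Ollama service call failed", ex);
        return ResponseEntity.status(HttpStatus.SERVICE_UNAVAILABLE)
                .body(Map.of("error", "The Q&A service is temporarily unavailable", "code", "SERVICE_UNAVAILABLE"));
    }

    @ExceptionHandler(TimeoutException.class)
    public ResponseEntity<Map<String, Object>> handleTimeoutException(
            TimeoutException ex) {
        log.error("Request timed out", ex);
        return ResponseEntity.status(HttpStatus.GATEWAY_TIMEOUT)
                .body(Map.of("error", "The request timed out, please try again later", "code", "TIMEOUT"));
    }

    @ExceptionHandler(Exception.class)
    public ResponseEntity<Map<String, Object>> handleGeneralException(Exception ex) {
        log.error("System error", ex);
        return ResponseEntity.status(HttpStatus.INTERNAL_SERVER_ERROR)
                .body(Map.of("error", "Internal system error", "code", "INTERNAL_ERROR"));
    }
}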
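6.2 Retry Mechanism
Add a retry mechanism to the Q&A service:
import java.time.Duration;
import lombok.extern.slf4j.Slf4j;
import org.springframework.stereotype.Service;
import reactor.core.publisher.Mono;
import reactor.util.retry.Retry;

@Service
@Slf4j
public class RetryQAService {
    private final QAService qaService;

    public RetryQAService(QAService qaService) {
        this.qaService = qaService;
    }

    public Mono<String> askQuestionWithRetry(String question) {
        return qaService.askQuestion(question)
                // Up to 3 retries with exponential backoff, starting at 1 second
                .retryWhen(Retry.backoff(3, Duration.ofSeconds(1)))
                .onErrorResume(error -> {
                    log.warn("Still failing after retries", error);
                    return Mono.just("Sorry, the system cannot process your request right now, please try again later");
                });
    }
}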
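To make the retry timing concrete: with exponential backoff, each retry waits roughly twice as long as the previous one, up to a cap. The sketch below computes those base delays with plain JDK code. `BackoffSchedule` and `baseDelay` are illustrative names, and the 10-second cap is an assumed value. Reactor additionally applies random jitter to each delay (a factor of 0.5 by default), so the real waits vary around these numbers.

```java
import java.time.Duration;

public class BackoffSchedule {

    // Base delay before retry number `retryIndex` (0-based):
    // first * 2^retryIndex, capped at `max`. Jitter is not modeled here.
    public static Duration baseDelay(Duration first, Duration max, int retryIndex) {
        long factor = 1L << Math.min(retryIndex, 30); // avoid overflow for large indexes
        Duration delay = first.multipliedBy(factor);
        return delay.compareTo(max) > 0 ? max : delay;
    }

    public static void main(String[] args) {
        Duration first = Duration.ofSeconds(1);
        Duration max = Duration.ofSeconds(10); // assumed cap for illustration
        for (int i = 0; i < 3; i++) {
            System.out.println("retry " + (i + 1) + ": wait about "
                    + baseDelay(first, max, i).toMillis() + " ms");
        }
    }
}
```

With three retries starting at one second, a request that keeps failing adds about seven seconds of waiting on top of the time spent in the attempts themselves, which matters when each attempt can also run into the 30-second timeout.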
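6.3 Service Health Check
Add a health indicator. It calls Ollama's /api/tags endpoint and is reported through the Actuator health endpoint:
import java.time.Duration;
import org.springframework.boot.actuate.health.Health;
import org.springframework.boot.actuate.health.HealthIndicator;
import org.springframework.stereotype.Component;
import org.springframework.web.reactive.function.client.WebClient;

@Component
public class OllamaHealthIndicator implements HealthIndicator {
    private final WebClient webClient;

    public OllamaHealthIndicator(WebClient webClient) {
        this.webClient = webClient;
    }

    @Override
    public Health health() {
        try {
            // Blocking is acceptable here because health checks run on a servlet thread
            webClient.get()
                    .uri("/api/tags")
                    .retrieve()
                    .bodyToMono(String.class)
                    .timeout(Duration.ofSeconds(3))
                    .block();
            return Health.up().build();
        } catch (Exception e) {
            return Health.down()
                    // withDetail rejects null values, so guard against a null message
                    .withDetail("error", String.valueOf(e.getMessage()))
                    .build();
        }
    }
}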
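The same probe is handy outside Spring, for example in a startup script that waits for Ollama before launching the application. This is a rough sketch with the JDK HTTP client; `OllamaProbe` and `isUp` are hypothetical names, and treating only HTTP 200 as healthy is an assumption.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class OllamaProbe {

    // Returns true when GET {baseUrl}/api/tags answers with HTTP 200 within the timeout.
    public static boolean isUp(String baseUrl, Duration timeout) {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(timeout)
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(baseUrl + "/api/tags"))
                .timeout(timeout)
                .GET()
                .build();
        try {
            HttpResponse<Void> response =
                    client.send(request, HttpResponse.BodyHandlers.discarding());
            return response.statusCode() == 200;
        } catch (IOException e) {
            // Connection refused, timeout, and similar failures all count as "down".
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        boolean up = isUp("http://localhost:11434", Duration.ofSeconds(3));
        System.out.println(up ? "Ollama is reachable" : "Ollama is not reachable");
    }
}
```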
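7. Practical Application Scenarios
7.1 Customer Service Q&A
Integrate the Q&A system into a customer-service platform. The service keeps a short per-session context in memory; in production the map would also need an eviction policy so that finished sessions do not accumulate:
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import lombok.extern.slf4j.Slf4j;
import org.springframework.stereotype.Service;
import reactor.core.publisher.Mono;

@Service
@Slf4j
public class CustomerService {
    private final QAService qaService;
    private final Map<String, String> contextMap = new ConcurrentHashMap<>();

    public CustomerService(QAService qaService) {
        this.qaService = qaService;
    }

    public Mono<String> handleCustomerQuery(String sessionId, String query) {
        // Fetch the conversation context for this session
        String context = contextMap.getOrDefault(sessionId, "");
        String fullQuery = context + "\nUser question: " + query;
        return qaService.askQuestion(fullQuery)
                .map(answer -> {
                    // Update the context, keeping only the most recent 1000 characters
                    String newContext = context + "\nUser: " + query + "\nAssistant: " + answer;
                    if (newContext.length() > 1000) {
                        newContext = newContext.substring(newContext.length() - 1000);
                    }
                    contextMap.put(sessionId, newContext);
                    return answer;
                });
    }
}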
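Cutting the context at a fixed character count can split a turn in the middle. An alternative is to keep whole messages and cap their number, which also lines up with the role/content message list that /api/chat accepts. Below is a minimal sketch under those assumptions; `ConversationHistory` and its nested `Message` class are hypothetical names, and a limit counted in messages is only one possible policy.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class ConversationHistory {

    // One turn of the conversation, mirroring the role/content pair of a chat message.
    public static final class Message {
        public final String role;
        public final String content;

        public Message(String role, String content) {
            this.role = role;
            this.content = content;
        }
    }

    private final int maxMessages;
    private final Deque<Message> messages = new ArrayDeque<>();

    public ConversationHistory(int maxMessages) {
        if (maxMessages < 1) {
            throw new IllegalArgumentException("maxMessages must be at least 1");
        }
        this.maxMessages = maxMessages;
    }

    // Appends a message and drops the oldest ones once the limit is exceeded.
    public synchronized void add(String role, String content) {
        messages.addLast(new Message(role, content));
        while (messages.size() > maxMessages) {
            messages.removeFirst();
        }
    }

    // Returns a copy, oldest first, suitable for building the request's message list.
    public synchronized List<Message> snapshot() {
        return new ArrayList<>(messages);
    }

    public static void main(String[] args) {
        ConversationHistory history = new ConversationHistory(4);
        history.add("user", "Do you ship abroad?");
        history.add("assistant", "Yes, to most countries.");
        history.add("user", "How long does it take?");
        for (Message m : history.snapshot()) {
            System.out.println(m.role + ": " + m.content);
        }
    }
}
```

One instance per session could replace the String values in contextMap, with each stored message converted to a ChatMessage when the request is built.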
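7.2 Knowledge Base Q&A
Integrate an enterprise knowledge base. A close match is answered directly from the knowledge base; otherwise the entries are passed to the model as background knowledge:
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import lombok.extern.slf4j.Slf4j;
import org.springframework.stereotype.Service;
import reactor.core.publisher.Mono;

@Service
@Slf4j
public class KnowledgeBaseService {
    private final QAService qaService;
    private final List<String> knowledgeBase;

    public KnowledgeBaseService(QAService qaService) {
        this.qaService = qaService;
        this.knowledgeBase = loadKnowledgeBase();
    }

    public Mono<String> queryKnowledgeBase(String question) {
        // First look for a closely matching entry in the knowledge base
        Optional<String> matchedEntry = knowledgeBase.stream()
                .filter(kb -> similarity(kb, question) > 0.6)
                .findFirst();
        if (matchedEntry.isPresent()) {
            return Mono.just("According to the knowledge base: " + matchedEntry.get());
        }
        // No close match: let the model answer with the knowledge base as background
        String enhancedQuestion = "Please answer the question based on the following knowledge:\n" +
                String.join("\n", knowledgeBase) +
                "\nQuestion: " + question;
        return qaService.askQuestion(enhancedQuestion);
    }

    private double similarity(String str1, String str2) {
        // Simple Jaccard similarity over whitespace-separated words
        Set<String> words1 = new HashSet<>(Arrays.asList(str1.toLowerCase().split("\\s+")));
        Set<String> words2 = new HashSet<>(Arrays.asList(str2.toLowerCase().split("\\s+")));
        Set<String> intersection = new HashSet<>(words1);
        intersection.retainAll(words2);
        Set<String> union = new HashSet<>(words1);
        union.addAll(words2);
        return union.isEmpty() ? 0 : (double) intersection.size() / union.size();
    }

    private List<String> loadKnowledgeBase() {
        // Load the knowledge base from a file or database
        return Arrays.asList(
                "Product A costs 100 yuan",
                "Product B supports 7-day no-reason returns",
                "Customer service hours are 9:00-18:00"
        );
    }
}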
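Word-based matching breaks down for text without spaces between words, such as Chinese, because the whole sentence becomes a single token. One common workaround is to compare character bigrams instead. The sketch below illustrates that idea as a swapped-in variant and is not the method used above; `BigramSimilarity` is a hypothetical name, and it assumes characters in the Basic Multilingual Plane.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class BigramSimilarity {

    // Splits text into overlapping two-character pieces after removing whitespace.
    public static Set<String> bigrams(String text) {
        String compact = text.replaceAll("\\s+", "").toLowerCase(Locale.ROOT);
        Set<String> result = new HashSet<>();
        for (int i = 0; i + 2 <= compact.length(); i++) {
            result.add(compact.substring(i, i + 2));
        }
        return result;
    }

    // Jaccard similarity of the two bigram sets: |A ∩ B| / |A ∪ B|.
    public static double similarity(String a, String b) {
        Set<String> first = bigrams(a);
        Set<String> second = bigrams(b);
        Set<String> union = new HashSet<>(first);
        union.addAll(second);
        if (union.isEmpty()) {
            return 0.0;
        }
        Set<String> intersection = new HashSet<>(first);
        intersection.retainAll(second);
        return (double) intersection.size() / union.size();
    }

    public static void main(String[] args) {
        // Works without word boundaries, e.g. for Chinese knowledge-base entries.
        System.out.println(similarity("产品A的价格是100元", "产品A的价格是多少"));
        System.out.println(similarity("customer service hours", "service hours for customers"));
    }
}
```

For a knowledge base of any real size, an embedding-based search or a full-text index would be the usual next step. It would also avoid putting every entry into the prompt.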
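8. Summary
In this article we integrated GLM-4.7-Flash into a Spring Boot project and built a working intelligent Q&A system, with a concrete implementation for each step: model deployment, API wrapping, performance tuning, and exception handling.
In our use, GLM-4.7-Flash performed well in Q&A scenarios, with fast responses and good answer quality. Spring's reactive programming model works well with the Ollama API and can handle a fairly high number of concurrent requests.
During deployment, pay attention to the model's memory requirements and to network latency. For production, a GPU server is recommended for better performance, together with suitable timeouts and a retry policy to keep the system stable.
The system can be extended further, for example with user feedback, multi-turn conversation management, and answer-quality evaluation. If you have a specific business scenario, you can also tailor the prompts and the post-processing logic to it.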