Building a Knowledge Retrieval System with LightRAG and a Local Ollama deepseek-r1:32b Model

东方金木 · Published 2025-03-24 16:51:42

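I. Project Background and Tech Stack

In knowledge-intensive applications, quickly extracting key information from large volumes of text is a central challenge. This article shows how to combine the LightRAG library with a local large language model (served through Ollama) to build a lightweight knowledge retrieval and generation system.

Tech stack:

LightRAG: a lightweight RAG framework that supports several retrieval modes
Ollama: a local LLM inference engine (supports DeepSeek-R1 and other models)
Python: the development language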

II. Code Walkthrough and Implementation Steps

1. Environment Setup

import asyncio
import os
from lightrag import LightRAG, QueryParam
from lightrag.kg.shared_storage import initialize_pipeline_status
from lightrag.llm.ollama import ollama_model_complete, ollama_embedding
from lightrag.utils import EmbeddingFunc

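Key libraries:
- lightrag: the core framework, which implements the RAG (retrieval-augmented generation) pipeline
- ollama: the interface for calling the local model
- asyncio: asynchronous handling of model inference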
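To make the role of asyncio more concrete, here is a minimal sketch of how a semaphore can cap the number of concurrent model calls, which is the idea behind a setting such as llm_model_max_async. The fake_llm_call coroutine and the stats dictionary are hypothetical stand-ins for illustration; they are not part of LightRAG or Ollama.

```python
import asyncio


async def fake_llm_call(prompt: str, sem: asyncio.Semaphore, stats: dict) -> str:
    """Hypothetical stand-in for a model request (not a LightRAG/Ollama API)."""
    async with sem:  # wait here until a concurrency slot is free
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.01)  # simulate inference latency
        stats["active"] -= 1
        return f"answer to: {prompt}"


async def run_all(prompts, max_async):
    """Run all prompts while allowing at most `max_async` calls at once."""
    sem = asyncio.Semaphore(max_async)
    stats = {"active": 0, "peak": 0}
    answers = await asyncio.gather(
        *(fake_llm_call(p, sem, stats) for p in prompts)
    )
    return list(answers), stats["peak"]


answers, peak = asyncio.run(run_all(["q1", "q2", "q3"], max_async=1))
print(answers, peak)
```

With max_async=1 the calls run strictly one after another, which is usually the safer choice when a single GPU serves a 32B model.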

2. Initializing the RAG Instance

async def initialize_rag():
    WORKING_DIR = "./dickens"

    # Create the working directory on first run
    os.makedirs(WORKING_DIR, exist_ok=True)

    rag = LightRAG(
        working_dir=WORKING_DIR,  # data storage directory
        chunk_token_size=300,     # chunk size, in tokens
        chunk_overlap_token_size=0,  # overlap between adjacent chunks, in tokens

        # LLM configuration
        llm_model_func=ollama_model_complete,  # model inference function
        llm_model_name="deepseek-r1:32b",      # model name
        llm_model_max_async=1,                # maximum concurrent inference calls
        llm_model_max_token_size=8192,        # maximum context length
        llm_model_kwargs={                    # model parameters
            "host": "http://localhost:11434", # Ollama service address
            "options": {"num_ctx": 8192}      # context window size
        },

        # Embedding model configuration
        embedding_func=EmbeddingFunc(
            embedding_dim=1024,               # embedding vector dimension
            max_token_size=512,               # maximum tokens per embedding input
            func=lambda texts: ollama_embedding(  # embedding function
                texts, embed_model="bge-m3:latest",
                host="http://localhost:11434"
            ),
        ),
    )

    await rag.initialize_storages()  # initialize storage backends
    await initialize_pipeline_status()  # initialize pipeline status

    return rag

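Key parameters:

chunk_token_size: controls chunking granularity; by the author's estimate 300 tokens is roughly 100-150 Chinese characters, though the exact ratio depends on the tokenizer
ollama_model_complete: calls the deepseek-r1:32b model through Ollama
bge-m3: the text embedding model used to build the vector store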
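The effect of chunk_token_size and chunk_overlap_token_size can be illustrated with a small sliding-window sketch. It assumes the text has already been turned into a list of tokens; whitespace splitting stands in for a real tokenizer here, and chunk_tokens is a hypothetical helper, not LightRAG's own chunking function.

```python
def chunk_tokens(tokens, chunk_token_size=300, chunk_overlap_token_size=0):
    """Split a token list into windows of `chunk_token_size` tokens.

    Consecutive windows share `chunk_overlap_token_size` tokens.
    """
    if chunk_overlap_token_size >= chunk_token_size:
        raise ValueError("overlap must be smaller than the chunk size")
    step = chunk_token_size - chunk_overlap_token_size
    return [
        tokens[start:start + chunk_token_size]
        for start in range(0, len(tokens), step)
    ]


# Assumption: whitespace tokens instead of real tokenizer output
tokens = "a b c d e f g h i j".split()
no_overlap = chunk_tokens(tokens, chunk_token_size=4, chunk_overlap_token_size=0)
with_overlap = chunk_tokens(tokens, chunk_token_size=4, chunk_overlap_token_size=1)
print(no_overlap)
print(with_overlap)
```

An overlap of 0, as configured above, keeps chunks disjoint; a small positive overlap trades extra storage for less risk of cutting a sentence's context in half.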

3. Main Flow

def main():
    # Initialize the system
    rag = asyncio.run(initialize_rag())

    # Load data
    with open("./xingfa.txt", "r", encoding="utf-8") as f:
        rag.insert(f.readlines()[:3])  # insert the first 3 lines of the file

    # Run the same query in each retrieval mode
    for mode in ["naive", "local", "global", "hybrid", "mix"]:
        result = rag.query(
            "What are the top themes in this story?",
            param=QueryParam(mode=mode)
        )
        print(f"Mode {mode}: {result}")
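Because readlines() also returns blank lines, the first three lines of a file are not always three usable passages. The following sketch shows one way to load the first few non-empty lines instead; load_passages is a hypothetical helper introduced for illustration.

```python
from itertools import islice
from pathlib import Path


def load_passages(path, limit=3):
    """Return the first `limit` non-empty lines of a UTF-8 text file, stripped."""
    with Path(path).open("r", encoding="utf-8") as f:
        non_empty = (line.strip() for line in f if line.strip())
        return list(islice(non_empty, limit))
```

It could then replace the slicing in the main flow, for example rag.insert(load_passages("./xingfa.txt")).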

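III. Core Concepts

1. Comparison of RAG Modes

Mode	Mechanism	Typical use
naive	Plain vector search over text chunks, without the knowledge graph	Quick tests and baseline comparisons
local	Knowledge-graph retrieval centered on entities and their immediate context	Questions about specific entities or details
global	Knowledge-graph retrieval centered on relationships across the corpus	Questions that need corpus-wide understanding
hybrid	Combines local and global retrieval	Questions that mix specific details with broad themes
mix	Combines knowledge-graph retrieval with vector retrieval	Complex relational reasoning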
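The vector-retrieval step that several of these modes rely on can be sketched as a cosine-similarity ranking over chunk embeddings. This is a toy illustration with hand-written 3-dimensional vectors, not LightRAG's implementation; cosine, top_k_chunks, and the chunk IDs are made up for the example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_chunks(query_vec, chunk_vecs, k=2):
    """Return the IDs of the k chunks most similar to the query vector."""
    ranked = sorted(
        chunk_vecs.items(),
        key=lambda item: cosine(query_vec, item[1]),
        reverse=True,
    )
    return [chunk_id for chunk_id, _ in ranked[:k]]


# Toy embeddings; a real system would get these from the embedding model
chunk_vecs = {
    "chunk-1": [1.0, 0.0, 0.0],
    "chunk-2": [0.7, 0.7, 0.0],
    "chunk-3": [0.0, 0.0, 1.0],
}
best = top_k_chunks([0.9, 0.1, 0.0], chunk_vecs, k=2)
print(best)
```

The retrieved chunks are then placed in the prompt as context; the graph-based modes add entities and relationships on top of, or instead of, this step.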

2. Deploying Local Models with Ollama

The models must be pulled through Ollama in advance:

ollama pull deepseek-r1:32b
ollama pull bge-m3:latest
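Before running the pipeline it can help to confirm that both models are actually present. The sketch below compares a list of required model names against the payload returned by the Ollama server's model-list endpoint (GET /api/tags). The helper names are hypothetical, and the sample payload is trimmed to the one field used here.

```python
import json
import urllib.request


def installed_models(tags_payload):
    """Extract model names from an /api/tags response payload."""
    return {m["name"] for m in tags_payload.get("models", [])}


def missing_models(tags_payload, required):
    """Return the required model names that the server does not list."""
    have = installed_models(tags_payload)
    return [name for name in required if name not in have]


def fetch_tags(host="http://localhost:11434"):
    """Fetch the model list from a running Ollama server."""
    with urllib.request.urlopen(f"{host}/api/tags", timeout=5) as resp:
        return json.load(resp)


# Offline example with a hand-written payload (assumed shape, trimmed)
sample = {"models": [{"name": "deepseek-r1:32b"}]}
missing = missing_models(sample, ["deepseek-r1:32b", "bge-m3:latest"])
print(missing)
```

Against a live server, missing_models(fetch_tags(), ["deepseek-r1:32b", "bge-m3:latest"]) should return an empty list once both pulls have finished.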

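IV. Runtime Requirements

System environment:
- Python 3.8+
- SQLite 3.x
- Ollama service (with the local models pulled)

Python dependencies (LightRAG is published on PyPI as lightrag-hku):

pip install lightrag-hku ollama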

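V. Extension Suggestions

Scaling up the data:

# Insert a large corpus in one batch, one chapter per document
with open("large_corpus.txt", "r", encoding="utf-8") as f:
    chapters = [c.strip() for c in f.read().split("CHAPTER ") if c.strip()]
rag.insert(chapters)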
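Splitting on a marker drops the marker itself and can leave an empty first element. As a rough sketch, the helpers below keep the chapter heading with its text and group chapters into fixed-size batches; split_chapters and batched are hypothetical names, and the "CHAPTER " marker is taken from the snippet above.

```python
def split_chapters(text, marker="CHAPTER "):
    """Split text at each marker, keeping the marker with its chapter."""
    parts = text.split(marker)
    chapters = []
    preface = parts[0].strip()
    if preface:  # text before the first marker, if any
        chapters.append(preface)
    chapters.extend(marker + p.strip() for p in parts[1:] if p.strip())
    return chapters


def batched(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


corpus = (
    "Preface text\n"
    "CHAPTER 1\nIt was the best of times.\n"
    "CHAPTER 2\nIt was the worst of times.\n"
)
chapters = split_chapters(corpus)
batches = list(batched(chapters, 2))
print(batches)
```

Each batch can then be passed to rag.insert in turn, which keeps a single failed insert from discarding the whole corpus.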

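Performance optimization:

# Cache LLM responses so that repeated queries skip inference
rag = LightRAG(
    working_dir=WORKING_DIR,
    enable_llm_cache=True,
    # remaining arguments as in initialize_rag();
    # for a Redis-backed store, check the storage backends
    # supported by your LightRAG version
)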
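The idea behind response caching can be sketched independently of any library: hash the query together with its mode, and only call the expensive function on a cache miss. ResponseCache and slow_query are hypothetical and in-memory only; this is not LightRAG's cache implementation.

```python
import hashlib


class ResponseCache:
    """Minimal in-memory cache keyed by a hash of (mode, query)."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def key(mode, query):
        return hashlib.sha256(f"{mode}\n{query}".encode("utf-8")).hexdigest()

    def get_or_compute(self, mode, query, compute):
        """Return the cached answer, computing and storing it on a miss."""
        k = self.key(mode, query)
        if k not in self._store:
            self._store[k] = compute(mode, query)
        return self._store[k]


calls = []


def slow_query(mode, query):
    """Hypothetical stand-in for an expensive retrieval + generation call."""
    calls.append((mode, query))
    return f"[{mode}] answer to {query}"


cache = ResponseCache()
first = cache.get_or_compute("hybrid", "top themes?", slow_query)
second = cache.get_or_compute("hybrid", "top themes?", slow_query)
print(first == second, len(calls))
```

Including the mode in the key matters, since the same question answered in naive and hybrid mode should not share a cached response.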

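VI. FAQ

Q1: How do I switch the embedding model?
Change the EmbeddingFunc configuration. The model must be one that Ollama can serve (pull it first), and embedding_dim must match the model's output dimension, for example 768 for nomic-embed-text:

func=lambda texts: ollama_embedding(
    texts, embed_model="nomic-embed-text", host="http://localhost:11434"
)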
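A mismatch between the configured embedding_dim and the vectors the model actually returns tends to surface late, as a storage error. One way to catch it early is to embed a short probe string and compare lengths, as in this sketch. check_embedding_dim is a hypothetical helper, and fake_embed stands in for the real asynchronous embedding function.

```python
import asyncio


async def check_embedding_dim(embed, expected_dim, probe="dimension probe"):
    """Embed one probe string and verify the vector length."""
    vectors = await embed([probe])
    actual = len(vectors[0])
    if actual != expected_dim:
        raise ValueError(
            f"embedding_dim={expected_dim}, but the model returned "
            f"{actual}-dimensional vectors"
        )
    return actual


async def fake_embed(texts):
    """Hypothetical embedding function that returns 1024-dim zero vectors."""
    return [[0.0] * 1024 for _ in texts]


dim = asyncio.run(check_embedding_dim(fake_embed, expected_dim=1024))
print(dim)
```

After switching to a model with a different dimension, previously built vector data no longer matches, so it is safest to start from a fresh working directory.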

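Q2: How do I adjust the response length?
Set response_type in QueryParam to shape the answer, and cap the number of generated tokens with Ollama's num_predict option in llm_model_kwargs:

param=QueryParam(mode="hybrid", response_type="Single Paragraph")
llm_model_kwargs={"host": "http://localhost:11434", "options": {"num_ctx": 8192, "num_predict": 4096}}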

VII. Complete Code

import asyncio
import os

from lightrag import LightRAG, QueryParam
from lightrag.kg.shared_storage import initialize_pipeline_status
from lightrag.llm.ollama import ollama_model_complete, ollama_embedding
from lightrag.utils import EmbeddingFunc


async def initialize_rag():
    WORKING_DIR = "./dickens"

    # Create the working directory on first run
    os.makedirs(WORKING_DIR, exist_ok=True)

    rag = LightRAG(
        working_dir=WORKING_DIR,
        chunk_token_size=300,
        chunk_overlap_token_size=0,
        llm_model_func=ollama_model_complete,
        llm_model_name="deepseek-r1:32b",
        llm_model_max_async=1,
        llm_model_max_token_size=8192,
        llm_model_kwargs={"host": "http://localhost:11434", "options": {"num_ctx": 8192}},
        embedding_func=EmbeddingFunc(
            embedding_dim=1024,  # output dimension of bge-m3
            max_token_size=512,  # maximum tokens per embedding input
            func=lambda texts: ollama_embedding(
                texts, embed_model="bge-m3:latest", host="http://localhost:11434"
            ),
        ),
    )

    await rag.initialize_storages()
    await initialize_pipeline_status()

    return rag


def main():
    # Initialize the RAG instance
    rag = asyncio.run(initialize_rag())

    # Insert the first 3 lines of the source text
    with open("./xingfa.txt", "r", encoding="utf-8") as f:
        rag.insert(f.readlines()[:3])

    # Run the same query in each retrieval mode:
    # naive, local, global, hybrid, and mix
    # (mix integrates knowledge graph and vector retrieval)
    for mode in ["naive", "local", "global", "hybrid", "mix"]:
        result = rag.query(
            "What are the top themes in this story?",
            param=QueryParam(mode=mode)
        )
        print(f"Mode {mode}: {result}")


if __name__ == "__main__":
    main()
