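Deploying embeddinggemma-300m with Ollama: An Open-Source Embedding Model + Ollama + FastAPI Stack

This article shows how to automate deployment of the [ollama] embeddinggemma-300m image on the 星图 GPU platform and combine the open-source embedding model with FastAPI. The resulting service turns text into vectors and suits use cases such as semantic search and recommendation systems.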

Jay星晴 · Published 2026-01-28 01:59:01

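1. Introduction

Processing and understanding large volumes of text efficiently is a recurring engineering problem. EmbeddingGemma-300m, an open embedding model from Google with roughly 300 million parameters, is small enough to run locally while still producing useful text embeddings. This article walks through deploying EmbeddingGemma-300m with Ollama and building an embedding service on top of it with FastAPI.

By the end of this tutorial you will know how to:

Deploy the EmbeddingGemma-300m model quickly
Set up a simple FastAPI service
Convert text into vectors
Compute semantic similarity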

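2. Environment Setup and Model Deployment

2.1 Installing Ollama

On Linux, the official install script sets Ollama up directly; Docker is not required. (On macOS and Windows, use the installers from ollama.com instead.) Run: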

curl -fsSL https://ollama.com/install.sh | sh

If the installer has not already started Ollama as a background service, start the server manually:

ollama serve

2.2 Downloading the EmbeddingGemma-300m Model

Pull the EmbeddingGemma-300m model with Ollama:

ollama pull embeddinggemma:300m

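This command downloads the model weights automatically. The model has about 300 million parameters, so expect a download of several hundred megabytes; the exact size depends on the tag, and the download time depends on your network connection.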

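2.3 Verifying the Model

Embedding models return vectors rather than generated text, and depending on your Ollama version `ollama run` may reject them. The dependable check is to call Ollama's REST endpoint for embeddings, which listens on port 11434 by default:

curl http://localhost:11434/api/embed -d '{"model": "embeddinggemma:300m", "input": "Hello world"}'

If the JSON response contains an `embeddings` array of floating-point numbers, the model has loaded successfully.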
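
The same check can be scripted. The sketch below uses only the Python standard library and assumes Ollama is reachable at its default address, http://localhost:11434; the helper names `extract_embedding` and `embed_via_ollama` are illustrative and not part of Ollama.

```python
import json
import urllib.request

# Assumption: Ollama is listening on its default local address and port
OLLAMA_EMBED_URL = "http://localhost:11434/api/embed"


def extract_embedding(response: dict) -> list:
    """Return the first vector from an /api/embed response body."""
    embeddings = response.get("embeddings")
    if not embeddings:
        raise ValueError(f"no embeddings in response: {response}")
    return embeddings[0]


def embed_via_ollama(text: str, model: str = "embeddinggemma:300m") -> list:
    """Send one text to Ollama and return its embedding vector."""
    payload = json.dumps({"model": model, "input": text}).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_EMBED_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=60) as resp:
        return extract_embedding(json.load(resp))


# Usage (requires a running Ollama server with the model pulled):
#   vector = embed_via_ollama("Hello world")
#   print(len(vector))  # number of dimensions in the returned vector
```

Separating the response parsing from the network call makes it possible to test the parsing without a running server.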

3. Building the FastAPI Service

3.1 Creating a Python Virtual Environment

python -m venv venv
source venv/bin/activate  # Linux/Mac
# or: venv\Scripts\activate  # Windows

3.2 Installing Dependencies

pip install fastapi uvicorn requests numpy

3.3 Creating the FastAPI Application

Create a file named main.py with the following code:

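import asyncio

import requests
from fastapi import Body, FastAPI

# Ollama's local REST API; 11434 is its default port
OLLAMA_EMBED_URL = "http://localhost:11434/api/embed"
MODEL_NAME = "embeddinggemma:300m"

app = FastAPI()

def ollama_embed(text: str) -> list[float]:
    # POST the text to Ollama and return the first (and only) embedding vector
    resp = requests.post(
        OLLAMA_EMBED_URL,
        json={"model": MODEL_NAME, "input": text},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["embeddings"][0]

@app.post("/embed")
async def get_embedding(text: str = Body(..., embed=True)):
    # embed=True makes FastAPI expect a JSON body of the form {"text": "..."}
    try:
        # requests is blocking, so run the call in a worker thread
        embedding = await asyncio.to_thread(ollama_embed, text)
        return {"embedding": embedding}
    except Exception as e:
        return {"error": str(e)}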

3.4 Starting the Service

uvicorn main:app --reload

Your embedding service is now running at http://127.0.0.1:8000.

4. Usage Examples

4.1 Getting a Text Embedding

Test the API with curl:

curl -X POST "http://127.0.0.1:8000/embed" -H "Content-Type: application/json" -d '{"text":"natural language processing"}'

You will get a response similar to this:

{
  "embedding": [0.123, -0.456, 0.789, ...]
}
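
For callers that prefer Python over curl, the request can be built with the standard library. This is a minimal sketch that assumes the service is running at http://127.0.0.1:8000 and accepts the same JSON body as the curl call above; `build_embed_request` and `fetch_embedding` are hypothetical helper names.

```python
import json
import urllib.request


def build_embed_request(text: str, base_url: str = "http://127.0.0.1:8000") -> urllib.request.Request:
    """Build the POST request that mirrors the curl call to /embed."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/embed",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_embedding(text: str) -> list:
    """Call the running service and return the vector, raising on an error payload."""
    with urllib.request.urlopen(build_embed_request(text), timeout=60) as resp:
        result = json.load(resp)
    if "error" in result:
        raise RuntimeError(result["error"])
    return result["embedding"]


# Usage (requires the FastAPI service to be running):
#   vector = fetch_embedding("natural language processing")
```

Because the service reports failures as an "error" field rather than an HTTP status, the client checks for that field explicitly.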

4.2 Computing Semantic Similarity

Add a new API endpoint that computes the similarity between two texts:

from fastapi import Body
from numpy import dot
from numpy.linalg import norm

@app.post("/similarity")
async def calculate_similarity(text1: str = Body(...), text2: str = Body(...)):
    # Two Body parameters make FastAPI expect {"text1": "...", "text2": "..."}
    emb1 = await get_embedding(text1)
    emb2 = await get_embedding(text2)

    if "error" in emb1 or "error" in emb2:
        return {"error": emb1.get("error", emb2.get("error"))}

    # Cosine similarity: dot product divided by the product of the vector norms
    v1, v2 = emb1["embedding"], emb2["embedding"]
    similarity = dot(v1, v2) / (norm(v1) * norm(v2))
    return {"similarity": float(similarity)}

Test the similarity endpoint:

curl -X POST "http://127.0.0.1:8000/similarity" -H "Content-Type: application/json" -d '{"text1":"artificial intelligence", "text2":"machine learning"}'
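
To make the score concrete, here is cosine similarity worked out in plain Python on toy vectors. This is an illustrative sketch, not the service code; returning 0.0 for a zero-length vector is a convention chosen here to avoid dividing by zero.

```python
import math


def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch
    return dot / (norm_a * norm_b)


# Toy 2-dimensional vectors; real embeddings have hundreds of dimensions
print(cosine_similarity([3.0, 4.0], [3.0, 4.0]))  # same direction
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal
print(cosine_similarity([1.0, 0.0], [-1.0, 0.0]))  # opposite direction
```

Scores close to 1 indicate vectors pointing the same way, scores near 0 indicate unrelated directions, and negative scores indicate opposing directions.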

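5. Performance Optimization and Extensions

5.1 Batch Processing

To spare clients one HTTP round trip per text, add an endpoint that accepts a list of texts: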

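from fastapi import Body

@app.post("/batch_embed")
async def batch_embed(texts: list[str] = Body(..., embed=True)):
    # embed=True makes FastAPI expect a JSON body of the form {"texts": ["...", "..."]}
    embeddings = []
    for text in texts:
        result = await get_embedding(text)
        if "error" in result:
            return result  # stop at the first failure and report it
        embeddings.append(result["embedding"])
    return {"embeddings": embeddings}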
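
Ollama's /api/embed endpoint also accepts a list of strings as `input`, so a batch can be sent in a single request instead of one request per text. The sketch below assumes Ollama's default address; `chunked`, `embed_batch`, and the default `batch_size` of 32 are illustrative choices rather than recommended values.

```python
import json
import urllib.request
from typing import Iterator

# Assumption: Ollama is listening on its default local address and port
OLLAMA_EMBED_URL = "http://localhost:11434/api/embed"


def chunked(items: list, size: int) -> Iterator[list]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def embed_batch(texts: list, model: str = "embeddinggemma:300m", batch_size: int = 32) -> list:
    """Embed many texts, sending each chunk to Ollama in one request."""
    vectors = []
    for chunk in chunked(texts, batch_size):
        payload = json.dumps({"model": model, "input": chunk}).encode("utf-8")
        request = urllib.request.Request(
            OLLAMA_EMBED_URL,
            data=payload,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(request, timeout=120) as resp:
            # "embeddings" holds one vector per input string, in order
            vectors.extend(json.load(resp)["embeddings"])
    return vectors


# Usage (requires a running Ollama server with the model pulled):
#   vectors = embed_batch(["first text", "second text", "third text"])
```

Chunking keeps each request body bounded when the input list is large.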

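5.2 Keeping the Model Loaded

The Ollama server keeps a loaded model in memory for a while after each request (five minutes by default), so the model is not reloaded on every call. To avoid reloads after longer idle periods, extend that window with the `keep_alive` request parameter and reuse a single HTTP session. The version below replaces the /embed handler from section 3.3: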

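import asyncio

import requests
from fastapi import Body

class ModelClient:
    def __init__(self, base_url="http://localhost:11434", model="embeddinggemma:300m"):
        self.url = f"{base_url}/api/embed"
        self.model = model
        # One shared session reuses the HTTP connection to Ollama
        self.session = requests.Session()

    def embed(self, text):
        resp = self.session.post(
            self.url,
            # keep_alive asks Ollama to keep the model loaded for 30 minutes after this request
            json={"model": self.model, "input": text, "keep_alive": "30m"},
            timeout=60,
        )
        resp.raise_for_status()
        return resp.json()["embeddings"][0]

model_client = ModelClient()

@app.post("/embed")
async def get_embedding(text: str = Body(..., embed=True)):
    try:
        # Run the blocking HTTP call in a worker thread
        embedding = await asyncio.to_thread(model_client.embed, text)
        return {"embedding": embedding}
    except Exception as e:
        return {"error": str(e)}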
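
A complementary, application-level optimization is to cache the vectors themselves, so that repeated texts never reach the model at all. This is not an Ollama feature; the sketch below is a small least-recently-used cache around any embedding function, and `CachedEmbedder` and its `max_size` default are hypothetical.

```python
from collections import OrderedDict
from typing import Callable


class CachedEmbedder:
    """Wrap an embedding function with a small LRU cache keyed by the input text."""

    def __init__(self, embed_fn: Callable, max_size: int = 1024):
        self._embed_fn = embed_fn
        self._max_size = max_size
        self._cache = OrderedDict()

    def embed(self, text: str) -> list:
        if text in self._cache:
            # Cache hit: mark the entry as most recently used
            self._cache.move_to_end(text)
            return self._cache[text]
        # Cache miss: compute the vector and store it
        vector = self._embed_fn(text)
        self._cache[text] = vector
        if len(self._cache) > self._max_size:
            # Evict the least recently used entry
            self._cache.popitem(last=False)
        return vector


# Usage: wrap whichever function actually calls the model, for example
#   cached = CachedEmbedder(model_client.embed)
#   vector = cached.embed("natural language processing")
```

The cache matches on exact text only, so it helps most when the same strings recur, such as popular search queries.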

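6. Summary

In this tutorial we:

Deployed the EmbeddingGemma-300m model with Ollama
Built a FastAPI-based text embedding service
Implemented vectorization for single texts and batches
Added semantic similarity computation
Reduced model reloads and per-request overhead

This setup is a good fit for applications that need a lightweight embedding model, such as:

Local document search
Personalized recommendation engines
Semantic caching
Small-scale classification and clustering tasks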
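
As an illustration of the document search use case, the sketch below ranks documents against a query by normalizing all vectors once and then comparing them with a dot product, which equals cosine similarity for unit-length vectors. The toy 2-dimensional vectors stand in for real embeddings, and `normalize`, `build_index`, and `search` are hypothetical helper names.

```python
import math


def normalize(vec: list) -> list:
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def build_index(doc_vectors: dict) -> dict:
    """Normalize every document vector once, ahead of query time."""
    return {doc_id: normalize(vec) for doc_id, vec in doc_vectors.items()}


def search(index: dict, query_vec: list, k: int = 3) -> list:
    """Return the k (doc_id, score) pairs with the highest similarity to the query."""
    q = normalize(query_vec)
    scored = [
        (doc_id, sum(a * b for a, b in zip(q, vec)))
        for doc_id, vec in index.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors; in practice these would come from the embedding service
index = build_index({"doc-a": [2.0, 0.0], "doc-b": [0.0, 3.0], "doc-c": [1.0, 1.0]})
for doc_id, score in search(index, [1.0, 0.1], k=2):
    print(doc_id, round(score, 3))
```

A linear scan like this is adequate for small local collections; larger corpora usually call for a dedicated vector index.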
