Running DeepSeek R1 on a Mac M1 with Ollama
1 Download and install
Download the Mac M1 build of Ollama from the official Ollama website.
The download resolves to the following address:
https://github.com/ollama/ollama/releases/latest/download/Ollama.dmg
Open the downloaded file to install Ollama, then verify the installation:
ollama list
2 Run DeepSeek R1
deepseek-r1:8b is a good fit for laptop deployment at roughly 5 GB; other versions can be chosen depending on the machine's configuration.
ollama run deepseek-r1:8b
Once ollama has pulled deepseek-r1:8b to the local machine, interactive testing can begin.
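When testing deepseek-r1, replies usually start with the model's reasoning wrapped in <think>...</think> before the final answer. The sketch below separates the two parts of a reply in Python. It assumes the tags appear inline in the reply text (some Ollama versions may return the reasoning in a separate field instead); split_think is a hypothetical helper name, and the sample string is hand-written.

```python
import re

def split_think(text):
    """Split a reply into (reasoning, answer) around the first <think>...</think> block."""
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        # No reasoning block found: treat the whole reply as the answer
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, answer

# Hand-written sample reply, not real model output
sample = "<think>\nThe user wants a definition.\n</think>\n\nNLP is a branch of AI."
reasoning, answer = split_think(sample)
print("reasoning:", reasoning)
print("answer:", answer)
```

The same helper can be applied to the text returned by the API calls in section 3.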
3 Calling Ollama models through the API
1) Initial setup
The local Ollama server listens on port 11434.
import requests

# Basic setup
base_url = "http://localhost:11434/api"
headers = {
    "Content-Type": "application/json"
}
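Before calling any endpoint it can help to confirm that the server is actually listening on that port. The following is a minimal sketch, assuming the default address http://localhost:11434; is_ollama_running is a hypothetical helper name, and it uses the standard library's urllib rather than requests so that it runs without extra packages.

```python
import urllib.request
import urllib.error

def is_ollama_running(root_url="http://localhost:11434", timeout=2.0):
    """Return True if an HTTP GET on the server root answers with status 200."""
    try:
        with urllib.request.urlopen(root_url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status
        return False

print("Ollama reachable:", is_ollama_running())
```

If this prints False, start the Ollama app (or run ollama serve) before trying the calls below.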
The examples below call the LLM; specify the model as needed, for example "deepseek-r1:8b".
2) Streaming text completion
import requests
import json

# Basic setup
base_url = "http://localhost:11434/api"
headers = {
    "Content-Type": "application/json"
}

def generate_completion_stream(prompt, model="deepseek-r1:8b"):
    url = f"{base_url}/generate"
    data = {
        "model": model,
        "prompt": prompt,
        "stream": True,
        "options": {
            "temperature": 0.7
        }
    }
    response = requests.post(url, headers=headers, json=data, stream=True)
    response.raise_for_status()
    result = ""
    # Each non-empty line of the streamed body is one JSON object
    for line in response.iter_lines():
        if line:
            d = json.loads(line)
            result += d.get("response", "")
    return result

# Example call
prompt = "What is natural language processing?"
stream_completion = generate_completion_stream(prompt)
print("Streaming text completion:", stream_completion)
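The function above collects the whole reply before returning it, so nothing is shown until generation finishes. As a sketch of how the chunks could be displayed as they arrive, the generator below yields each piece of text from the streamed JSON lines. iter_text_chunks is a hypothetical helper name, and the input lines are hand-written stand-ins in the shape of the /api/generate stream so the example runs without a server.

```python
import json

def iter_text_chunks(lines):
    """Yield the "response" text of each streamed JSON line until "done" is true."""
    for line in lines:
        if not line:
            continue  # skip blank lines in the stream
        chunk = json.loads(line)
        text = chunk.get("response", "")
        if text:
            yield text
        if chunk.get("done"):
            break

# Hand-written lines imitating the stream, not real model output
fake_lines = [
    b'{"response": "Natural ", "done": false}',
    b'',
    b'{"response": "language", "done": false}',
    b'{"response": "", "done": true}',
]
for piece in iter_text_chunks(fake_lines):
    print(piece, end="", flush=True)
print()
```

Against a live server, response.iter_lines() from the streaming request can be passed in place of fake_lines.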
3) Chat completion
def generate_chat_completion(messages, model="qwen2.5:0.5b"):
    url = f"{base_url}/chat"
    data = {
        "model": model,
        "messages": messages,
        "stream": False,
        "options": {
            "temperature": 0.7
        },
    }
    response = requests.post(url, headers=headers, json=data)
    return response.json().get('message', {}).get('content', '')

# Example call
messages = [
    {"role": "user", "content": "What is natural language processing?"},
    {"role": "assistant", "content": "Natural language processing is a field of artificial intelligence focused on natural-language interaction between people and computers."}
]
chat_response = generate_chat_completion(messages)
print("Chat completion:", chat_response)
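The chat endpoint is stateless, so a multi-turn conversation means sending the accumulated history back on every call. The sketch below shows one way to extend the history after each reply. append_exchange is a hypothetical helper name, and the reply is a hand-written dictionary in the shape of a non-streaming /api/chat response.

```python
def append_exchange(messages, reply_json, next_user_content):
    """Return a new history with the assistant reply and the next user turn appended."""
    assistant_content = reply_json.get("message", {}).get("content", "")
    return messages + [
        {"role": "assistant", "content": assistant_content},
        {"role": "user", "content": next_user_content},
    ]

# Hand-written stand-in for a chat reply, not real model output
history = [{"role": "user", "content": "What is natural language processing?"}]
fake_reply = {
    "message": {"role": "assistant", "content": "NLP studies how computers process human language."},
    "done": True,
}
history = append_exchange(history, fake_reply, "Give one concrete application.")
print([m["role"] for m in history])
```

The resulting history ends with a user turn, so it can be sent as the messages field of the next chat request.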
4) Generate text embeddings
def generate_embeddings(text, model="qwen2.5:0.5b"):
    url = f"{base_url}/embed"
    data = {
        "model": model,
        "input": text
    }
    response = requests.post(url, headers=headers, json=data)
    return response.json()

# Example call
embeddings = generate_embeddings("What is deep learning?")
print("Text embeddings:", embeddings)
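Embeddings are typically used by comparing vectors, for instance with cosine similarity. The sketch below computes it in plain Python. It assumes the response carries the vectors under an "embeddings" key as a list of lists; the vectors are tiny hand-written values, not real model output, and cosine_similarity is an illustrative helper.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hand-written stand-in for an embedding response with two input texts
fake_response = {"embeddings": [[1.0, 0.0, 1.0], [1.0, 0.0, 0.0]]}
vec_a, vec_b = fake_response["embeddings"]
print(round(cosine_similarity(vec_a, vec_b), 4))
```

To obtain two vectors from the server in one request, "input" can be given as a list of strings instead of a single string.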
5) List local models
def list_local_models():
    url = f"{base_url}/tags"
    response = requests.get(url, headers=headers)
    return response.json().get('models', [])

# Example call
local_models = list_local_models()
print("Local models:", local_models)
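The raw list is verbose, so a short summary per model is often easier to read. The sketch below assumes each entry has a "name" and a "size" in bytes; summarize_models is a hypothetical helper name, and the sizes in the sample data are made-up illustrative values.

```python
def summarize_models(models):
    """Return one "name  size GB" line per model, largest model first."""
    ordered = sorted(models, key=lambda m: m.get("size", 0), reverse=True)
    return [f'{m.get("name", "?")}  {m.get("size", 0) / 1e9:.1f} GB' for m in ordered]

# Made-up entries for illustration only
fake_models = [
    {"name": "qwen2.5:0.5b", "size": 400000000},
    {"name": "deepseek-r1:8b", "size": 5200000000},
]
for line in summarize_models(fake_models):
    print(line)
```

With a running server, the list returned by list_local_models() can be passed in directly.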
6) Show model information
def show_model_info(model="qwen2.5:0.5b"):
    url = f"{base_url}/show"
    data = {"name": model}
    response = requests.post(url, headers=headers, json=data)
    return response.json()

# Example call
model_info = show_model_info()
print("Model information:", model_info)
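7) Create a model
def create_model(model_name="qwen2.5_custom", base_model="qwen2.5:0.5b"):
    url = f"{base_url}/create"
    # Recent Ollama releases take "from" and "system" fields; older releases
    # accepted a "modelfile" string instead, so adjust to the installed version
    data = {
        "model": model_name,
        "from": base_model,
        "system": "You are a helpful assistant.",
        "stream": False  # one JSON object instead of a stream of status lines
    }
    response = requests.post(url, headers=headers, json=data)
    return response.json()

# Example call
create_response = create_model()
print("Create model:", create_response)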
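8) Pull a model
def pull_model(model_name="qwen2.5:0.5b"):
    # base_url already ends with /api, so only the endpoint name is appended
    url = f"{base_url}/pull"
    # stream=False returns a single JSON object once the pull has finished
    data = {"name": model_name, "stream": False}
    response = requests.post(url, headers=headers, json=data)
    return response.json()

# Example call
pull_response = pull_model()
print("Pull model:", pull_response)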
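Large models can take minutes to download, and a non-streaming pull gives no feedback until it ends. When streaming is left on, the server sends one JSON status line at a time, and the download lines carry "total" and "completed" byte counts. The sketch below turns one such line into a status and a percentage. pull_progress is a hypothetical helper name, and the sample lines are hand-written in that shape.

```python
import json

def pull_progress(line):
    """Return (status, percent or None) for one streamed pull status line."""
    event = json.loads(line)
    total = event.get("total")
    completed = event.get("completed", 0)
    # Lines without a byte total (manifest, success) have no percentage
    percent = round(100 * completed / total, 1) if total else None
    return event.get("status", ""), percent

# Hand-written status lines for illustration only
fake_lines = [
    '{"status": "pulling manifest"}',
    '{"status": "pulling abc123", "total": 2000, "completed": 500}',
    '{"status": "success"}',
]
for line in fake_lines:
    print(pull_progress(line))
```

Against a live server, the lines would come from response.iter_lines() on a request sent with stream=True.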
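9) Delete a model
def delete_model(model_name="qwen2.5_custom"):
    url = f"{base_url}/delete"
    data = {"name": model_name}
    response = requests.delete(url, headers=headers, json=data)
    # A successful delete returns an empty body, so report the HTTP status
    # (200 on success, 404 if the model does not exist) instead of parsing JSON
    return response.status_code

# Example call
delete_response = delete_model()
print("Delete model:", delete_response)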
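Compared with llama.cpp, Ollama bundles a range of tools and is simpler to use.
For an example of accessing Ollama models through the OpenAI interface, see the CSDN blog post "How to call an Ollama LLM with the OpenAI SDK".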
reference
----
ollama
https://github.com/ollama/ollama
Running DeepSeek on a Mac M1 with llama.cpp
https://blog.csdn.net/liliang199/article/details/149246699
How to call an Ollama LLM with the OpenAI SDK