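Recommended references:

  1. Understanding the Basics of LLM API Application Development in One Article (long read), CSDN blog
  2. Implementing Function Calling for Large Models with LLaMA-Factory (glaive-function-calling), CSDN blog
  3. Function Calling - Qwen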

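Downloading the model

If you use a hosted model, simply apply for that model's api_key; some providers charge a fee.

To run locally, download the model you need. This guide uses Qwen2-7B as the example.

Qwen2-7B download page: https://huggingface.co/Qwen/Qwen2-7B/tree/main

Follow the instructions on that page to download and install it.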

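Model deployment

There are several frameworks for ordinary inference. Qwen can be called through either vLLM or transformers; the transformers call failed in this experiment, so the rest of this guide uses the vLLM API.

Serving through the API requires the vLLM environment:

pip install vllm

vLLM deployment command:

CUDA_VISIBLE_DEVICES=2,3 vllm serve /path/to/model --served-model-name model-name --port 8001

Parameter explanation:
CUDA_VISIBLE_DEVICES     which GPUs are visible to the process
vllm serve /path         local path of the model
--served-model-name      name under which the model is served
--port                   port the API listens on

Note that listing two GPUs only makes them visible; to shard the model across both, vLLM also needs --tensor-parallel-size 2.

Example:
CUDA_VISIBLE_DEVICES=1 vllm serve /home/yiya_luo/models/Qwen2 --served-model-name qwen2-test --port 8000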
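
When the server is started from a script rather than by hand, the same command can be assembled programmatically. The following is a minimal sketch using only the standard library; the helper name build_vllm_serve is hypothetical, and it only builds the command and environment without launching anything.

```python
import os
import shlex


def build_vllm_serve(model_path, served_name, port, gpus):
    """Return (env, argv) for launching `vllm serve` on the given GPUs."""
    env = dict(os.environ)
    # Restrict the process to the chosen GPU indices, e.g. [2, 3] -> "2,3".
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(g) for g in gpus)
    argv = [
        "vllm", "serve", model_path,
        "--served-model-name", served_name,
        "--port", str(port),
    ]
    return env, argv


env, argv = build_vllm_serve("/home/yiya_luo/models/Qwen2", "qwen2-test", 8000, [1])
print(shlex.join(argv))
# To actually start the server (requires vLLM and a GPU):
#   import subprocess; subprocess.run(argv, env=env, check=True)
```

Keeping the arguments as a list avoids shell-quoting problems when the model path contains spaces.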

If the model was fine-tuned with the LLaMA-Factory framework, it can be deployed with this command:

CUDA_VISIBLE_DEVICES=2,3 API_PORT=8008 llamafactory-cli api --model_name_or_path /path/to/model --template qwen2 --infer_backend vllm --vllm_enforce_eager

Test with a single question via curl (adjust the port and model name to match your deployment):

curl http://localhost:8008/v1/chat/completions -H "Content-Type: application/json" -d '{
  "model": "qwen2-test",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "In which year was New China founded?"}
  ],
  "temperature": 0.7,
  "top_p": 0.8,
  "repetition_penalty": 1.05,
  "max_tokens": 512
}'
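
The same test request can be sent from Python without extra dependencies. This is a sketch under the assumption that the server exposes the OpenAI-compatible /v1/chat/completions route used by the curl call; the helper names build_chat_request and send are hypothetical.

```python
import json
import urllib.request


def build_chat_request(base_url, model, question, system="You are a helpful assistant"):
    """Build a POST request for the OpenAI-compatible chat completions route."""
    payload = {
        "model": model,
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": question},
        ],
        "temperature": 0.7,
        "top_p": 0.8,
        "repetition_penalty": 1.05,
        "max_tokens": 512,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=60):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    return body["choices"][0]["message"]["content"]


req = build_chat_request(
    "http://localhost:8008/v1", "qwen2-test", "In which year was New China founded?"
)
# reply = send(req)  # uncomment once the server is running
```

The request is built separately from sending so the payload can be inspected before a server is available.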

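Tool calling

Before tool calling, define the functions and write detailed tool descriptions, then fine-tune the model on them. Interface functions that ship with the model need no fine-tuning; this guide covers custom functions.

Fine-tuning

Data construction

Training data must be constructed before fine-tuning. The sharegpt format is recommended; because training uses the LLaMA-Factory framework, the usual template is shown below, see https://github.com/hiyouga/LLaMA-Factory/blob/main/data/README.md

To set up the environment, run pip install -r requirements.txt. It is best to install into a fresh, empty environment; if some packages conflict, install them one at a time.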

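# Format; multi-turn dialogues append further turns to conversations
[
  {
    "conversations": [
      {
        "from": "human",
        "value": "human instruction"
      },
      {
        "from": "function_call",
        "value": "tool arguments"
      },
      {
        "from": "observation",
        "value": "tool result"
      },
      {
        "from": "gpt",
        "value": "model response"
      }
    ],
    "system": "system prompt (optional)",
    "tools": "tool description (optional)"
  }
]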
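
Before training, it can help to check that every sample follows this turn order. The sketch below assumes the rule stated in the LLaMA-Factory data README that human and observation turns sit in odd positions while gpt and function_call turns sit in even positions; the function name check_conversation is hypothetical.

```python
ODD_ROLES = {"human", "observation"}    # 1st, 3rd, 5th ... turns
EVEN_ROLES = {"gpt", "function_call"}   # 2nd, 4th, 6th ... turns


def check_conversation(sample):
    """Return a list of problems found in one sharegpt sample (empty if OK)."""
    problems = []
    turns = sample.get("conversations", [])
    if not turns:
        problems.append("empty conversations")
    for index, turn in enumerate(turns):
        position = index + 1
        allowed = ODD_ROLES if position % 2 == 1 else EVEN_ROLES
        if turn.get("from") not in allowed:
            problems.append(f"turn {position}: unexpected role {turn.get('from')!r}")
        if not isinstance(turn.get("value"), str):
            problems.append(f"turn {position}: value must be a string")
    return problems


good = {
    "conversations": [
        {"from": "human", "value": "human instruction"},
        {"from": "function_call", "value": "tool arguments"},
        {"from": "observation", "value": "tool result"},
        {"from": "gpt", "value": "model response"},
    ]
}
bad = {"conversations": [{"from": "gpt", "value": "model response"}]}
print(check_conversation(good))
print(check_conversation(bad))
```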
Example:
{
    "conversations": [
      {
        "from": "human",
        "value": "I need to generate an invoice for John Doe. He bought 2 apples at $1 each and 3 bananas at $0.5 each."
      },
      {
        "from": "function_call",
        "value": "{\"name\": \"generate_invoice\", \"arguments\": {\"customer_name\": \"John Doe\", \"items\": [{\"name\": \"apple\", \"quantity\": 2, \"price\": 1}, {\"name\": \"banana\", \"quantity\": 3, \"price\": 0.5}]}}"
      },
      {
        "from": "observation",
        "value": "{\"invoice_id\": \"INV12345\", \"customer_name\": \"John Doe\", \"items\": [{\"name\": \"apple\", \"quantity\": 2, \"price\": 1, \"total\": 2}, {\"name\": \"banana\", \"quantity\": 3, \"price\": 0.5, \"total\": 1.5}], \"total\": 3.5, \"status\": \"generated\"}"
      },
      {
        "from": "gpt",
        "value": "The invoice has been generated successfully. The invoice number is INV12345. The total for John Doe is $3.5. The invoice contains 2 apples totalling $2 and 3 bananas totalling $1.5."
      }
    ],
    "tools": "[{\"name\": \"generate_invoice\", \"description\": \"Generate an invoice\", \"parameters\": {\"type\": \"object\", \"properties\": {\"customer_name\": {\"type\": \"string\", \"description\": \"Customer name\"}, \"items\": {\"type\": \"array\", \"items\": {\"type\": \"object\", \"properties\": {\"name\": {\"type\": \"string\", \"description\": \"The item name\"}, \"quantity\": {\"type\": \"integer\", \"description\": \"The quantity of the item\"}, \"price\": {\"type\": \"number\", \"description\": \"The price per unit\"}}, \"required\": [\"name\", \"quantity\", \"price\"]}}}, \"required\": [\"customer_name\", \"items\"]}}, {\"name\": \"generate_password\", \"description\": \"Generate a random password\", \"parameters\": {\"type\": \"object\", \"properties\": {\"length\": {\"type\": \"integer\", \"description\": \"The length of the password\"}}, \"required\": [\"length\"]}}]"
}
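
Note that the function_call, observation, and tools values are JSON documents serialized into strings, which is why their quotes are escaped. Writing those escapes by hand is error-prone, so a sample can be assembled with json.dumps instead. This is a minimal sketch; the helper name make_toolcall_sample and the sample contents are hypothetical illustrations, not real training data.

```python
import json


def make_toolcall_sample(question, tool_name, arguments, observation, answer, tools, system=""):
    """Assemble one sharegpt sample; structured fields become JSON strings."""
    def dumps(obj):
        return json.dumps(obj, ensure_ascii=False)

    sample = {
        "conversations": [
            {"from": "human", "value": question},
            {"from": "function_call", "value": dumps({"name": tool_name, "arguments": arguments})},
            {"from": "observation", "value": dumps(observation)},
            {"from": "gpt", "value": answer},
        ],
        "tools": dumps(tools),
    }
    if system:
        sample["system"] = system
    return sample


# Tool schema copied from the generate_password entry in the example above.
tools = [{
    "name": "generate_password",
    "description": "Generate a random password",
    "parameters": {
        "type": "object",
        "properties": {"length": {"type": "integer", "description": "The length of the password"}},
        "required": ["length"],
    },
}]
sample = make_toolcall_sample(
    question="Please generate a 12-character password.",   # hypothetical user turn
    tool_name="generate_password",
    arguments={"length": 12},
    observation={"password": "PLACEHOLDER"},                # hypothetical tool output
    answer="Your new password is PLACEHOLDER.",
    tools=tools,
)
print(json.dumps([sample], ensure_ascii=False, indent=2))
```

The printed list can be saved directly as the dataset file, since json.dumps produces the escaped quotes seen in the example.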

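Once the data is built, add a description of it to the dataset file (dataset_info.json in LLaMA-Factory's data directory):

"dataset_name": {
  "file_name": "data.json",
  "formatting": "sharegpt",
  "columns": {
    "messages": "conversations",
    "system": "system",
    "tools": "tools"
  }
}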
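
The entry can also be added with a short script rather than by editing the JSON by hand. This is a sketch assuming the dataset file is a single JSON object keyed by dataset name; the helper name register_dataset and the dataset name my_toolcall are hypothetical.

```python
import json
from pathlib import Path


def register_dataset(info_path, name, file_name):
    """Add (or overwrite) a sharegpt tool-calling entry in the dataset file."""
    path = Path(info_path)
    # Start from the existing registry if there is one, so other entries are kept.
    info = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    info[name] = {
        "file_name": file_name,
        "formatting": "sharegpt",
        "columns": {
            "messages": "conversations",
            "system": "system",
            "tools": "tools",
        },
    }
    path.write_text(json.dumps(info, ensure_ascii=False, indent=2), encoding="utf-8")
    return info[name]


# Example call (not executed here because it writes to disk):
#   register_dataset("data/dataset_info.json", "my_toolcall", "data.json")
```

The name passed here is the one later given to the --dataset argument of the training command.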


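Running the fine-tuning

Once the data matches the format above, fine-tuning can begin.

Single-GPU fine-tuning command, taken from reference 2 (Implementing Function Calling for Large Models with LLaMA-Factory, CSDN blog):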

python src/train_bash.py \
--stage sft \
--do_train True \
--model_name_or_path /models/Qwen1.5-4B \
--finetuning_type lora \
--template qwen \
--dataset_dir data \
--dataset glaive_toolcall,alpaca_gpt4_en,alpaca_gpt4_zh,oaast_sft_zh \
--cutoff_len 1024 \
--learning_rate 5e-05 \
--num_train_epochs 2.0 \
--max_samples 50000 \
--per_device_train_batch_size 2 \
--gradient_accumulation_steps 4 \
--lr_scheduler_type cosine \
--max_grad_norm 1.0 \
--logging_steps 100 \
--save_steps 1000 \
--warmup_steps 0 \
--optim adamw_torch \
--report_to none \
--output_dir saves/Qwen1.5-4B/lora/train_2024-04-20-15-30-29 \
--fp16 True \
--lora_rank 8 \
--lora_alpha 16 \
--lora_dropout 0.1 \
--lora_target all \
--plot_loss True

For multi-GPU training, replace python with torchrun --nproc_per_node=8 (set the value to the number of GPUs on the machine).
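
Moving from one GPU to several changes how many samples contribute to each optimizer step, which is worth keeping in mind when reusing the learning rate above. As a rough sketch, assuming plain data-parallel training where every process handles its own batch:

```python
def effective_batch_size(per_device_batch, grad_accum_steps, num_gpus):
    """Samples consumed per optimizer step under data-parallel training."""
    return per_device_batch * grad_accum_steps * num_gpus


# Values from the command above: batch size 2 per device, 4 accumulation steps.
print(effective_batch_size(2, 4, 1))  # single GPU -> 8
print(effective_batch_size(2, 4, 8))  # torchrun --nproc_per_node=8 -> 64
```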

Inference

Deploy the fine-tuned model, then define the functions and call them. The available calling methods are:

Reference: Function Calling - Qwen

OpenAI client, see https://github.com/hiyouga/LLaMA-Factory/blob/main/scripts/test_toolcall.py

import json
import os
from typing import Sequence

from openai import OpenAI
from transformers.utils.versions import require_version

require_version("openai>=1.5.0", "To fix: pip install openai>=1.5.0")

def calculate_gpa(grades: Sequence[str], hours: Sequence[int]) -> float:
    grade_to_score = {"A": 4, "B": 3, "C": 2}
    total_score, total_hour = 0, 0
    for grade, hour in zip(grades, hours):
        total_score += grade_to_score[grade] * hour
        total_hour += hour
    return round(total_score / total_hour, 2)

def main():
    client = OpenAI(
        api_key="{}".format(os.environ.get("API_KEY", "0")),
        base_url="http://localhost:{}/v1".format(os.environ.get("API_PORT", 8000)),
    )
    tools = [
        {
            "type": "function",
            "function": {
                "name": "calculate_gpa",
                "description": "Calculate the Grade Point Average (GPA) based on grades and credit hours",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "grades": {"type": "array", "items": {"type": "string"}, "description": "The grades"},
                        "hours": {"type": "array", "items": {"type": "integer"}, "description": "The credit hours"},
                    },
                    "required": ["grades", "hours"],
                },
            },
        }
    ]
    tool_map = {"calculate_gpa": calculate_gpa}

    messages = []
    messages.append({"role": "user", "content": "My grades are A, A, B, and C. The credit hours are 3, 4, 3, and 2."})
    result = client.chat.completions.create(messages=messages, model="test", tools=tools)
    if result.choices[0].message.tool_calls is None:
        raise ValueError("Cannot retrieve function call from the response.")

    messages.append(result.choices[0].message)
    tool_call = result.choices[0].message.tool_calls[0].function
    print(tool_call)
    # Function(arguments='{"grades": ["A", "A", "B", "C"], "hours": [3, 4, 3, 2]}', name='calculate_gpa')
    name, arguments = tool_call.name, json.loads(tool_call.arguments)
    tool_result = tool_map[name](**arguments)
    messages.append({"role": "tool", "content": json.dumps({"gpa": tool_result}, ensure_ascii=False)})
    result = client.chat.completions.create(messages=messages, model="test", tools=tools)
    print(result.choices[0].message.content)
    # Based on the grades and credit hours you provided, your Grade Point Average (GPA) is 3.42.

if __name__ == "__main__":
    main()
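
The script above assumes the model always returns a known tool name with valid JSON arguments. A fine-tuned model can still produce a malformed call, so the dispatch step may be worth guarding. The following is a sketch of one way to do that with only the standard library; run_tool_call and the error payload shape are hypothetical choices, not part of the LLaMA-Factory script.

```python
import json


def calculate_gpa(grades, hours):
    """Same GPA computation as in the script above."""
    grade_to_score = {"A": 4, "B": 3, "C": 2}
    total = sum(grade_to_score[g] * h for g, h in zip(grades, hours))
    return round(total / sum(hours), 2)


TOOL_MAP = {"calculate_gpa": calculate_gpa}


def run_tool_call(name, raw_arguments, tool_map=TOOL_MAP):
    """Execute one tool call and return a JSON string for the tool message."""
    if name not in tool_map:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        arguments = json.loads(raw_arguments)
        result = tool_map[name](**arguments)
    except (json.JSONDecodeError, TypeError, KeyError, ZeroDivisionError) as exc:
        # Report the failure to the model instead of crashing the loop.
        return json.dumps({"error": f"{type(exc).__name__}: {exc}"})
    return json.dumps({"result": result}, ensure_ascii=False)


print(run_tool_call("calculate_gpa", '{"grades": ["A", "A", "B", "C"], "hours": [3, 4, 3, 2]}'))
print(run_tool_call("get_weather", "{}"))
```

Returning the error as the tool message content gives the model a chance to correct its call on the next turn. When talking to the official OpenAI API, the tool message generally also carries the tool_call_id of the call it answers.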

transformers framework (did not succeed here) [the transformers framework does not require deploying the model beforehand]

qwen-agent: see Qwen-Agent - Qwen
