CANN/DeepSeek-V4 Configuration Guide

汤璞亚Heath · Published 2026-05-09 11:40:43

YAML Parameter Description

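cann-recipes-infer: this project provides CANN-based optimization samples for typical models and acceleration algorithms used in LLM and multimodal model inference. Project URL: https://gitcode.com/cann/cann-recipes-infer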

The parameters in the YAML configuration file are described below, grouped by section.

Basic Config
  model_name: "deepseek_v4"                           # Model name (string)
  model_path: "/data/models/deepseek_v4_int8_w8a8"    # Path to the model weights (string)
  exe_mode: "npugraph_ex"                             # Execution mode. Supported: ["eager", "npugraph_ex"]
  world_size: 128                                     # Global number of ranks (int)
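
The constraints in the comments above can be checked before launching a run. The following is a minimal sketch, assuming the YAML has already been loaded into a plain dict; `check_basic_config` and `SUPPORTED_EXE_MODES` are hypothetical names for illustration and are not part of cann-recipes-infer.

```python
# Hypothetical helper for illustration; not part of cann-recipes-infer.
SUPPORTED_EXE_MODES = ("eager", "npugraph_ex")


def check_basic_config(cfg):
    """Return a list of problems found in the basic fields (empty list means OK)."""
    problems = []
    for key in ("model_name", "model_path"):
        value = cfg.get(key)
        if not isinstance(value, str) or not value:
            problems.append(f"{key} must be a non-empty string")
    if cfg.get("exe_mode") not in SUPPORTED_EXE_MODES:
        problems.append(f"exe_mode must be one of {list(SUPPORTED_EXE_MODES)}")
    world_size = cfg.get("world_size")
    # bool is a subclass of int in Python, so compare the exact type.
    if type(world_size) is not int or world_size < 1:
        problems.append("world_size must be a positive integer")
    return problems


basic = {
    "model_name": "deepseek_v4",
    "model_path": "/data/models/deepseek_v4_int8_w8a8",
    "exe_mode": "npugraph_ex",
    "world_size": 128,
}
print(check_basic_config(basic))
```

Returning a list of messages instead of raising on the first error lets a launcher script report every misconfigured field in one pass.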

Model Config
  pa_block_size: 128                # PA block size. Supported: [128]
  with_ckpt: True                   # Whether to load the checkpoint. Supported: [False, True]
  enable_multi_streams: True        # Whether to enable multi-stream execution to improve performance. Supported: [False, True]
  enable_profiler: True             # Whether to enable profiling. Supported: [False, True]
  enable_cache_compile: False       # Whether to enable compile caching to speed up subsequent runs. Supported: [False, True]
  prefill_mini_batch_size: 0        # Mini-batch size for the prefill stage. Supported: [0, 1, 2, 3]
  perfect_eplb: False               # If enabled, forces uniform selection of MoE experts. Supported: [False, True]
  enable_online_split_weight: True  # Whether to split weights online. Supported: [False, True]
  next_n: 1                         # Number of multi-token prediction steps. Supported: [0, 1, 2, 3]
  platform_version: "A3"            # Inference platform. Supported: ["A3", "950"]
  enable_pypto: False               # Whether to enable pypto operators. Supported: [False, True]
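
Because every field in this section takes one of a small set of values, the supported values can be written down as a table and checked mechanically. This is a sketch under the assumption that the section is available as a dict; `MODEL_CONFIG_CHOICES` and `check_choices` are hypothetical names, and the allowed values are copied from the comments above.

```python
# Hypothetical table for illustration; the values mirror the comments above.
MODEL_CONFIG_CHOICES = {
    "pa_block_size": (128,),
    "with_ckpt": (False, True),
    "enable_multi_streams": (False, True),
    "enable_profiler": (False, True),
    "enable_cache_compile": (False, True),
    "prefill_mini_batch_size": (0, 1, 2, 3),
    "perfect_eplb": (False, True),
    "enable_online_split_weight": (False, True),
    "next_n": (0, 1, 2, 3),
    "platform_version": ("A3", "950"),
    "enable_pypto": (False, True),
}


def check_choices(cfg, choices):
    """Map each present-but-unsupported key to an error message."""
    errors = {}
    for key, allowed in choices.items():
        if key not in cfg:
            continue  # missing keys are left to the caller's defaults
        value = cfg[key]
        # Compare types as well: in Python True == 1, and "True" is a string.
        if not any(type(value) is type(a) and value == a for a in allowed):
            errors[key] = f"{value!r} not in {list(allowed)}"
    return errors


print(check_choices({"next_n": 1, "platform_version": "A3"}, MODEL_CONFIG_CHOICES))
```

The strict type comparison matters for the boolean switches: a quoted `"True"` in the YAML file is loaded as a string, which is truthy in Python even when the text is `"False"`.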

Data Config
  dataset: "default"    # Dataset. Supported: ["default", "InfiniteBench", "LongBench"]
  input_max_len: 8192   # Maximum input prompt length
  max_new_tokens: 256   # Maximum number of newly generated tokens
  batch_size: 128       # Global batch size
  temperature: 1.0      # Float that controls the randomness of sampling. Lower values make the model more deterministic,
                        # while higher values make it more random. Zero means greedy sampling.
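
To illustrate what the `temperature` comment describes, here is a generic temperature-sampling sketch in plain Python. It shows the standard technique (divide the logits by the temperature, then sample from the softmax) and treats zero as greedy argmax; it is not taken from the project's sampler, and `sample_token` is a hypothetical name.

```python
import math
import random


def sample_token(logits, temperature, rng=random):
    """Pick a token index from raw logits using temperature sampling."""
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if temperature == 0:
        # Greedy sampling: always take the highest-scoring token.
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    # Subtract the maximum before exponentiating for numerical stability.
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


logits = [0.1, 2.0, -1.0]
print(sample_token(logits, 0.0))                    # greedy choice
print(sample_token(logits, 1.0, random.Random(0)))  # stochastic choice
```

As the temperature approaches zero the scaled logits spread apart, so the sampled distribution concentrates on the argmax and the output converges to the greedy choice.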

Parallel Config
  cp_size: 1          # Prefill CP size. Supported: [1, world_size]
  attn_tp_size: 1     # Attention TP size. Supported: [1]
  oproj_tp_size: 1    # Oproj TP size. Supported: [1, 4, 8]
  moe_tp_size: 1      # MoE TP size. Supported: [1]
  embed_tp_size: 16   # Embed TP size. Supported: [1, 4, 8, 16]
  lmhead_tp_size: 16  # LMHead TP size. Supported: [1, 4, 8, 16]
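
The parallel settings differ from the other sections in that `cp_size` depends on `world_size` from the basic config. A minimal sketch of that cross-field check follows; `check_parallel_config` is a hypothetical helper, and it encodes only the supported values listed above, not any further constraints the project may enforce.

```python
def check_parallel_config(parallel, world_size):
    """Return a list of problems in the parallel settings (empty list means OK)."""
    # Allowed values copied from the comments above; cp_size depends on world_size.
    allowed = {
        "cp_size": (1, world_size),
        "attn_tp_size": (1,),
        "oproj_tp_size": (1, 4, 8),
        "moe_tp_size": (1,),
        "embed_tp_size": (1, 4, 8, 16),
        "lmhead_tp_size": (1, 4, 8, 16),
    }
    problems = []
    for key, choices in allowed.items():
        if key not in parallel:
            continue  # missing keys are left to the caller's defaults
        value = parallel[key]
        # Exact int check so that True is not accepted as 1.
        if type(value) is not int or value not in choices:
            problems.append(f"{key}={value!r} not in {sorted(set(choices))}")
    return problems


parallel = {
    "cp_size": 1,
    "attn_tp_size": 1,
    "oproj_tp_size": 1,
    "moe_tp_size": 1,
    "embed_tp_size": 16,
    "lmhead_tp_size": 16,
}
print(check_parallel_config(parallel, world_size=128))
```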
