01 Claude Code Agent Loop: An Engineering Analysis
Reference projects:
https://github.com/shareAI-lab/learn-claude-code/tree/main
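claude-code-best/claude-code: the authentic Claude Code in a version you can run, build, and debug; production-grade engineering, enterprise-grade reliability; safe and malware-free, with memory-leak fixes
As covered in the previous article, the agent loop is simply the cycle shown in the figure below: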

I. Use message.content.type, not stop_reason
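One point deserves attention here: a classic piece of defensive programming.

To decide whether the model's response contains a tool call, inspect the type of each block in message.content rather than the stop_reason field.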
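Below is a minimal sketch of this check in Python. It assumes content blocks shaped like those of the Anthropic Messages API: each block has a type, and tool calls use type "tool_use". The helper name tool_use_blocks is hypothetical. SimpleNamespace objects stand in for the SDK's block objects so the snippet runs without an API key.

```python
from types import SimpleNamespace
from typing import Any, Iterable, List


def tool_use_blocks(content: Iterable[Any]) -> List[Any]:
    """Return the tool_use blocks in a response's content list (hypothetical helper).

    The decision is based on each block's own type field, not on the
    response-level stop_reason. Works for SDK-style objects (attribute
    access) and for plain dicts.
    """
    found = []
    for block in content:
        if isinstance(block, dict):
            block_type = block.get("type")
        else:
            block_type = getattr(block, "type", None)
        if block_type == "tool_use":
            found.append(block)
    return found


# Simulated response: stop_reason claims "end_turn", but the content
# still carries a tool call (the mismatch this section warns about).
response = SimpleNamespace(
    stop_reason="end_turn",
    content=[
        SimpleNamespace(type="text", text="Listing files."),
        SimpleNamespace(type="tool_use", id="toolu_01", name="bash",
                        input={"command": "ls"}),
    ],
)

calls = tool_use_blocks(response.content)
print(response.stop_reason, len(calls))  # → end_turn 1
```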
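Why is stop_reason especially untrustworthy?
In LLM applications, stop_reason (or finish_reason) is usually a high-level abstraction. It is a verdict that the inference stack (for example, Anthropic's backend) reaches by running the raw stream of model output through a set of rules.
That decision logic is a complex software module in its own right, and it can have bugs. For example: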
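- Race conditions: if the model hits a stop condition (such as max_tokens) in the middle of generating a tool call's arguments, the framework may hastily report the wrong stop_reason.
- Misclassification: during millisecond-scale streaming, the framework may fail to tell "still generating a tool call" apart from "tool call finished".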
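The two failure modes above can be handled by cross-checking stop_reason against the content instead of trusting either signal blindly. The sketch below is illustrative and is not taken from either reference project; next_action and its three return labels are hypothetical. It also assumes, based on our reading of Anthropic's stop-reason documentation (worth verifying), that a response cut off by max_tokens may contain a tool_use block with incomplete input. For that case it asks for a retry instead of executing the call.

```python
from typing import Any, Dict, List


def next_action(stop_reason: str, content: List[Dict[str, Any]]) -> str:
    """Decide what the agent loop should do next (hypothetical policy).

    Returns one of:
      "run_tools"       - content holds tool_use blocks; execute them
      "retry_truncated" - output was cut off mid tool call; do not execute
      "done"            - no tool call in the content; end the turn
    """
    has_tool_call = any(block.get("type") == "tool_use" for block in content)

    if stop_reason == "max_tokens" and has_tool_call:
        # Race condition from the list above: the arguments may be incomplete.
        return "retry_truncated"
    if has_tool_call:
        # Data wins over metadata, whatever stop_reason says.
        if stop_reason != "tool_use":
            print(f"warning: stop_reason={stop_reason!r} but content has tool_use")
        return "run_tools"
    if stop_reason == "tool_use":
        # The opposite mismatch: metadata promises a tool call that is not there.
        print("warning: stop_reason='tool_use' but no tool_use block in content")
    return "done"


tool_block = {"type": "tool_use", "id": "toolu_01", "name": "bash",
              "input": {"command": "ls"}}
print(next_action("end_turn", [tool_block]))     # warning, then: run_tools
print(next_action("max_tokens", [tool_block]))   # → retry_truncated
print(next_action("tool_use", [{"type": "text", "text": "hi"}]))  # warning, then: done
```

This follows the data-first rule in every case except a tool call that was truncated.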
Why does this happen?
There are three main reasons:
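- Uncertainty in network transport and parsing: the API returns a JSON string. Truncation in transit, a bug in the parsing library, or state-synchronization lag while streaming (stream=True) can leave a metadata field such as stop_reason missing or wrong, even though the content blocks themselves arrive intact.
- Implementation differences on the LLM server side: a model backend may be made up of several microservices. The model that generates the text may have emitted a tool call. Meanwhile, the coordination layer that assembles the final API response may fail to set stop_reason correctly because of a timeout, a missed logic branch, or a version change.
- "Look at the result, not the intent": in production, data is more trustworthy than metadata. If the response body explicitly contains a structured tool_use block, the program should execute the tool call no matter what stop_reason says. Conversely, a program that trusts stop_reason="tool_use" alone will hang or error out when the body contains no tool call.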
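With stream=True, the same data-first rule can be applied to the raw events. The sketch below rebuilds tool calls from event dicts shaped like Anthropic's streaming events: content_block_start, content_block_delta carrying an input_json_delta, content_block_stop, and message_delta. The function collect_tool_calls and the sample events are hypothetical. A tool call is accepted only after its content_block_stop has arrived and its accumulated JSON parses. The stop_reason is recorded for logging but never consulted.

```python
import json
from typing import Any, Dict, List, Optional, Tuple


def collect_tool_calls(
    events: List[Dict[str, Any]],
) -> Tuple[List[Dict[str, Any]], Optional[str]]:
    """Rebuild finished tool_use blocks from a list of stream events.

    Returns (complete_tool_calls, stop_reason). stop_reason is reported for
    logging only; it does not decide which tool calls are returned.
    """
    pending: Dict[int, Dict[str, Any]] = {}  # index -> tool block being built
    buffers: Dict[int, str] = {}             # index -> concatenated partial_json
    complete: List[Dict[str, Any]] = []
    stop_reason: Optional[str] = None

    for event in events:
        kind = event["type"]
        if kind == "content_block_start" and event["content_block"]["type"] == "tool_use":
            block = event["content_block"]
            pending[event["index"]] = {"id": block["id"], "name": block["name"]}
            buffers[event["index"]] = ""
        elif kind == "content_block_delta" and event["delta"]["type"] == "input_json_delta":
            if event["index"] in buffers:
                buffers[event["index"]] += event["delta"]["partial_json"]
        elif kind == "content_block_stop" and event["index"] in pending:
            block = pending.pop(event["index"])
            raw = buffers.pop(event["index"]) or "{}"
            try:
                block["input"] = json.loads(raw)
            except json.JSONDecodeError:
                continue  # truncated arguments: never execute a half-built call
            complete.append(block)
        elif kind == "message_delta":
            stop_reason = event["delta"].get("stop_reason")

    return complete, stop_reason


# Hypothetical stream in which the final message_delta (and thus stop_reason) was lost.
events = [
    {"type": "content_block_start", "index": 0,
     "content_block": {"type": "tool_use", "id": "toolu_01", "name": "bash", "input": {}}},
    {"type": "content_block_delta", "index": 0,
     "delta": {"type": "input_json_delta", "partial_json": '{"command": '}},
    {"type": "content_block_delta", "index": 0,
     "delta": {"type": "input_json_delta", "partial_json": '"ls -la"}'}},
    {"type": "content_block_stop", "index": 0},
]
calls, reason = collect_tool_calls(events)
print(calls, reason)  # → [{'id': 'toolu_01', 'name': 'bash', 'input': {'command': 'ls -la'}}] None
```

In the sample stream, the message_delta event never arrives, and neither does stop_reason. The complete tool call is still recovered. If the arguments had been cut off, json.loads would fail and the call would be dropped rather than executed.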

Complete code example (source: https://github.com/shareAI-lab/learn-claude-code/tree/main):
#!/usr/bin/env python3
"""
s01_agent_loop.py - The Agent Loop

The entire secret of an AI coding agent in one pattern:

    while stop_reason == "tool_use":
        response = LLM(messages, tools)
        execute tools
        append results

    +----------+      +-------+      +---------+
    |   User   | ---> |  LLM  | ---> |  Tool   |
    |  prompt  |      |       |      | execute |
    +----------+      +---+---+      +----+----+
                          ^               |
                          |  tool_result  |
                          +---------------+
                          (loop continues)

This is the core loop: feed tool results back to the model
until the model decides to stop. Production agents layer
policy, hooks, and lifecycle controls on top.

Usage:
    pip install anthropic python-dotenv
    ANTHROPIC_API_KEY=... MODEL_ID=... python s01_agent_loop/code.py
"""
import os
import subprocess

try:
    import readline
    # macOS's libedit mishandles Backspace with Chinese (multi-byte) input; these four lines fix it
    readline.parse_and_bind('set bind-tty-special-chars off')
    readline.parse_and_bind('set input-meta on')
    readline.parse_and_bind('set output-meta on')
    readline.parse_and_bind('set convert-meta off')
except ImportError:
    pass
from anthropic import Anthropic
from dotenv import load_dotenv
load_dotenv(override=True)
if os.getenv("ANTHROPIC_BASE_URL"):
    os.environ.pop("ANTHROPIC_AUTH_TOKEN", None)
client = Anthropic(base_url=os.getenv("ANTHROPIC_BASE_URL"))
MODEL = os.environ["MODEL_ID"]
SYSTEM = f"You are a coding agent at {os.getcwd()}. Use bash to solve tasks. Act, don't explain."
# ── Tool definition: just bash ────────────────────────────
TOOLS = [{
    "name": "bash",
    "description": "Run a shell command.",
    "input_schema": {
        "type": "object",
        "properties": {"command": {"type": "string"}},
        "required": ["command"],
    },
}]
# ── Tool execution ────────────────────────────────────────
def run_bash(command: str) -> str:
    dangerous = ["rm -rf /", "sudo", "shutdown", "reboot", "> /dev/"]
    if any(d in command for d in dangerous):
        return "Error: Dangerous command blocked"
    try:
        r = subprocess.run(command, shell=True, cwd=os.getcwd(),
                           capture_output=True, text=True, timeout=120)
        out = (r.stdout + r.stderr).strip()
        return out[:50000] if out else "(no output)"
    except subprocess.TimeoutExpired:
        return "Error: Timeout (120s)"
    except (FileNotFoundError, OSError) as e:
        return f"Error: {e}"
# ── The core pattern: a while loop that calls tools until the model stops ──
def agent_loop(messages: list):
    while True:
        response = client.messages.create(
            model=MODEL, system=SYSTEM, messages=messages,
            tools=TOOLS, max_tokens=8000,
        )
        # Append assistant turn
        messages.append({"role": "assistant", "content": response.content})
        # If the model didn't call a tool, we're done
        if response.stop_reason != "tool_use":
            return
        # Execute each tool call, collect results
        results = []
        for block in response.content:
            if block.type == "tool_use":
                print(f"\033[33m$ {block.input['command']}\033[0m")
                output = run_bash(block.input["command"])
                print(output[:200])
                results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": output,
                })
        # Feed tool results back, loop continues
        messages.append({"role": "user", "content": results})
# ── Entry point ──────────────────────────────────────────
if __name__ == "__main__":
    print("s01: Agent Loop")
    print("Type a question and press Enter to send. Type q to quit.\n")
    history = []
    while True:
        try:
            query = input("\033[36ms01 >> \033[0m")
        except (EOFError, KeyboardInterrupt):
            break
        if query.strip().lower() in ("q", "exit", ""):
            break
        history.append({"role": "user", "content": query})
        agent_loop(history)
        # Print the model's final text response
        response_content = history[-1]["content"]
        if isinstance(response_content, list):
            for block in response_content:
                if getattr(block, "type", None) == "text":
                    print(block.text)
        print()
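The quoted s01 code still exits on response.stop_reason != "tool_use", which is exactly the check this article argues against. If stop_reason were ever wrong while the content still held a tool_use block, the loop would return and leave that tool call without a matching tool_result in the history. Below is a hedged sketch of a content-first variant, not code from either reference project. To keep it runnable offline, the model call is injected as a create callable, a hypothetical parameter; in real use it would wrap client.messages.create(...) from the code above. fake_create and fake_run_tool are scripted stand-ins. The max_turns cap is an extra safeguard that the original does not have.

```python
from types import SimpleNamespace
from typing import Any, Callable, Dict, List


def agent_loop(messages: List[Dict[str, Any]],
               create: Callable[[List[Dict[str, Any]]], Any],
               run_tool: Callable[[Any], str],
               max_turns: int = 20) -> None:
    """Content-first agent loop: keep going while the reply contains tool_use blocks."""
    for _ in range(max_turns):
        response = create(messages)
        messages.append({"role": "assistant", "content": response.content})

        # Decide from the data (block types), not from response.stop_reason.
        tool_calls = [b for b in response.content if getattr(b, "type", None) == "tool_use"]
        if not tool_calls:
            return

        # One tool_result per tool_use, matched by id.
        results = [{"type": "tool_result", "tool_use_id": b.id, "content": run_tool(b)}
                   for b in tool_calls]
        messages.append({"role": "user", "content": results})
    raise RuntimeError(f"agent loop exceeded {max_turns} turns")


# Scripted stand-in for the model: the first reply has a tool call but a
# misleading stop_reason; the second reply is plain text.
replies = [
    SimpleNamespace(stop_reason="end_turn", content=[
        SimpleNamespace(type="tool_use", id="toolu_01", name="bash",
                        input={"command": "echo hi"}),
    ]),
    SimpleNamespace(stop_reason="end_turn", content=[
        SimpleNamespace(type="text", text="Done."),
    ]),
]


def fake_create(messages):
    return replies.pop(0)


def fake_run_tool(block):
    return f"ran: {block.input['command']}"


history = [{"role": "user", "content": "say hi"}]
agent_loop(history, fake_create, fake_run_tool)
print(len(history))              # → 4
print(history[2]["content"][0])  # the tool_result paired with toolu_01
```

With the original check (if response.stop_reason != "tool_use": return), the same scripted replies would end the loop after the first reply. History would stop at 2 messages, and toolu_01 would never be answered by a tool_result.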