Measure Zero

Reading the Claude Code source - the memory mechanism, continued

By Shiina
April 13, 2026, 08:00

Picking up from last time: this post fills in the details of how Auto Memory saves and recalls, and introduces auto-dream.

How Auto Memory saves

Two routes

  • The main agent writes into the designated memory directory via the generic Write/Edit tools
  • If the main agent didn't write this turn, a background forked subagent makes a catch-up extraction pass after the turn ends
/**
 * Whether the extract-memories background agent will run this session.
 *
 * The main agent's prompt always has full save instructions regardless of
 * this gate — when the main agent writes memories, the background agent
 * skips that range (hasMemoryWritesSince in extractMemories.ts); when it
 * doesn't, the background agent catches anything missed.
 */
/**
 * The extraction agent runs as a perfect fork of the main conversation — same
 * system prompt, same message prefix. The main agent's system prompt always
 * has full save instructions; when the main agent writes memories itself,
 * extractMemories.ts skips that turn (hasMemoryWritesSince). This prompt
 * fires only when the main agent didn't write, so the save-criteria here
 * overlap the system prompt's harmlessly.
 */

The main agent writes memory

You have a persistent, file-based memory system at `{memoryDir}`. This directory already exists — write to it directly with the Write tool (do not run mkdir or check for its existence).

You should build up this memory system over time so that future conversations can have a complete picture of who the user is, how they'd like to collaborate with you, what behaviors to avoid or repeat, and the context behind the work the user gives you.

If the user explicitly asks you to remember something, save it immediately as whichever type fits best. If they ask you to forget something, find and remove the relevant entry.

{TYPES_SECTION_INDIVIDUAL}
{WHAT_NOT_TO_SAVE_SECTION}

The background agent writes memory

The background agent runs with write access restricted to the memory directory and a capped turn budget.

You are now acting as the memory extraction subagent. Analyze the most recent ~{newMessageCount} messages above and use them to update your persistent memory systems.

Available tools: Read, Grep, Glob, read-only Bash, and Edit/Write for paths inside the memory directory only.

You have a limited turn budget. Edit requires a prior Read of the same file, so the efficient strategy is: turn 1 — issue all Read calls in parallel for every file you might update; turn 2 — issue all Write/Edit calls in parallel. Do not interleave reads and writes across multiple turns.

You MUST only use content from the last ~{newMessageCount} messages to update your persistent memories. Do not waste any turns attempting to investigate or verify that content further — no grepping source files, no reading code to confirm a pattern exists, no git commands.
## Existing memory files

{existingMemories}

Check this list before writing — update an existing file rather than creating a duplicate.
/**
 * Creates a canUseTool function that allows Read/Grep/Glob (unrestricted),
 * read-only Bash commands, and Edit/Write only for paths within the
 * auto-memory directory. Shared by extractMemories and autoDream.
 */
// Well-behaved extractions complete in 2-4 turns (read → write).
// A hard cap prevents verification rabbit-holes from burning turns.
maxTurns: 5,

Triggered via a hook after the current query's tool loop finishes.

/**
 * Extracts durable memories from the current session transcript
 * and writes them to the auto-memory directory (~/.claude/projects/<path>/memory/).
 *
 * It runs once at the end of each complete query loop (when the model produces
 * a final response with no tool calls) via handleStopHooks in stopHooks.ts.
 *
 * Uses the forked agent pattern (runForkedAgent) — a perfect fork of the main
 * conversation that shares the parent's prompt cache.
 */

By default it tries every turn

// Only run extraction every N eligible turns (tengu_bramble_lintel, default 1).
// Trailing extractions (from stashed contexts) skip this check since they
// process already-committed work that should not be throttled.
/** When a call arrives during an in-progress run, we stash the context here
 *  and run one trailing extraction after the current one finishes. */
// If a call arrived while we were running, run a trailing extraction
// with the latest stashed context.

If the main agent already wrote memory in this stretch, the background extraction skips it outright:

/**
 * Returns true if any assistant message after the cursor UUID contains a
 * Write/Edit tool_use block targeting an auto-memory path.
 *
 * The main agent's prompt has full save instructions — when it writes
 * memories, the forked extraction is redundant.
 */
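A minimal sketch of that check. The function name and cursor-UUID idea come from the comment above; the message shape here is a simplified assumption, not the actual transcript format:

```typescript
// Sketch of hasMemoryWritesSince. The message shape is a simplified
// assumption, not the actual transcript format.
type ToolUseBlock = { type: 'tool_use'; name: string; input: { file_path?: string } }
type AssistantMessage = { uuid: string; content: ToolUseBlock[] }

export function hasMemoryWritesSince(
  messages: AssistantMessage[],
  cursorUuid: string,
  memoryDir: string,
): boolean {
  const start = messages.findIndex(m => m.uuid === cursorUuid)
  // Scan everything after the cursor for Write/Edit calls into the memory dir
  return messages.slice(start + 1).some(m =>
    m.content.some(
      b =>
        b.type === 'tool_use' &&
        (b.name === 'Write' || b.name === 'Edit') &&
        (b.input.file_path?.startsWith(memoryDir) ?? false),
    ),
  )
}
```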

What to save, what not to save

Four types are saved:

  • user: who the user is, what they're good at, what style of explanation they prefer
  • feedback: the user's corrections or confirmations of how you work
  • project: project background not derivable from the codebase, e.g. deadlines, freezes, incident causes
  • reference: entry points into external systems, e.g. Linear projects, Grafana dashboards, Slack channels
/**
 * `## Types of memory` section for INDIVIDUAL-ONLY mode (single directory).
 * No <scope> tags. Examples use plain `[saves X memory: …]`. Prose that
 * only makes sense with a private/team split is reworded.
 */
export const TYPES_SECTION_INDIVIDUAL: readonly string[] = [
  '## Types of memory',
  '',
  'There are several discrete types of memory that you can store in your memory system:',
  '',
  '<types>',
  '<type>',
  '    <name>user</name>',
  "    <description>Contain information about the user's role, goals, responsibilities, and knowledge. Great user memories help you tailor your future behavior to the user's preferences and perspective. Your goal in reading and writing these memories is to build up an understanding of who the user is and how you can be most helpful to them specifically. For example, you should collaborate with a senior software engineer differently than a student who is coding for the very first time. Keep in mind, that the aim here is to be helpful to the user. Avoid writing memories about the user that could be viewed as a negative judgement or that are not relevant to the work you're trying to accomplish together.</description>",
  "    <when_to_save>When you learn any details about the user's role, preferences, responsibilities, or knowledge</when_to_save>",
  "    <how_to_use>When your work should be informed by the user's profile or perspective. For example, if the user is asking you to explain a part of the code, you should answer that question in a way that is tailored to the specific details that they will find most valuable or that helps them build their mental model in relation to domain knowledge they already have.</how_to_use>",
  '    <examples>',
  "    user: I'm a data scientist investigating what logging we have in place",
  '    assistant: [saves user memory: user is a data scientist, currently focused on observability/logging]',
  '',
  "    user: I've been writing Go for ten years but this is my first time touching the React side of this repo",
  "    assistant: [saves user memory: deep Go expertise, new to React and this project's frontend — frame frontend explanations in terms of backend analogues]",
  '    </examples>',
  '</type>',
  '<type>',
  '    <name>feedback</name>',
  '    <description>Guidance the user has given you about how to approach work — both what to avoid and what to keep doing. These are a very important type of memory to read and write as they allow you to remain coherent and responsive to the way you should approach work in the project. Record from failure AND success: if you only save corrections, you will avoid past mistakes but drift away from approaches the user has already validated, and may grow overly cautious.</description>',
  '    <when_to_save>Any time the user corrects your approach ("no not that", "don\'t", "stop doing X") OR confirms a non-obvious approach worked ("yes exactly", "perfect, keep doing that", accepting an unusual choice without pushback). Corrections are easy to notice; confirmations are quieter — watch for them. In both cases, save what is applicable to future conversations, especially if surprising or not obvious from the code. Include *why* so you can judge edge cases later.</when_to_save>',
  '    <how_to_use>Let these memories guide your behavior so that the user does not need to offer the same guidance twice.</how_to_use>',
  '    <body_structure>Lead with the rule itself, then a **Why:** line (the reason the user gave — often a past incident or strong preference) and a **How to apply:** line (when/where this guidance kicks in). Knowing *why* lets you judge edge cases instead of blindly following the rule.</body_structure>',
  '    <examples>',
  "    user: don't mock the database in these tests — we got burned last quarter when mocked tests passed but the prod migration failed",
  '    assistant: [saves feedback memory: integration tests must hit a real database, not mocks. Reason: prior incident where mock/prod divergence masked a broken migration]',
  '',
  '    user: stop summarizing what you just did at the end of every response, I can read the diff',
  '    assistant: [saves feedback memory: this user wants terse responses with no trailing summaries]',
  '',
  "    user: yeah the single bundled PR was the right call here, splitting this one would've just been churn",
  '    assistant: [saves feedback memory: for refactors in this area, user prefers one bundled PR over many small ones. Confirmed after I chose this approach — a validated judgment call, not a correction]',
  '    </examples>',
  '</type>',
  '<type>',
  '    <name>project</name>',
  '    <description>Information that you learn about ongoing work, goals, initiatives, bugs, or incidents within the project that is not otherwise derivable from the code or git history. Project memories help you understand the broader context and motivation behind the work the user is doing within this working directory.</description>',
  '    <when_to_save>When you learn who is doing what, why, or by when. These states change relatively quickly so try to keep your understanding of this up to date. Always convert relative dates in user messages to absolute dates when saving (e.g., "Thursday" → "2026-03-05"), so the memory remains interpretable after time passes.</when_to_save>',
  "    <how_to_use>Use these memories to more fully understand the details and nuance behind the user's request and make better informed suggestions.</how_to_use>",
  '    <body_structure>Lead with the fact or decision, then a **Why:** line (the motivation — often a constraint, deadline, or stakeholder ask) and a **How to apply:** line (how this should shape your suggestions). Project memories decay fast, so the why helps future-you judge whether the memory is still load-bearing.</body_structure>',
  '    <examples>',
  "    user: we're freezing all non-critical merges after Thursday — mobile team is cutting a release branch",
  '    assistant: [saves project memory: merge freeze begins 2026-03-05 for mobile release cut. Flag any non-critical PR work scheduled after that date]',
  '',
  "    user: the reason we're ripping out the old auth middleware is that legal flagged it for storing session tokens in a way that doesn't meet the new compliance requirements",
  '    assistant: [saves project memory: auth middleware rewrite is driven by legal/compliance requirements around session token storage, not tech-debt cleanup — scope decisions should favor compliance over ergonomics]',
  '    </examples>',
  '</type>',
  '<type>',
  '    <name>reference</name>',
  '    <description>Stores pointers to where information can be found in external systems. These memories allow you to remember where to look to find up-to-date information outside of the project directory.</description>',
  '    <when_to_save>When you learn about resources in external systems and their purpose. For example, that bugs are tracked in a specific project in Linear or that feedback can be found in a specific Slack channel.</when_to_save>',
  '    <how_to_use>When the user references an external system or information that may be in an external system.</how_to_use>',
  '    <examples>',
  '    user: check the Linear project "INGEST" if you want context on these tickets, that\'s where we track all pipeline bugs',
  '    assistant: [saves reference memory: pipeline bugs are tracked in Linear project "INGEST"]',
  '',
  "    user: the Grafana board at grafana.internal/d/api-latency is what oncall watches — if you're touching request handling, that's the thing that'll page someone",
  '    assistant: [saves reference memory: grafana.internal/d/api-latency is the oncall latency dashboard — check it when editing request-path code]',
  '    </examples>',
  '</type>',
  '</types>',
  '',
]

Don't save things that can be derived from the current project state or that go stale quickly:

  • architecture / directory structure
  • git history
  • debugging recipes
  • ephemeral state from the current session
export const WHAT_NOT_TO_SAVE_SECTION: readonly string[] = [
  '- Code patterns, conventions, architecture, file paths, or project structure — these can be derived by reading the current project state.',
  '- Git history, recent changes, or who-changed-what — `git log` / `git blame` are authoritative.',
  '- Debugging solutions or fix recipes — the fix is in the code; the commit message has the context.',
  '- Anything already documented in CLAUDE.md files.',
  '- Ephemeral task details: in-progress work, temporary state, current conversation context.',
]

The prompt calls this out explicitly:

These exclusions apply even when the user explicitly asks you to save.

How it lands on disk

MEMORY.md is an index.

## How to save memories

Saving a memory is a two-step process:

**Step 1** — write the memory to its own file (e.g., `user_role.md`, `feedback_testing.md`) using this frontmatter format:

```markdown
---
name: 
description: 
type: 
---
```

**Step 2** — add a pointer to that file in `MEMORY.md`. `MEMORY.md` is an index, not a memory — each entry should be one line, under ~150 characters: `- [Title](file.md) — one-line hook`. It has no frontmatter. Never write memory content directly into `MEMORY.md`.

- `MEMORY.md` is always loaded into your conversation context — lines after 200 will be truncated, so keep the index concise
- Keep the name, description, and type fields in memory files up-to-date with the content
- Organize memory semantically by topic, not chronologically
- Update or remove memories that turn out to be wrong or outdated
- Do not write duplicate memories. First check if there is an existing memory you can update before writing a new one.
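As a rough sketch, the index rules above (one-line entries under ~150 characters, index truncated after 200 lines) could look like this; the helper names and the entry regex are illustrative, not from the source:

```typescript
// Illustrative helpers for the MEMORY.md index rules described above.
const INDEX_ENTRY_RE = /^- \[[^\]]+\]\([^)]+\.md\)\s+.+$/

// One line, under ~150 chars, a markdown pointer plus a one-line hook
export function isValidIndexEntry(line: string): boolean {
  return line.length < 150 && INDEX_ENTRY_RE.test(line)
}

// MEMORY.md is always loaded into context, truncated past 200 lines
export function loadIndexLines(content: string): string[] {
  return content.split('\n').slice(0, 200)
}
```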

How Auto Memory recalls

Split by feature gate:

  • An older path: inject the contents of MEMORY.md directly into context
    • It lives in the prepend; the message list looks like [prepend(userContext), Q1, A1, Q2, A2, Q3]
    • The prepend is rebuilt on every user request, but to keep the prefix stable it is read straight from cache; it is only recomputed when the prepend cache is cleared or something like compaction happens
    • The main agent then reads the individual memory files directly, following the paths MEMORY.md lists
  • A newer path: per-query relevant-memory prefetch
    • Each query fires a side query (on sonnet) to prefetch relevant memories. Based on each memory file's frontmatter it recalls at most 5 files; anything recalled is inserted into the message list after the current turn's tool results return (as an attachment -> meta user message)

Main agent prompt

export const WHEN_TO_ACCESS_SECTION: readonly string[] = [
  '## When to access memories',
  '- When memories seem relevant, or the user references prior-conversation work.',
  '- You MUST access memory when the user explicitly asks you to check, recall, or remember.',
  '- If the user says to *ignore* or *not use* memory: proceed as if MEMORY.md were empty.',
]

Memory is positioned as a "historical snapshot", not ground truth.

export const TRUSTING_RECALL_SECTION: readonly string[] = [
  '## Before recommending from memory',
  'A memory that names a specific function, file, or flag is a claim that it existed *when the memory was written*.',
  '- If the memory names a file path: check the file exists.',
  '- If the memory names a function or flag: grep for it.',
]

Searching past context

'## Searching past context',
'1. Search topic files in your memory directory:',
memSearch,
'2. Session transcript logs (last resort — large files, slow):',
transcriptSearch,

Triggering the side-query prefetch

At most one prefetch per user turn, and it doesn't block the main response.

From a comment in query.ts:

// Fired once per user turn — the prompt is invariant across loop iterations,
// so per-iteration firing would ask sideQuery the same question N times.
using pendingMemoryPrefetch = startRelevantMemoryPrefetch(
  state.messages,
  state.toolUseContext,
)

Triggering has preconditions:

  • Too few words: skip the query
  • Too many memories already surfaced in this session: also skip
const input = getUserMessageText(lastUserMessage)
// Single-word prompts lack enough context for meaningful term extraction
if (!input || !/\s/.test(input.trim())) {
  return undefined
}
const surfaced = collectSurfacedMemories(messages)
if (surfaced.totalBytes >= RELEVANT_MEMORIES_CONFIG.MAX_SESSION_BYTES) {
  return undefined
}

Prompt

You are selecting memories that will be useful to Claude Code as it processes a user's query. You will be given the user's query and a list of available memory files with their filenames and descriptions.

Return a list of filenames for the memories that will clearly be useful to Claude Code as it processes the user's query (up to 5). Only include memories that you are certain will be helpful based on their name and description.
- If you are unsure if a memory will be useful in processing the user's query, then do not include it in your list. Be selective and discerning.
- If there are no memories in the list that would clearly be useful, feel free to return an empty list.
- If a list of recently-used tools is provided, do not select memories that are usage reference or API documentation for those tools (Claude Code is already exercising them). DO still select memories containing warnings, gotchas, or known issues about those tools — active use is exactly when those matter.
Query: ${query}
Available memories:
${manifest}${toolsSection}

Auto-dream

// Background memory consolidation. Fires the /dream prompt as a forked
// subagent when time-gate passes AND enough sessions have accumulated.
//
// Gate order (cheapest first):
//   1. Time: hours since lastConsolidatedAt >= minHours (one stat)
//   2. Sessions: transcript count with mtime > lastConsolidatedAt >= minSessions
//   3. Lock: no other process mid-consolidation

It's also a forked agent. Trigger conditions:

  • at least 24 hours since the last consolidation
  • at least 5 new sessions accumulated since then
const DEFAULTS: AutoDreamConfig = {
  minHours: 24,
  minSessions: 5,
}

It also excludes the current session:

// Exclude the current session (its mtime is always recent).
const currentSession = getSessionId()
sessionIds = sessionIds.filter(id => id !== currentSession)
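Putting the gates together, a sketch of the check (field and function names are assumptions based on the comments above, not the actual source):

```typescript
// Sketch of the auto-dream gate order, cheapest check first.
type DreamGateState = {
  hoursSinceLastConsolidation: number
  newSessionCount: number // transcripts with mtime > lastConsolidatedAt
  lockHeld: boolean // another process mid-consolidation
}

export function shouldDream(
  s: DreamGateState,
  cfg = { minHours: 24, minSessions: 5 },
): boolean {
  if (s.hoursSinceLastConsolidation < cfg.minHours) return false // 1. time
  if (s.newSessionCount < cfg.minSessions) return false // 2. sessions
  if (s.lockHeld) return false // 3. lock
  return true
}
```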

Prompt

export function buildConsolidationPrompt(
  memoryRoot: string,
  transcriptDir: string,
  extra: string,
): string {
  return `# Dream: Memory Consolidation

You are performing a dream — a reflective pass over your memory files. Synthesize what you've learned recently into durable, well-organized memories so that future sessions can orient quickly.

Memory directory: \`${memoryRoot}\`
${DIR_EXISTS_GUIDANCE}

Session transcripts: \`${transcriptDir}\` (large JSONL files — grep narrowly, don't read whole files)

---

## Phase 1 — Orient

- \`ls\` the memory directory to see what already exists
- Read \`${ENTRYPOINT_NAME}\` to understand the current index
- Skim existing topic files so you improve them rather than creating duplicates
- If \`logs/\` or \`sessions/\` subdirectories exist (assistant-mode layout), review recent entries there

## Phase 2 — Gather recent signal

Look for new information worth persisting. Sources in rough priority order:

1. **Daily logs** (\`logs/YYYY/MM/YYYY-MM-DD.md\`) if present — these are the append-only stream
2. **Existing memories that drifted** — facts that contradict something you see in the codebase now
3. **Transcript search** — if you need specific context (e.g., "what was the error message from yesterday's build failure?"), grep the JSONL transcripts for narrow terms:
   \`grep -rn "<narrow term>" ${transcriptDir}/ --include="*.jsonl" | tail -50\`

Don't exhaustively read transcripts. Look only for things you already suspect matter.

## Phase 3 — Consolidate

For each thing worth remembering, write or update a memory file at the top level of the memory directory. Use the memory file format and type conventions from your system prompt's auto-memory section — it's the source of truth for what to save, how to structure it, and what NOT to save.

Focus on:
- Merging new signal into existing topic files rather than creating near-duplicates
- Converting relative dates ("yesterday", "last week") to absolute dates so they remain interpretable after time passes
- Deleting contradicted facts — if today's investigation disproves an old memory, fix it at the source

## Phase 4 — Prune and index

Update \`${ENTRYPOINT_NAME}\` so it stays under ${MAX_ENTRYPOINT_LINES} lines AND under ~25KB. It's an **index**, not a dump — each entry should be one line under ~150 characters: \`- [Title](file.md) — one-line hook\`. Never write memory content directly into it.

- Remove pointers to memories that are now stale, wrong, or superseded
- Demote verbose entries: if an index line is over ~200 chars, it's carrying content that belongs in the topic file — shorten the line, move the detail
- Add pointers to newly important memories
- Resolve contradictions — if two files disagree, fix the wrong one

---

Return a brief summary of what you consolidated, updated, or pruned. If nothing changed (memories are already tight), say so.${extra ? `\n\n## Additional context\n\n${extra}` : ''}`
}

Reading the Claude Code source - Web Search & Web Fetch

By Shiina
April 5, 2026, 08:00

WebSearch calls a server-side search tool; WebFetch fetches the URL locally, converts the HTML to markdown, and hands it to a small model to distill according to a prompt.

WebSearch

Schema

const inputSchema = lazySchema(() =>
  z.strictObject({
    query: z.string().min(2).describe('The search query to use'),
    allowed_domains: z
      .array(z.string())
      .optional()
      .describe('Only include search results from these domains'),
    blocked_domains: z
      .array(z.string())
      .optional()
      .describe('Never include search results from these domains'),
  }),
)

Passing both allowed_domains and blocked_domains in the same request is forbidden:

'Error: Cannot specify both allowed_domains and blocked_domains in the same request'
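The mutual exclusion amounts to a one-line check before building the server tool; a plain-TypeScript sketch (the helper name is made up, the error string is the one quoted above):

```typescript
// Hypothetical validation mirroring the error message quoted above.
type DomainFilters = {
  allowed_domains?: string[]
  blocked_domains?: string[]
}

export function validateDomainFilters(input: DomainFilters): string | null {
  if (input.allowed_domains && input.blocked_domains) {
    return 'Error: Cannot specify both allowed_domains and blocked_domains in the same request'
  }
  return null // valid: at most one filter direction
}
```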

The max_uses parameter limits the number of searches performed. If Claude attempts more searches than allowed, the web_search_tool_result is an error with the max_uses_exceeded error code.

function makeToolSchema(input: Input): BetaWebSearchTool20250305 {
  return {
    type: 'web_search_20250305',
    name: 'web_search',
    allowed_domains: input.allowed_domains,
    blocked_domains: input.blocked_domains,
    max_uses: 8, // Hardcoded to 8 searches maximum
  }
}

Judging by the comment in the result-parsing code, the design deliberately allows multiple search rounds inside a single WebSearch call, capped by max_uses.

  // The result is a sequence of these blocks:
  // - text to start -- always?
  // [
  //    - server_tool_use
  //    - web_search_tool_result
  //    - text and citation blocks intermingled
  //  ]+  (this block repeated for each search)
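Under that layout, the repeated bracketed group can be recovered by splitting on `server_tool_use` markers; a sketch with the block shape reduced to its `type` field (the helper name is an assumption):

```typescript
// Sketch: group result blocks into per-search rounds, where each round
// begins at a server_tool_use block (per the comment above).
type ResultBlock = { type: string }

export function splitIntoSearchRounds(blocks: ResultBlock[]): ResultBlock[][] {
  const rounds: ResultBlock[][] = []
  for (const block of blocks) {
    if (block.type === 'server_tool_use') rounds.push([])
    // Leading text before the first search round is skipped
    if (rounds.length > 0) rounds[rounds.length - 1].push(block)
  }
  return rounds
}
```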

Prompt

- Allows Claude to search the web and use the results to inform responses
- Provides up-to-date information for current events and recent data
- Returns search result information formatted as search result blocks, including links as markdown hyperlinks
- Use this tool for accessing information beyond Claude's knowledge cutoff
- Searches are performed automatically within a single API call

The citation format required after answering.

CRITICAL REQUIREMENT - You MUST follow this:
  - After answering the user's question, you MUST include a "Sources:" section at the end of your response
  - In the Sources section, list all relevant URLs from the search results as markdown hyperlinks: [Title](URL)
  - This is MANDATORY - never skip including sources in your response
  - Example format:

    [Your answer here]

    Sources:
    - [Source Title 1](https://example.com/1)
    - [Source Title 2](https://example.com/2)
Usage notes:
  - Domain filtering is supported to include or block specific websites
  - Web search is only available in the US

The time constraint isn't a parameter like a time filter or recency knob; instead the prompt steers the model to put the year into the search query (e.g. "XXX docs 2026").

IMPORTANT - Use the correct year in search queries:
  - The current month is ${currentMonthYear}. You MUST use this year when searching for recent information, documentation, or current events.
  - Example: If the user asks for "latest React docs", search for "React documentation" with the current year, NOT last year

After the links are stitched back into the tool result, it reminds the model again.

'\nREMINDER: You MUST include the sources above in your response to the user using markdown hyperlinks.'

WebFetch

const inputSchema = lazySchema(() =>
  z.strictObject({
    url: z.string().url().describe('The URL to fetch content from'),
    prompt: z.string().describe('The prompt to run on the fetched content'),
  }),
)
const outputSchema = lazySchema(() =>
  z.object({
    bytes: z.number().describe('Size of the fetched content in bytes'),
    code: z.number().describe('HTTP response code'),
    codeText: z.string().describe('HTTP response code text'),
    result: z
      .string()
      .describe('Processed result from applying the prompt to the content'),
    durationMs: z
      .number()
      .describe('Time taken to fetch and process the content'),
    url: z.string().describe('The URL that was fetched'),
  }),
)

Fetch the page, convert it to markdown, and pass the prompt to guide a small model's extraction.

Prefer MCP when available (Zhipu's coding plan, for example, ships its own web fetch MCP); for GitHub, prefer the gh CLI.

IMPORTANT: WebFetch WILL FAIL for authenticated or private URLs. Before using this tool, check if the URL points to an authenticated service (e.g. Google Docs, Confluence, Jira, GitHub). If so, look for a specialized MCP tool that provides authenticated access.

- Fetches content from a specified URL and processes it using an AI model
- Takes a URL and a prompt as input
- Fetches the URL content, converts HTML to markdown
- Processes the content with the prompt using a small, fast model
- Returns the model's response about the content
- Use this tool when you need to retrieve and analyze web content

Usage notes:
  - IMPORTANT: If an MCP-provided web fetch tool is available, prefer using that tool instead of this one, as it may have fewer restrictions.
  - The URL must be a fully-formed valid URL
  - HTTP URLs will be automatically upgraded to HTTPS
  - The prompt should describe what information you want to extract from the page
  - This tool is read-only and does not modify any files
  - Results may be summarized if the content is very large
  - Includes a self-cleaning 15-minute cache for faster responses when repeatedly accessing the same URL
  - When a URL redirects to a different host, the tool will inform you and provide the redirect URL in a special format. You should then make a new WebFetch request with the redirect URL to fetch the content.
  - For GitHub URLs, prefer using the gh CLI via Bash instead (e.g., gh pr view, gh issue view, gh api).

Permissions and security policy

There is a layer of "preapproved domains" (src\tools\WebFetchTool\preapproved.ts:5), a host allowlist baked into the source; a matching host is allowed outright.

// For legal and security concerns, we typically only allow Web Fetch to access
// domains that the user has provided in some form. However, we make an
// exception for a list of preapproved domains that are code-related.
//
// SECURITY WARNING: These preapproved domains are ONLY for WebFetch (GET requests only).
// The sandbox system deliberately does NOT inherit this list for network restrictions,
// as arbitrary network access (POST, uploads, etc.) to these domains could enable
// data exfiltration. Some domains like huggingface.co, kaggle.com, and nuget.org
// allow file uploads and would be dangerous for unrestricted network access.
//
// See test/utils/sandbox/webfetch-preapproved-separation.test.ts for verification
// that sandbox network restrictions require explicit user permission rules.

export const PREAPPROVED_HOSTS = new Set([
  // Anthropic
  'platform.claude.com',
  'code.claude.com',
  'modelcontextprotocol.io',
  'github.com/anthropics',
  'agentskills.io',

  // Top Programming Languages
  'docs.python.org', // Python
  'en.cppreference.com', // C/C++ reference
  
  ...
  
  // Other Essential Tools
  'git-scm.com', // Git
  'nginx.org', // Nginx
  'httpd.apache.org', // Apache HTTP Server
])

The fetch pipeline

Caching

// Cache with 15-minute TTL and 50MB size limit
// LRUCache handles automatic expiration and eviction

Domain preflight cache

// Separate cache for preflight domain checks. URL_CACHE is URL-keyed, so
// fetching two paths on the same domain triggers two identical preflight
// HTTP round-trips to api.anthropic.com. This hostname-keyed cache avoids
// that. Only 'allowed' is cached — blocked/failed re-check on next attempt.
  • page content is cached by URL
  • the security preflight is cached by hostname
  • only 'allowed' is cached, never failure states: blocked / failed may be transient, so the next retry might turn out differently
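The asymmetry (cache allows, retry failures) can be sketched like this; names are assumptions, and the real preflight is an HTTP round-trip rather than a synchronous callback:

```typescript
// Sketch of a hostname-keyed preflight cache that stores only 'allowed'.
type Verdict = 'allowed' | 'blocked' | 'failed'
const preflightCache = new Map<string, 'allowed'>()

export function checkDomain(
  hostname: string,
  doPreflight: (h: string) => Verdict,
): Verdict {
  if (preflightCache.get(hostname) === 'allowed') return 'allowed'
  const verdict = doPreflight(hostname)
  // blocked/failed may be transient, so they are deliberately not cached
  if (verdict === 'allowed') preflightCache.set(hostname, 'allowed')
  return verdict
}
```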

Resource limits:

// "Implement resource consumption controls because setting limits on CPU,
// memory, and network usage for the Web Fetch tool can prevent a single
// request or user from overwhelming the system."
const MAX_HTTP_CONTENT_LENGTH = 10 * 1024 * 1024

const FETCH_TIMEOUT_MS = 60_000
const DOMAIN_CHECK_TIMEOUT_MS = 10_000
const MAX_REDIRECTS = 10
export const MAX_MARKDOWN_LENGTH = 100_000

It takes over redirect handling itself.

 * "Do not automatically follow redirects because following redirects could
 * allow for an attacker to exploit an open redirect vulnerability in a
 * trusted domain to force a user to make a request to a malicious domain
 * unknowingly"

Only very limited redirects are followed:

  • same-origin path / query changes
  • adding or removing a www. prefix
  • any other cross-host redirect is not followed automatically
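A sketch of that policy with the WHATWG URL API (the function name is an assumption):

```typescript
// Sketch: auto-follow only same-host redirects and bare www. toggles.
export function isAutoFollowableRedirect(fromUrl: string, toUrl: string): boolean {
  const from = new URL(fromUrl)
  const to = new URL(toUrl)
  // Same host: only the path/query changed
  if (from.host === to.host) return true
  // Allow exactly a leading "www." being added or removed
  const stripWww = (host: string) => host.replace(/^www\./, '')
  return stripWww(from.host) === stripWww(to.host)
}
```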

If the page really does redirect to another host, WebFetchTool doesn't silently follow it; it returns a special result that explicitly tells the model to call again with the new URL and the same prompt.

    // Check if we got a redirect to a different host
    if ('type' in response && response.type === 'redirect') {
      const statusText =
        response.statusCode === 301
          ? 'Moved Permanently'
          : response.statusCode === 308
            ? 'Permanent Redirect'
            : response.statusCode === 307
              ? 'Temporary Redirect'
              : 'Found'

      const message = `REDIRECT DETECTED: The URL redirects to a different host.

Original URL: ${response.originalUrl}
Redirect URL: ${response.redirectUrl}
Status: ${response.statusCode} ${statusText}

To complete your request, I need to fetch content from the redirected URL. Please use WebFetch again with these parameters:
- url: "${response.redirectUrl}"
- prompt: "${prompt}"`

Content handling

Once content is fetched, HTML and non-HTML are handled differently. The off-the-shelf turndown library converts HTML to markdown.

// Lazy singleton — defers the turndown → @mixmark-io/domino import (~1.4MB
// retained heap) until the first HTML fetch, and reuses one instance across
// calls
if (contentType.includes('text/html')) {
  markdownContent = (await getTurndownService()).turndown(htmlContent)
}

Binary content isn't simply dropped either.

// Binary content: save raw bytes to disk with a proper extension so Claude
// can inspect the file later. We still fall through to the utf-8 decode +
// Haiku path below — for PDFs in particular the decoded string has enough
// ASCII structure (/Title, text streams) that Haiku can summarize it

Speedups

WebFetch has five main speedups.

The first layer is the URL cache and the domain preflight cache, covered above.

The second layer is "trusted content passthrough":

  • the URL belongs to a preapproved domain
  • the content-type is text/markdown
  • and the length is under MAX_MARKDOWN_LENGTH

then the raw markdown is returned directly, with no secondary-model pass. In other words, for the most common case of official documentation sites it skips a Haiku call entirely.
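The three conditions reduce to one boolean. The constant value is the one quoted in the resource-limits snippet earlier; the function name is an assumption:

```typescript
// Sketch of the trusted-content passthrough check.
const MAX_MARKDOWN_LENGTH = 100_000

export function canSkipSecondaryModel(
  isPreapprovedHost: boolean,
  contentType: string,
  markdown: string,
): boolean {
  return (
    isPreapprovedHost &&
    contentType.includes('text/markdown') &&
    markdown.length < MAX_MARKDOWN_LENGTH
  )
}
```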

The third layer is content truncation:

// Truncate content to avoid "Prompt is too long" errors from the secondary model

This looks like mere error-proofing, but it's really latency control too: the secondary model never has to chew through an overlong page.

The fourth layer: the secondary model itself is a fast one (haiku).

- Processes the content with the prompt using a small, fast model

The fifth layer: the prompt is also split by site trust level. Preapproved documentation sites get relatively permissive instructions; ordinary sites get more conservative ones. The prompt here is the prompt parameter passed when the web fetch tool is invoked.

export function makeSecondaryModelPrompt(
  markdownContent: string,
  prompt: string,
  isPreapprovedDomain: boolean,
): string {
  const guidelines = isPreapprovedDomain
    ? `Provide a concise response based on the content above. Include relevant details, code examples, and documentation excerpts as needed.`
    : `Provide a concise response based only on the content above. In your response:
 - Enforce a strict 125-character maximum for quotes from any source document. Open Source Software is ok as long as we respect the license.
 - Use quotation marks for exact language from articles; any language outside of the quotation should never be word-for-word the same.
 - You are not a lawyer and never comment on the legality of your own prompts and responses.
 - Never produce or reproduce exact song lyrics.`

  return `
Web page content:
---
${markdownContent}
---

${prompt}

${guidelines}
`
}

Reading Claude Code source - the memory mechanism

By Shiina
April 3, 2026, 08:00

It is split into Session Memory and Auto Memory (cross-session).

Session Memory

Each session maintains a summary.md file, updated in the background by a forked subagent (with permissions restricted to editing only this summary.md) once thresholds are met (tokens added since the last update, number of tool calls, etc.).

/**
 * Session Memory automatically maintains a markdown file with notes about the current conversation.
 * It runs periodically in the background using a forked subagent to extract key information
 * without interrupting the main conversation flow.
 */
/**
 * Configuration for session memory extraction thresholds
 */
export type SessionMemoryConfig = {
  /** Minimum context window tokens before initializing session memory.
   * Uses the same token counting as autocompact (input + output + cache tokens)
   * to ensure consistent behavior between the two features. */
  minimumMessageTokensToInit: number
  /** Minimum context window growth (in tokens) between session memory updates.
   * Uses the same token counting as autocompact (tokenCountWithEstimation)
   * to measure actual context growth, not cumulative API usage. */
  minimumTokensBetweenUpdate: number
  /** Number of tool calls between session memory updates */
  toolCallsBetweenUpdates: number
}
export const DEFAULT_SESSION_MEMORY_CONFIG: SessionMemoryConfig = {
  minimumMessageTokensToInit: 10000,
  minimumTokensBetweenUpdate: 5000,
  toolCallsBetweenUpdates: 3,
}
  // Trigger extraction when:
  // 1. Both thresholds are met (tokens AND tool calls), OR
  // 2. No tool calls in last turn AND token threshold is met
  //    (to ensure we extract at natural conversation breaks)
  //
  // IMPORTANT: The token threshold (minimumTokensBetweenUpdate) is ALWAYS required.
  // Even if the tool call threshold is met, extraction won't happen until the
  // token threshold is also satisfied. This prevents excessive extractions.
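The trigger conditions above can be sketched like this (the function shape is an assumption; only the logic follows the quoted comment):

```typescript
type TriggerConfig = {
  minimumTokensBetweenUpdate: number
  toolCallsBetweenUpdates: number
}

function shouldExtract(
  tokensSinceUpdate: number,
  toolCallsSinceUpdate: number,
  lastTurnHadToolCalls: boolean,
  config: TriggerConfig,
): boolean {
  // The token threshold is ALWAYS required.
  if (tokensSinceUpdate < config.minimumTokensBetweenUpdate) return false
  // Either the tool-call threshold is also met, or we are at a natural
  // conversation break (no tool calls in the last turn).
  return (
    toolCallsSinceUpdate >= config.toolCallsBetweenUpdates ||
    !lastTurnHadToolCalls
  )
}
```
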
/**
 * Returns the session memory directory path for the current session with trailing separator.
 * Path format: {projectDir}/{sessionId}/session-memory/
 */
export function getSessionMemoryDir(): string {
  return join(getProjectDir(getCwd()), getSessionId(), 'session-memory') + sep
}

/**
 * Returns the session memory file path for the current session.
 * Path format: {projectDir}/{sessionId}/session-memory/summary.md
 */
export function getSessionMemoryPath(): string {
  return join(getSessionMemoryDir(), 'summary.md')
}

Prompt template:

# Session Title
_A short and distinctive 5-10 word descriptive title for the session. Super info dense, no filler_

# Current State
_What is actively being worked on right now? Pending tasks not yet completed. Immediate next steps._

# Task specification
_What did the user ask to build? Any design decisions or other explanatory context_

Followed by:

  • Files and Functions
  • Workflow
  • Errors & Corrections
  • Codebase and System Documentation
  • Learnings
  • Key results
  • Worklog
- The file must maintain its exact structure with all sections, headers, and italic descriptions intact
-- NEVER modify, delete, or add section headers
-- NEVER modify or delete the italic _section description_ lines
-- ONLY update the actual content that appears BELOW the italic _section descriptions_

The update itself is executed by a forked subagent.

  // Run session memory extraction using runForkedAgent for prompt caching
  // runForkedAgent creates an isolated context to prevent mutation of parent state
  // Pass setupContext.readFileState so the forked agent can edit the memory file
/**
 * Creates a canUseTool function that only allows Edit for the exact memory file.
 */
if (
  tool.name === FILE_EDIT_TOOL_NAME &&
  typeof input === 'object' &&
  input !== null &&
  'file_path' in input
) {
  const filePath = input.file_path
  if (typeof filePath === 'string' && filePath === memoryPath) {
    return { behavior: 'allow' as const, updatedInput: input }
  }
}

Use case: pre-staged material for autocompact

Once the autocompact condition is met, session memory compaction is tried first.

  // EXPERIMENT: Try session memory compaction first
  const sessionMemoryResult = await trySessionMemoryCompaction(
    messages,
    toolUseContext.agentId,
    recompactionInfo.autoCompactThreshold,
  )
/**
 * Try to use session memory for compaction instead of traditional compaction.
 * Returns null if session memory compaction cannot be used.
 *
 * Handles two scenarios:
 * 1. Normal case: lastSummarizedMessageId is set, keep only messages after that ID
 * 2. Resumed session: lastSummarizedMessageId is not set but session memory has content,
 *    keep all messages but use session memory as the summary
 */

Keeping a recent stretch of raw messages

/**
 * Calculate the starting index for messages to keep after compaction.
 * Starts from lastSummarizedMessageId, then expands backwards to meet minimums:
 * - At least config.minTokens tokens
 * - At least config.minTextBlockMessages messages with text blocks
 * Stops expanding if config.maxTokens is reached.
 * Also ensures tool_use/tool_result pairs are not split.
 */
export const DEFAULT_SM_COMPACT_CONFIG: SessionMemoryCompactConfig = {
  minTokens: 10_000,
  minTextBlockMessages: 5,
  maxTokens: 40_000,
}
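The expand-backwards selection described in the comment can be sketched as follows (a minimal sketch: token counting is simplified to per-message counts, and the tool_use/tool_result pair adjustment is omitted):

```typescript
type Msg = { tokens: number; hasTextBlock: boolean }

type CompactConfig = {
  minTokens: number
  minTextBlockMessages: number
  maxTokens: number
}

// Start from lastSummarizedIndex, then expand backwards until both minimums
// are met, stopping early if adding another message would exceed maxTokens.
function calcKeepStartIndex(
  messages: Msg[],
  lastSummarizedIndex: number,
  config: CompactConfig,
): number {
  let start = lastSummarizedIndex
  let tokens = 0
  let textMsgs = 0
  for (let i = lastSummarizedIndex; i < messages.length; i++) {
    tokens += messages[i].tokens
    if (messages[i].hasTextBlock) textMsgs++
  }
  while (
    start > 0 &&
    (tokens < config.minTokens || textMsgs < config.minTextBlockMessages)
  ) {
    const candidate = messages[start - 1]
    if (tokens + candidate.tokens > config.maxTokens) break
    start--
    tokens += candidate.tokens
    if (candidate.hasTextBlock) textMsgs++
  }
  return start
}
```
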

Avoid splitting tool_use / tool_result pairs.

 * Adjust the start index to ensure we don't split tool_use/tool_result pairs
 * or thinking blocks that share the same message.id with kept assistant messages.
 *
 * If ANY message we're keeping contains tool_result blocks, we need to
 * include the preceding assistant message(s) that contain the matching tool_use blocks.
 * API error: orphan tool_result references non-existent tool_use

Budget control, so summary.md itself doesn't grow too large.

const MAX_SECTION_LENGTH = 2000
const MAX_TOTAL_SESSION_MEMORY_TOKENS = 12000

The update prompt reminds the model explicitly:

- Keep each section under ~${MAX_SECTION_LENGTH} tokens/words
- IMPORTANT: Always update "Current State" to reflect the most recent work

If the total is already over budget, a condensing reminder is appended:

CRITICAL: The session memory file is currently ~${totalTokens} tokens, which exceeds the maximum of ${MAX_TOTAL_SESSION_MEMORY_TOKENS} tokens. You MUST condense the file to fit within this budget.

When entering the compact flow, one more truncation safeguard is applied:

/**
 * Truncate session memory sections that exceed the per-section token limit.
 * Used when inserting session memory into compact messages to prevent
 * oversized session memory from consuming the entire post-compact token budget.
 */

Another use case: session background for skillify

Distilling the current session into a skill.

## Your Session Context

Here is the session memory summary:
<session_memory>

</session_memory>
const sessionMemory =
  (await getSessionMemoryContent()) ?? 'No session memory available.'

It then appends this session's user messages:

const userMessages = extractUserMessages(
  getMessagesAfterCompactBoundary(context.messages),
)

Auto Memory

Requires the feature to be enabled; see the docs. Cross-session, with recall support.

/**
 * Whether auto-memory features are enabled (memdir, agent memory, past session search).
 */

MEMORY.md is the entry-point index

const AUTO_MEM_DIRNAME = 'memory'
const AUTO_MEM_ENTRYPOINT_NAME = 'MEMORY.md'

Default directory resolution

/**
 * Returns the auto-memory directory path.
 *
 * Resolution order:
 *   1. CLAUDE_COWORK_MEMORY_PATH_OVERRIDE env var
 *   2. autoMemoryDirectory in settings.json
 *   3. <memoryBase>/projects/<sanitized-git-root>/memory/
 */

MEMORY.md here serves as the entry-point index; durable memories go into their own topic files.

`MEMORY.md` is an index, not a memory
each entry should be one line, under ~150 characters
Saving a memory is a two-step process:

**Step 1**  write the memory to its own file
**Step 2**  add a pointer to that file in `MEMORY.md`
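The two-step save can be sketched like this (paths and the pointer format are illustrative assumptions; the real flow goes through the Write/Edit tools rather than direct fs calls):

```typescript
import { appendFileSync, writeFileSync } from 'node:fs'
import { join } from 'node:path'

function saveMemory(memoryDir: string, topic: string, body: string): string {
  // Step 1 - write the memory to its own topic file.
  const topicPath = join(memoryDir, `${topic}.md`)
  writeFileSync(topicPath, body)
  // Step 2 - add a one-line pointer (kept short, ~150 chars) to MEMORY.md.
  const pointer = `- [${topic}](${topic}.md): ${body.split('\n')[0].slice(0, 100)}\n`
  appendFileSync(join(memoryDir, 'MEMORY.md'), pointer)
  return topicPath
}
```
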
/**
 * Extracts durable memories from the current session transcript
 * and writes them to the auto-memory directory (~/.claude/projects/<path>/memory/).
 *
 * It runs once at the end of each complete query loop
 */

MEMORY.md is automatically injected into context

" (user's auto-memory, persists across conversations)"

Memories relevant to a query can be retrieved (by scanning the topic files' frontmatter)

/**
 * Find memory files relevant to a query by scanning memory file headers
 * and asking Sonnet to select the most relevant ones.
 *
 * Returns absolute file paths + mtime of the most relevant memories
 * (up to 5). Excludes MEMORY.md (already loaded in system prompt).
 */
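The frontmatter scan feeding that relevance step can be sketched as follows (the frontmatter shape is an assumption; the real code then asks Sonnet to select up to 5 files):

```typescript
// Parse a simple `key: value` YAML-ish frontmatter block from a memory file.
function parseFrontmatter(markdown: string): Record<string, string> {
  const m = markdown.match(/^---\n([\s\S]*?)\n---/)
  if (!m) return {}
  const out: Record<string, string> = {}
  for (const line of m[1].split('\n')) {
    const idx = line.indexOf(':')
    if (idx > 0) out[line.slice(0, idx).trim()] = line.slice(idx + 1).trim()
  }
  return out
}
```
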

Reading Claude Code source - context compression strategies

By Shiina
April 1, 2026, 08:00

Compression happens at several layers.

Compressing tool results

In the message list, assistant messages carry the model's replies plus tool_use blocks, while user messages carry user input plus tool_result blocks. If the compactable tool results in the current user message group (treating assistant messages as boundaries: all user messages between two assistant messages are merged into one group) exceed MAX_TOOL_RESULTS_PER_MESSAGE_CHARS in total characters, the longest tool_result is iteratively replaced with a preview (containing the local path where the original tool result was persisted, plus its first several characters) until the total drops under the limit.

For tool use ids, it maintains seenIds (a set of tool use ids already seen) and replacements (a dict from compressed tool use ids to their previews). On every later LLM API call, already-processed ids get exactly the same treatment as before (compressed ones are re-replaced with the identical preview; uncompressed ones stay uncompressed), so the prompt cache is never broken.
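A minimal sketch of this freeze-the-fate bookkeeping (the shapes and preview format are assumptions; the real code persists full outputs to disk and evaluates per wire-level message group):

```typescript
type ToolResult = { toolUseId: string; content: string }

const seenIds = new Set<string>()
const replacements = new Map<string, string>()

function applyBudget(results: ToolResult[], maxChars: number): ToolResult[] {
  const out = results.map(r =>
    // Previously replaced: re-apply the identical cached preview.
    replacements.has(r.toolUseId)
      ? { ...r, content: replacements.get(r.toolUseId)! }
      : { ...r },
  )
  let total = out.reduce((s, r) => s + r.content.length, 0)
  while (total > maxChars) {
    // Only FRESH (never-before-seen) results are eligible for replacement.
    const fresh = out.filter(
      r => !seenIds.has(r.toolUseId) && !replacements.has(r.toolUseId),
    )
    if (fresh.length === 0) break
    const largest = fresh.reduce((a, b) =>
      a.content.length >= b.content.length ? a : b,
    )
    const preview = `[persisted] ${largest.content.slice(0, 20)}...`
    replacements.set(largest.toolUseId, preview)
    total -= largest.content.length - preview.length
    largest.content = preview
  }
  // Freeze the fate of everything seen this turn.
  for (const r of results) seenIds.add(r.toolUseId)
  return out
}
```

Calling it again with the same inputs yields byte-identical output, which is exactly what keeps the prompt cache intact.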

 * Enforce the per-message budget on aggregate tool result size.
 *
 * For each user message whose tool_result blocks together exceed the
 * per-message limit (see getPerMessageBudgetLimit), the largest FRESH
 * (never-before-seen) results in THAT message are persisted to disk and
 * replaced with previews.
 * Messages are evaluated independently  a 150K result in one message and
 * a 150K result in another are both under budget and untouched.
 *
 * State is tracked by tool_use_id in `state`. Once a result is seen its
 * fate is frozen: previously-replaced results get the same replacement
 * re-applied every turn from the cached preview string (zero I/O,
 * byte-identical), and previously-unreplaced results are never replaced
 * later (would break prompt cache).
 *
 * Each turn adds at most one new user message with tool_result blocks,
 * so the per-message loop typically does the budget check at most once;
 * all prior messages just re-apply cached replacements.
/**
 * Default maximum aggregate size in characters for tool_result blocks within
 * a SINGLE user message (one turn's batch of parallel tool results). When a
 * message's blocks together exceed this, the largest blocks in that message
 * are persisted to disk and replaced with previews until under budget.
 * Messages are evaluated independently — a 150K result in one turn and a
 * 150K result in the next are both untouched.
 *
 * This prevents N parallel tools from each hitting the per-tool max and
 * collectively producing e.g. 10 × 40K = 400K in one turn's user message.
 */
export const MAX_TOOL_RESULTS_PER_MESSAGE_CHARS = 200_000
/**
 * Build a message for large tool results with preview
 */
export function buildLargeToolResultMessage(
  result: PersistedToolResult,
): string {
  let message = `${PERSISTED_OUTPUT_TAG}\n`
  message += `Output too large (${formatFileSize(result.originalSize)}). Full output saved to: ${result.filepath}\n\n`
  message += `Preview (first ${formatFileSize(PREVIEW_SIZE_BYTES)}):\n`
  message += result.preview
  message += result.hasMore ? '\n...\n' : '\n'
  message += PERSISTED_OUTPUT_CLOSING_TAG
  return message
}
/**
 * Extract candidate tool_result blocks grouped by API-level user message.
 *
 * normalizeMessagesForAPI merges consecutive user messages into one
 * (Bedrock compat; 1P does the same server-side), so parallel tool
 * results that arrive as N separate user messages in our state become
 * ONE user message on the wire. The budget must group the same way or
 * it would see N under-budget messages instead of one over-budget
 * message and fail to enforce exactly when it matters most.
 *
 * A "group" is a maximal run of user messages NOT separated by an
 * assistant message. Only assistant messages create wire-level
 * boundaries — normalizeMessagesForAPI filters out progress entirely
 * and merges attachment / system(local_command) INTO adjacent user
 * blocks, so those types do NOT break groups here either.
 *
 * This matters for abort-during-parallel-tools paths: agent_progress
 * messages (non-ephemeral, persisted in REPL state) can interleave
 * between fresh tool_result messages. If we flushed on progress, those
 * tool_results would split into under-budget groups, slip through
 * unreplaced, get frozen, then be merged by normalizeMessagesForAPI
 * into one over-budget wire message — defeating the feature.
 *
 * Only groups with at least one eligible candidate are returned.
 */
function collectCandidatesByMessage(
  messages: Message[],
): ToolResultCandidate[][]
...
/**
 * Extract candidate tool_result blocks from a single user message: blocks
 * that are non-empty, non-image, and not already compacted by tag (i.e. by
 * the per-tool limit, or an earlier iteration of this same query call).
 * Returns [] for messages with no eligible blocks.
 */
function collectCandidatesFromMessage(message: Message): ToolResultCandidate[] ...

A snip-compact stage follows, but the code is missing, so it's skipped here.

Micro-compact

  • If too much time has passed since the last request, the server-side prompt cache has long expired, so before sending it simply clears old tool results (replacing them with TIME_BASED_MC_CLEARED_MESSAGE), keeping the most recent keepRecent and compressing the rest.
  • If the prompt cache is still warm, it uses Anthropic's context-editing API to compress tool results instead.
  // Time-based trigger runs first and short-circuits. If the gap since the
  // last assistant message exceeds the threshold, the server cache has expired
  // and the full prefix will be rewritten regardless — so content-clear old
  // tool results now, before the request, to shrink what gets rewritten.
  // Cached MC (cache-editing) is skipped when this fires: editing assumes a
  // warm cache, and we just established it's cold.
  const compactableIds = collectCompactableToolIds(messages)

  // Floor at 1: slice(-0) returns the full array (paradoxically keeps
  // everything), and clearing ALL results leaves the model with zero working
  // context. Neither degenerate is sensible — always keep at least the last.
  const keepRecent = Math.max(1, config.keepRecent)
  const keepSet = new Set(compactableIds.slice(-keepRecent))
  const clearSet = new Set(compactableIds.filter(id => !keepSet.has(id)))
// Inline from utils/toolResultStorage.ts — importing that file pulls in
// sessionStorage → utils/messages → services/api/errors, completing a
// circular-deps loop back through this file via promptCacheBreakDetection.
// Drift is caught by a test asserting equality with the source-of-truth.
export const TIME_BASED_MC_CLEARED_MESSAGE = '[Old tool result content cleared]'

// Only compact these tools
const COMPACTABLE_TOOLS = new Set<string>([
  FILE_READ_TOOL_NAME,
  ...SHELL_TOOL_NAMES,
  GREP_TOOL_NAME,
  GLOB_TOOL_NAME,
  WEB_SEARCH_TOOL_NAME,
  WEB_FETCH_TOOL_NAME,
  FILE_EDIT_TOOL_NAME,
  FILE_WRITE_TOOL_NAME,
])
/**
 * Cached microcompact path - uses cache editing API to remove tool results
 * without invalidating the cached prefix.
 *
 * Key differences from regular microcompact:
 * - Does NOT modify local message content (cache_reference and cache_edits are added at API layer)
 * - Uses count-based trigger/keep thresholds from GrowthBook config
 * - Takes precedence over regular microcompact (no disk persistence)
 * - Tracks tool results and queues cache edits for the API layer
 */

A context collapse stage follows; skipped because the code is missing.

Auto-compact

If the context exceeds the threshold, it first tries trySessionMemoryCompaction (TODO: not yet examined); if that fails or doesn't apply, it builds a dedicated prompt and makes one LLM call to summarize (first attempting a forked agent to hit the existing prompt cache). After 3 consecutive failures, autocompact is not attempted again in the same session.

  // When prompt cache sharing is enabled, use forked agent to reuse the
  // main conversation's cached prefix (system prompt, tools, context messages).
  // Falls back to regular streaming path on failure.
export async function runForkedAgent(...):{
...
  const initialMessages: Message[] = [...forkContextMessages, ...promptMessages]
...
// Reserve this many tokens for output during compaction
// Based on p99.99 of compact summary output being 17,387 tokens.
const MAX_OUTPUT_TOKENS_FOR_SUMMARY = 20_000

export function getEffectiveContextWindowSize(model: string): number {
  const reservedTokensForSummary = Math.min(
    getMaxOutputTokensForModel(model),
    MAX_OUTPUT_TOKENS_FOR_SUMMARY,
  )
  let contextWindow = getContextWindowForModel(model, getSdkBetas())
  ...
  return contextWindow - reservedTokensForSummary
}

export const AUTOCOMPACT_BUFFER_TOKENS = 13_000

  const autocompactThreshold =
    effectiveContextWindow - AUTOCOMPACT_BUFFER_TOKENS
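Plugging in example numbers (the 200K context window and 32K max output tokens are assumed illustrative values, not from the source):

```typescript
const MAX_OUTPUT_TOKENS_FOR_SUMMARY = 20_000
const AUTOCOMPACT_BUFFER_TOKENS = 13_000

// Assumed example values for a hypothetical model.
const contextWindow = 200_000
const maxOutputTokens = 32_000

const reservedForSummary = Math.min(maxOutputTokens, MAX_OUTPUT_TOKENS_FOR_SUMMARY)
const effectiveContextWindow = contextWindow - reservedForSummary // 180_000
const autocompactThreshold = effectiveContextWindow - AUTOCOMPACT_BUFFER_TOKENS // 167_000
```
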
export function getCompactPrompt(customInstructions?: string): string {
  let prompt = NO_TOOLS_PREAMBLE + BASE_COMPACT_PROMPT
  if (customInstructions && customInstructions.trim() !== '') {
    prompt += `\n\nAdditional Instructions:\n${customInstructions}`
  }
  prompt += NO_TOOLS_TRAILER
  return prompt
}
// Stop trying autocompact after this many consecutive failures.
// BQ 2026-03-10: 1,279 sessions had 50+ consecutive failures (up to 3,272)
// in a single session, wasting ~250K API calls/day globally.
const MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES = 3
    // Increment consecutive failure count for circuit breaker.
    // The caller threads this through autoCompactTracking so the
    // next query loop iteration can skip futile retry attempts.
    const prevFailures = tracking?.consecutiveFailures ?? 0
    const nextFailures = prevFailures + 1
    if (nextFailures >= MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES) {
      logForDebugging(
        `autocompact: circuit breaker tripped after ${nextFailures} consecutive failures — skipping future attempts this session`,
        { level: 'warn' },
      )
    }
// Aggressive no-tools preamble. The cache-sharing fork path inherits the
// parent's full tool set (required for cache-key match), and on Sonnet 4.6+
// adaptive-thinking models the model sometimes attempts a tool call despite
// the weaker trailer instruction. With maxTurns: 1, a denied tool call means
// no text output → falls through to the streaming fallback (2.79% on 4.6 vs
// 0.01% on 4.5). Putting this FIRST and making it explicit about rejection
// consequences prevents the wasted turn.
const NO_TOOLS_PREAMBLE = `CRITICAL: Respond with TEXT ONLY. Do NOT call any tools.

- Do NOT use Read, Bash, Grep, Glob, Edit, Write, or ANY other tool.
- You already have all the context you need in the conversation above.
- Tool calls will be REJECTED and will waste your only turn — you will fail the task.
- Your entire response must be plain text: an <analysis> block followed by a <summary> block.

`

const NO_TOOLS_TRAILER =
  '\n\nREMINDER: Do NOT call any tools. Respond with plain text only — ' +
  'an <analysis> block followed by a <summary> block. ' +
  'Tool calls will be rejected and you will fail the task.'
  

const BASE_COMPACT_PROMPT = `Your task is to create a detailed summary of the conversation so far, paying close attention to the user's explicit requests and your previous actions.
This summary should be thorough in capturing technical details, code patterns, and architectural decisions that would be essential for continuing development work without losing context.

${DETAILED_ANALYSIS_INSTRUCTION_BASE}

Your summary should include the following sections:

1. Primary Request and Intent: Capture all of the user's explicit requests and intents in detail
2. Key Technical Concepts: List all important technical concepts, technologies, and frameworks discussed.
3. Files and Code Sections: Enumerate specific files and code sections examined, modified, or created. Pay special attention to the most recent messages and include full code snippets where applicable and include a summary of why this file read or edit is important.
4. Errors and fixes: List all errors that you ran into, and how you fixed them. Pay special attention to specific user feedback that you received, especially if the user told you to do something differently.
5. Problem Solving: Document problems solved and any ongoing troubleshooting efforts.
6. All user messages: List ALL user messages that are not tool results. These are critical for understanding the users' feedback and changing intent.
7. Pending Tasks: Outline any pending tasks that you have explicitly been asked to work on.
8. Current Work: Describe in detail precisely what was being worked on immediately before this summary request, paying special attention to the most recent messages from both user and assistant. Include file names and code snippets where applicable.
9. Optional Next Step: List the next step that you will take that is related to the most recent work you were doing. IMPORTANT: ensure that this step is DIRECTLY in line with the user's most recent explicit requests, and the task you were working on immediately before this summary request. If your last task was concluded, then only list next steps if they are explicitly in line with the users request. Do not start on tangential requests or really old requests that were already completed without confirming with the user first.
                       If there is a next step, include direct quotes from the most recent conversation showing exactly what task you were working on and where you left off. This should be verbatim to ensure there's no drift in task interpretation.

Here's an example of how your output should be structured:

<example>
<analysis>
[Your thought process, ensuring all points are covered thoroughly and accurately]
</analysis>

<summary>
1. Primary Request and Intent:
   [Detailed description]

2. Key Technical Concepts:
   - [Concept 1]
   - [Concept 2]
   - [...]

3. Files and Code Sections:
   - [File Name 1]
      - [Summary of why this file is important]
      - [Summary of the changes made to this file, if any]
      - [Important Code Snippet]
   - [File Name 2]
      - [Important Code Snippet]
   - [...]

4. Errors and fixes:
    - [Detailed description of error 1]:
      - [How you fixed the error]
      - [User feedback on the error if any]
    - [...]

5. Problem Solving:
   [Description of solved problems and ongoing troubleshooting]

6. All user messages: 
    - [Detailed non tool use user message]
    - [...]

7. Pending Tasks:
   - [Task 1]
   - [Task 2]
   - [...]

8. Current Work:
   [Precise description of current work]

9. Optional Next Step:
   [Optional Next step to take]

</summary>
</example>

Please provide your summary based on the conversation so far, following this structure and ensuring precision and thoroughness in your response. 

There may be additional summarization instructions provided in the included context. If so, remember to follow these instructions when creating the above summary. Examples of instructions include:
<example>
## Compact Instructions
When summarizing the conversation focus on typescript code changes and also remember the mistakes you made and how you fixed them.
</example>

<example>
# Summary instructions
When you are using compact - please focus on test output and code changes. Include file reads verbatim.
</example>
`

Agent practice miscellany

By Shiina
October 17, 2025, 08:00

In 2025 everyone has been busy building agents. The categories below are arbitrary.

Tools/Prompts

Context-Engineering

Compression/Memory/Planning

Skills

Agent Harness

The current trend is to skip building your own index (chunking + embedding + vector database) and just let the LLM grep or web search. Claude, Cline, and Manus all do this.

Claude Code

Traps

Deep Research

AI Coding

Training: Agentic RL

A brief review of RAG

By Shiina
October 7, 2025, 08:00

With everyone busy building agents in 2025, here is a brief review of RAG.

RAG basics

  • Offline: document parsing, text chunking, embedding (previously bge was the usual choice)
  • Embed the query and retrieve (usually just cosine similarity; with many chunks, a vector database trades a little precision for faster retrieval)
  • Rerank (usually bge-reranker)

This recipe was already played out back in 2023.

Advanced RAG

Offline

Chunking optimizations

  • Semantic splitting: compute sentence embeddings; where adjacent sentences' embeddings are far apart (e.g. with a threshold picked from the distance distribution's quantiles), the meaning shifts, so split there.
  • Structural splitting: e.g. by markdown heading levels, figures, tables, code blocks, so meaningful structures aren't cut apart. The chunk's parent headings can go into its metadata or be prepended to the chunk; or an LLM can summarize the chunk into a heading and prepend that.
  • There are many more fine-grained tricks; see the 2023 RAG competitions, which also have presentation videos on Bilibili.
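The semantic-splitting idea can be sketched as follows (embeddings are passed in precomputed; the 0.25 quantile knob is an illustrative assumption):

```typescript
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i]
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb))
}

// Split where adjacent-sentence similarity falls to or below a
// quantile-derived threshold.
function semanticSplit(
  sentences: string[],
  embeddings: number[][],
  quantile = 0.25,
): string[][] {
  const sims = embeddings.slice(1).map((e, i) => cosine(embeddings[i], e))
  const sorted = [...sims].sort((a, b) => a - b)
  const threshold = sorted[Math.floor(quantile * sorted.length)]
  const chunks: string[][] = [[sentences[0]]]
  sims.forEach((sim, i) => {
    if (sim <= threshold) chunks.push([sentences[i + 1]])
    else chunks[chunks.length - 1].push(sentences[i + 1])
  })
  return chunks
}
```
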

Embedding optimizations

Generate embeddings along more "dimensions" per chunk: e.g. embed a summary of the chunk, or embed the larger-window chunk / paragraph / section containing it (even hierarchical embeddings and retrieval). After a hit, expand around the matched chunk by a window or a paragraph to bring back more complete, coherent context.

Online

Query processing

  • Query classification (intent recognition / routing, etc.)
  • Generating more embedding dimensions: e.g. HyDE (Hypothetical Document Embeddings) generates a pseudo-document from the query and retrieves with that, turning q-a matching into a-a matching. Symmetrically, offline you can generate likely queries for each chunk, turning q-a matching into q-q matching.
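The HyDE flow can be sketched with all dependencies injected (the `llm`, `embed`, and `searchByVector` parameters are hypothetical stand-ins for an LLM call, an embedding model, and a vector index):

```typescript
function hydeRetrieve(
  query: string,
  llm: (prompt: string) => string,
  embed: (text: string) => number[],
  searchByVector: (v: number[], k: number) => string[],
  k = 5,
): string[] {
  // Embed a generated answer-like document instead of the raw query, so
  // retrieval compares answer-to-answer rather than question-to-answer.
  const pseudoDoc = llm(`Write a short passage that answers: ${query}`)
  return searchByVector(embed(pseudoDoc), k)
}
```
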

Assembling context

  • Widen the window (covered above: bring back the larger-window chunk containing the current one)
  • Ordering (if chunks come from the same document, sort them by order of appearance; when they're close, fill small gaps to make the passage more coherent)
  • By hierarchy (covered above: bring back the section the current chunk belongs to)
  • Compression (again done with an LLM)

Evaluation

  • Retrieval evaluation
    • Quality metrics: recall@k, precision@k, mAP, MRR, etc.; see here
    • Performance metrics: average latency, sustainable QPS, availability / node-failure recovery time
    • Cost metrics: storage cost per vector, cost per retrieval
    • Online evaluation: click-through rate (CTR) on retrieved results, dwell time on them, re-search rate (fraction of users who search again after seeing results; lower is better), user satisfaction scores
  • Generation evaluation
    • Quality metrics: factual accuracy (answer consistent with retrieved information), hallucination rate (fraction of answers with content beyond the retrieved information), format compliance, user satisfaction
    • Performance metrics: time to first token, QPS, availability
    • Cost metrics: cost per request (GPU cost), GPU utilization
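The two most common retrieval metrics can be computed as follows (standard definitions, not code from any of the projects discussed):

```typescript
// Fraction of the relevant documents that appear in the top-k results.
function recallAtK(ranked: string[], relevant: Set<string>, k: number): number {
  const hits = ranked.slice(0, k).filter(d => relevant.has(d)).length
  return hits / relevant.size
}

// Mean reciprocal rank: average of 1/rank of the first relevant hit per query.
function mrr(rankedPerQuery: string[][], relevantPerQuery: Set<string>[]): number {
  let sum = 0
  rankedPerQuery.forEach((ranked, qi) => {
    const rank = ranked.findIndex(d => relevantPerQuery[qi].has(d))
    if (rank >= 0) sum += 1 / (rank + 1)
  })
  return sum / rankedPerQuery.length
}
```
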

Graph RAG

See LightRAG and Microsoft's GraphRAG. As marketed, Graph RAG can do two things: (1) answer global questions, e.g. summarize an entire book; (2) answer multi-hop questions.

On the first point, my understanding is that Graph RAG effectively builds hierarchical summaries (via hierarchical clustering over the graph), where each level up is a summary of summaries; so "solving global questions" really means the answers were prepared in advance through those summaries of summaries.

On the second point, I think it falls short of agentic RAG. Graph RAG tries to use graph relations to recover all multi-hop connections in one shot (though later work does iterative multi-step retrieval), which is hard to do well. Building the graph is not trivial: even defining what counts as a node is difficult, and entity (node) linking and disambiguation are hard too. Retrieval relies on the graph's edges, which in practice means you must know in advance which kinds of edges will be needed so they get built at construction time. Graph construction is heavy in compute and storage, and incremental updates are hard. In the end, graph quality comes down to the LLM's own capability anyway. Agentic RAG allows multiple retrievals and relies on the same LLM capability, but needs no pre-built graph over the knowledge base (so it can directly use more general capabilities like web search). Its mode of action is closer to a human's and scales more easily. See also 你为什么要用 GraphRAG?

So what is a knowledge graph actually good for? My take: constructing agent training data, as in WebSailor.

Agentic RAG

It simply lets the LLM make the judgment calls itself: are the retrieved documents relevant? Are they enough to answer the question? Is this answer fabricated? And so on. A typical application is deep research, which deserves a post of its own.

Miscellaneous

Reference material

Reading code: Cherry Studio web search

By Shiina
September 30, 2025, 08:00

Very rough.

If the knowledge base and web search are both enabled (searchOrchestrationPlugin.ts), SEARCH_SUMMARY_PROMPT handles intent analysis and query rewriting. Results from the two searches are simply concatenated (not merged and reranked), with an offset added to the indices to avoid collisions. If memory recall is configured, it is appended after that.

Web search comes in two flavors:

  • local search (see LocalSearchProvider.ts): parse the SERP directly (e.g. https://www.google.com/search?q=%s). Free.
  • calling a search API, such as Tavily.

Both hitting the search engine and fetching URL content go through Electron opening an invisible background browser window that loads the given URL.

window.api.searchService.openUrlInSearchWindow(uid, url)

Other projects that freeload on search engines like this include duckduckgo-mcp-server and open-webSearch. Whether this is compliant is unclear.

Prompts

prompts.ts

// https://github.com/ItzCrazyKns/Perplexica/blob/master/src/lib/prompts/webSearch.ts
export const SEARCH_SUMMARY_PROMPT = `
  You are an AI question rephraser. Your role is to rephrase follow-up queries from a conversation into standalone queries that can be used by another LLM to retrieve information, either through web search or from a knowledge base.
  **Use user's language to rephrase the question.**
  Follow these guidelines:
  1. If the question is a simple writing task, greeting (e.g., Hi, Hello, How are you), or does not require searching for information (unless the greeting contains a follow-up question), return 'not_needed' in the 'question' XML block. This indicates that no search is required.
  2. If the user asks a question related to a specific URL, PDF, or webpage, include the links in the 'links' XML block and the question in the 'question' XML block. If the request is to summarize content from a URL or PDF, return 'summarize' in the 'question' XML block and include the relevant links in the 'links' XML block.
  3. For websearch, You need extract keywords into 'question' XML block. For knowledge, You need rewrite user query into 'rewrite' XML block with one alternative version while preserving the original intent and meaning.
  4. Websearch: Always return the rephrased question inside the 'question' XML block. If there are no links in the follow-up question, do not insert a 'links' XML block in your response.
  5. Knowledge: Always return the rephrased question inside the 'question' XML block.
  6. Always wrap the rephrased question in the appropriate XML blocks to specify the tool(s) for retrieving information: use <websearch></websearch> for queries requiring real-time or external information, <knowledge></knowledge> for queries that can be answered from a pre-existing knowledge base, or both if the question could be applicable to either tool. Ensure that the rephrased question is always contained within a <question></question> block inside these wrappers.

  There are several examples attached for your reference inside the below 'examples' XML block.

  <examples>
  1. Follow up question: What is the capital of France
  Rephrased question:\`
  <websearch>
    <question>
      Capital of France
    </question>
  </websearch>
  <knowledge>
    <rewrite>
      What city serves as the capital of France?
    </rewrite>
    <question>
      What is the capital of France
    </question>
  </knowledge>
  \`

  2. Follow up question: Hi, how are you?
  Rephrased question:\`
  <websearch>
    <question>
      not_needed
    </question>
  </websearch>
  <knowledge>
    <question>
      not_needed
    </question>
  </knowledge>
  \`

  3. Follow up question: What is Docker?
  Rephrased question: \`
  <websearch>
    <question>
      What is Docker
    </question>
  </websearch>
  <knowledge>
    <rewrite>
      Can you explain what Docker is and its main purpose?
    </rewrite>
    <question>
      What is Docker
    </question>
  </knowledge>
  \`

  4. Follow up question: Can you tell me what is X from https://example.com
  Rephrased question: \`
  <websearch>
    <question>
      What is X
    </question>
    <links>
      https://example.com
    </links>
  </websearch>
  <knowledge>
    <question>
      not_needed
    </question>
  </knowledge>
  \`

  5. Follow up question: Summarize the content from https://example1.com and https://example2.com
  Rephrased question: \`
  <websearch>
    <question>
      summarize
    </question>
    <links>
      https://example1.com
    </links>
    <links>
      https://example2.com
    </links>
  </websearch>
  <knowledge>
    <question>
      not_needed
    </question>
  </knowledge>
  \`

  6. Follow up question: Based on websearch, Which company had higher revenue in 2022, "Apple" or "Microsoft"?
  Rephrased question: \`
  <websearch>
    <question>
      Apple's revenue in 2022
    </question>
    <question>
      Microsoft's revenue in 2022
    </question>
  </websearch>
  <knowledge>
    <question>
      not_needed
    </question>
  </knowledge>
  \`

  7. Follow up question: Based on knowledge, Formula of Scaled Dot-Product Attention and Multi-Head Attention?
  Rephrased question: \`
  <websearch>
    <question>
      not_needed
    </question>
  </websearch>
  <knowledge>
    <rewrite>
      What are the mathematical formulas for Scaled Dot-Product Attention and Multi-Head Attention
    </rewrite>
    <question>
      What is the formula for Scaled Dot-Product Attention?
    </question>
    <question>
      What is the formula for Multi-Head Attention?
    </question>
  </knowledge>
  \`
  </examples>

  Anything below is part of the actual conversation. Use the conversation history and the follow-up question to rephrase the follow-up question as a standalone question based on the guidelines shared above.

  <conversation>
  {chat_history}
  </conversation>

  **Use user's language to rephrase the question.**
  Follow up question: {question}
  Rephrased question:
`

WebSearchTool.ts and KnowledgeSearchTool.ts are similar

  let summary = 'No search needed based on the query analysis.'
  if (results.query && results.results.length > 0) {
    summary = `Found ${results.results.length} relevant sources. Use [number] format to cite specific information.`
  }

  const citationData = results.results.map((result, index) => ({
    number: index + 1,
    title: result.title,
    content: result.content,
    url: result.url
  }))

  // 🔑 Return a citation-friendly format, reusing the REFERENCE_PROMPT logic
  const referenceContent = `\`\`\`json\n${JSON.stringify(citationData, null, 2)}\n\`\`\``
  const fullInstructions = REFERENCE_PROMPT.replace(
    '{question}',
    "Based on the search results, please answer the user's question with proper citations."
  ).replace('{references}', referenceContent)
  return {
    type: 'content',
    value: [
      {
        type: 'text',
        text: 'This tool searches for relevant information and formats results for easy citation. The returned sources should be cited using [1], [2], etc. format in your response.'
      },
      {
        type: 'text',
        text: summary
      },
      {
        type: 'text',
        text: fullInstructions
      }
    ]
  }
export const REFERENCE_PROMPT = `Please answer the question based on the reference materials

## Citation Rules:
- Please cite the context at the end of sentences when appropriate.
- Please use the format of citation number [number] to reference the context in corresponding parts of your answer.
- If a sentence comes from multiple contexts, please list all relevant citation numbers, e.g., [1][2]. Remember not to group citations at the end but list them in the corresponding parts of your answer.
- If all reference content is not relevant to the user's question, please answer based on your knowledge.

## My question is:

{question}

## Reference Materials:

{references}

Please respond in the same language as the user's question.
`

BaseApiClient.ts

  public async getMessageContent(
    message: Message
  ): Promise<{ textContent: string; imageContents: { fileId: string; fileExt: string }[] }> {
    const content = getMainTextContent(message)

    if (isEmpty(content)) {
      return {
        textContent: '',
        imageContents: []
      }
    }

    const webSearchReferences = await this.getWebSearchReferencesFromCache(message)
    const knowledgeReferences = await this.getKnowledgeBaseReferencesFromCache(message)
    const memoryReferences = this.getMemoryReferencesFromCache(message)

    const knowledgeTextReferences = knowledgeReferences.filter((k) => k.metadata?.type !== 'image')
    const knowledgeImageReferences = knowledgeReferences.filter((k) => k.metadata?.type === 'image')

    // Add an offset to avoid ID collisions
    const reindexedKnowledgeReferences = knowledgeTextReferences.map((ref) => ({
      ...ref,
      id: ref.id + webSearchReferences.length // offset KB reference IDs by the number of web search references
    }))

    const allReferences = [...webSearchReferences, ...reindexedKnowledgeReferences, ...memoryReferences]

    const referenceContent = `\`\`\`json\n${JSON.stringify(allReferences, null, 2)}\n\`\`\``
    const imageReferences = knowledgeImageReferences.map((r) => {
      return { fileId: r.metadata?.id, fileExt: r.metadata?.ext }
    })

    return {
      textContent: isEmpty(allReferences)
        ? content
        : REFERENCE_PROMPT.replace('{question}', content).replace('{references}', referenceContent),
      imageContents: isEmpty(knowledgeImageReferences) ? [] : imageReferences
    }
  }
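The ID reindexing above is easy to see in isolation. A minimal Python sketch — merge_references and the field names are hypothetical, for illustration only:

```python
def merge_references(web_refs, kb_refs, mem_refs):
    """Merge reference lists into one citation numbering space.

    Knowledge-base IDs are offset by the number of web search
    references so [1], [2], ... never collide across sources.
    (Hypothetical helper mirroring the reindexing in getMessageContent.)
    """
    offset = len(web_refs)
    reindexed = [{**ref, "id": ref["id"] + offset} for ref in kb_refs]
    return web_refs + reindexed + mem_refs

web = [{"id": 1, "title": "w1"}, {"id": 2, "title": "w2"}]
kb = [{"id": 1, "title": "k1"}]
merged = merge_references(web, kb, [])
print([r["id"] for r in merged])  # [1, 2, 3]
```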

Auto-Generating LLM Tool Schemas with Pydantic

By Shiina
September 14, 2025, 08:00

A simple little utility.

After defining the tool parameters, generate an OpenAI-compliant tool schema using only Pydantic, with no extra libraries. The idea is simple: massage the JSON Schema produced by Pydantic's model_json_schema into the OpenAI format.

The benefits: (1) no extra dependencies to pull in; (2) no separate, hand-maintained copy of the tool descriptions; (3) you get Pydantic's features for free, such as automatic argument validation and type coercion after loading from a JSON string.

Basic example

For example:

class GetWeatherArgs(BaseModel):
    """Retrieves current weather for the given location."""
    location: str = Field(description="City and country e.g. Bogotá, Colombia")
    units: Literal["celsius", "fahrenheit"] = Field(description="Units the temperature will be returned in.")


def get_weather(args: GetWeatherArgs):
    """The actual tool handling logic."""
    pass


get_weather_tool = create_tool_from_pydantic(GetWeatherArgs)
print(json.dumps(get_weather_tool, ensure_ascii=False, indent=2))
{
  "type": "function",
  "function": {
    "name": "get_weather",
    "description": "Retrieves current weather for the given location.",
    "parameters": {
      "type": "object",
      "properties": {
        "location": {
          "type": "string",
          "description": "City and country e.g. Bogotá, Colombia"
        },
        "units": {
          "type": "string",
          "description": "Units the temperature will be returned in.",
          "enum": [
            "celsius",
            "fahrenheit"
          ]
        }
      },
      "required": [
        "location",
        "units"
      ]
    }
  }
}

Full code

import datetime
import json
import re
import textwrap
from enum import StrEnum
from typing import Type, Literal, Optional, List, Any

import pydantic
from pydantic import BaseModel, Field


def _clean_text(text: str) -> str:
    """Strip common indentation and surrounding whitespace from a multi-line string."""
    return textwrap.dedent(text).strip()


def _process_property(prop_schema: dict, defs: dict) -> dict:
    """Recursively convert a single property's schema into tool-parameter format."""
    # 1. Handle Optional[T], which Pydantic v2 renders as an anyOf containing 'null'
    if 'anyOf' in prop_schema:
        # Find the non-null schema branch
        non_null_schema = next((s for s in prop_schema['anyOf'] if s.get('type') != 'null'), None)
        if non_null_schema:
            # Recurse, but keep the outer description
            processed_schema = _process_property(non_null_schema, defs)
            if 'description' in prop_schema:
                processed_schema['description'] = _clean_text(prop_schema['description'])
            return processed_schema
        else:  # In theory it should never be null-only
            return {}

    # 2. Handle nested objects ($ref)
    if '$ref' in prop_schema:
        ref_name = prop_schema['$ref'].split('/')[-1]
        nested_schema = defs.get(ref_name)
        if nested_schema:
            # For nested objects, call the main conversion function again
            return pydantic_to_tool_schema(nested_schema, defs)

    # 3. Handle basic types and arrays
    result = {}
    prop_type = prop_schema.get('type')

    if prop_type:
        result['type'] = prop_type
    if 'description' in prop_schema:
        result['description'] = _clean_text(prop_schema['description'])
    if 'enum' in prop_schema:
        result['enum'] = prop_schema['enum']

    # 3a. Handle arrays (List[T])
    if prop_type == 'array' and 'items' in prop_schema:
        # Recursively process the element type
        result['items'] = _process_property(prop_schema['items'], defs)

    return result


def pydantic_to_tool_schema(schema: dict, defs: dict = None) -> dict:
    """Convert a Pydantic JSON Schema into the parameters part of a tool definition."""
    if defs is None:
        defs = schema.get('$defs', {})

    tool_params = {
        "type": "object",
        "properties": {},
        "required": schema.get("required", []),
    }

    # Top-level description (from the class docstring)
    if 'description' in schema:
        tool_params['description'] = _clean_text(schema['description'])

    properties = schema.get("properties", {})
    for name, prop_schema in properties.items():
        tool_params["properties"][name] = _process_property(prop_schema, defs)

    return tool_params


def create_tool_from_pydantic(pydantic_model: Type[BaseModel]) -> dict:
    """
    Automatically build an OpenAI-compliant tool definition from a Pydantic model.

    - Infers the function name from the model class name (e.g. GetWeatherArgs -> get_weather).
    - Uses the model's docstring as the tool description.
    """
    # 1. Infer the function name from the model class name
    model_name = pydantic_model.__name__
    class_name = model_name.removesuffix('Args')
    # Convert CamelCase to snake_case
    function_name = re.sub(r'(?<!^)(?=[A-Z])', '_', class_name).lower()

    # 2. Generate the Pydantic schema and convert it to a tool schema
    pydantic_schema = pydantic_model.model_json_schema()
    tool_schema = pydantic_to_tool_schema(pydantic_schema)

    description = tool_schema.pop("description", "")  # move the description to the outer level

    # 3. Assemble and return the full tool definition
    return {
        "type": "function",
        "function": {
            "name": function_name,
            "description": description,
            "parameters": tool_schema,
        },
    }


class GetWeatherArgs(BaseModel):
    """Retrieves current weather for the given location."""
    location: str = Field(description="City and country e.g. Bogotá, Colombia")
    units: Literal["celsius", "fahrenheit"] = Field(description="Units the temperature will be returned in.")


def get_weather(args: GetWeatherArgs):
    """The actual tool handling logic."""
    pass


get_weather_tool = create_tool_from_pydantic(GetWeatherArgs)
print(json.dumps(get_weather_tool, ensure_ascii=False, indent=2))
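The name inference (GetWeatherArgs -> get_weather) boils down to two stdlib operations, pulled out here for illustration:

```python
import re


def infer_tool_name(model_name: str) -> str:
    """Strip the Args suffix, then convert CamelCase to snake_case."""
    class_name = model_name.removesuffix("Args")
    # Insert an underscore before every uppercase letter except the first
    return re.sub(r"(?<!^)(?=[A-Z])", "_", class_name).lower()


print(infer_tool_name("GetWeatherArgs"))   # get_weather
print(infer_tool_name("SearchFilesArgs"))  # search_files
```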

A more complex example

You can define nested models, enum types, custom validation logic, and so on. The SearchFilesArgs model below handles a file-search scenario, filtering by file type (the FileType enum) and creation time (the nested TimeRange model).

We also define an LLMProofBaseModel base class that automatically handles the string 'null' coming from an LLM. The check_dates validator in the nested TimeRange model shows how business rules can be encapsulated at the data-model level.

# --- continues from the previous snippet ---

class LLMProofBaseModel(BaseModel):
    """Convert any field whose input value is the string 'null' to None."""
    @pydantic.field_validator('*', mode='before')
    @classmethod
    def _clean_null_str(cls, v: Any) -> Any:
        if isinstance(v, str) and v.lower() == 'null':
            return None
        return v


class TimeRange(LLMProofBaseModel):
    """这个 docstring 不会用到"""
    start_date: Optional[datetime.date] = Field(None, description="开始日期 (YYYY-MM-DD)")
    end_date: Optional[datetime.date] = Field(None, description="结束日期 (YYYY-MM-DD)")
    random_field: Optional[str] = Field(None, description='演示用')

    @pydantic.model_validator(mode='after')
    def check_dates(self) -> 'TimeRange':
        if self.start_date and self.end_date and self.start_date > self.end_date:
            # Raise an error, or handle it some other way
            self.end_date = self.start_date
        return self


class FileType(StrEnum):
    PDF = "pdf"
    PPT = "ppt"


class SearchFilesArgs(LLMProofBaseModel):
    """
    搜索文件

    多行示例
    - xx
    - yy
    """
    query: str = Field(description="根据用户问题提炼出的核心搜索查询语句")
    file_types: Optional[List[Literal[*FileType]]] = Field(None, description="文件类型")
    time_range: Optional[TimeRange] = Field(None, description="文件创建时间范围")


search_file_tool = create_tool_from_pydantic(SearchFilesArgs)

tools = [
    get_weather_tool,
    search_file_tool,
]
print(json.dumps(tools, ensure_ascii=False, indent=2))

args1 = GetWeatherArgs.model_validate({"location": "Bogotá, Colombia", "units": "celsius"})
args2 = SearchFilesArgs.model_validate(
    {
        "query": "年报", "file_types": ["pdf"],
        "time_range": {"start_date": "2025-01-01", "end_date": "2024-01-01", "random_field": "null"},
    }
)
[
  {
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Retrieves current weather for the given location.",
      "parameters": {
        "type": "object",
        "properties": {
          "location": {
            "type": "string",
            "description": "City and country e.g. Bogotá, Colombia"
          },
          "units": {
            "type": "string",
            "description": "Units the temperature will be returned in.",
            "enum": [
              "celsius",
              "fahrenheit"
            ]
          }
        },
        "required": [
          "location",
          "units"
        ]
      }
    }
  },
  {
    "type": "function",
    "function": {
      "name": "search_files",
      "description": "搜索文件\n\n多行示例\n- xx\n- yy",
      "parameters": {
        "type": "object",
        "properties": {
          "query": {
            "type": "string",
            "description": "根据用户问题提炼出的核心搜索查询语句"
          },
          "file_types": {
            "type": "array",
            "items": {
              "type": "string",
              "enum": [
                "pdf",
                "ppt"
              ]
            },
            "description": "文件类型"
          },
          "time_range": {
            "type": "object",
            "properties": {
              "start_date": {
                "type": "string",
                "description": "开始日期 (YYYY-MM-DD)"
              },
              "end_date": {
                "type": "string",
                "description": "结束日期 (YYYY-MM-DD)"
              },
              "random_field": {
                "type": "string",
                "description": "演示用"
              }
            },
            "required": [],
            "description": "文件创建时间范围"
          }
        },
        "required": [
          "query"
        ]
      }
    }
  }
]
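As an aside, the anyOf special case in _process_property exists because Pydantic v2 renders Optional[T] as an anyOf with a null branch. A standalone sketch over a hand-written schema dict (no Pydantic needed; flatten_optional is an illustrative reduction of that branch of the converter):

```python
def flatten_optional(prop: dict) -> dict:
    """Flatten {'anyOf': [T, null]} down to T, keeping the outer description."""
    if "anyOf" in prop:
        branch = next(s for s in prop["anyOf"] if s.get("type") != "null")
        out = dict(branch)
        if "description" in prop:
            out["description"] = prop["description"]
        return out
    return prop

# Roughly what Pydantic v2 emits for: Optional[str] = Field(None, description="demo")
schema = {"anyOf": [{"type": "string"}, {"type": "null"}],
          "default": None, "description": "demo"}
print(flatten_optional(schema))  # {'type': 'string', 'description': 'demo'}
```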

A Quick Tour of the LightRAG Source Code

By Shiina
January 21, 2025, 08:00

Guo, Z., Xia, L., Yu, Y., Ao, T., & Huang, C. (2024). Lightrag: Simple and fast retrieval-augmented generation.

Overall flow:

  • Use an LLM to extract entities and relationships from chunks and store them as a graph
  • Use an LLM to extract keywords from the query, retrieve entities or relationships by keyword, find the most relevant chunks, and finally stitch everything together for the LLM to produce the answer

Extracting entities and relationships into a graph

The prompts live in lightrag/prompt.py. After the document is chunked, the LLM is asked to extract entities and relationships (plus keywords) in a specific format, and the output is parsed and stored. Judging from the code, the content_keywords from step 3 below never seem to be used anywhere.

1. Identify all entities. 
...
Format each entity as ("entity"{tuple_delimiter}<entity_name>{tuple_delimiter}<entity_type>{tuple_delimiter}<entity_description>)

2. From the entities identified in step 1, identify all pairs of (source_entity, target_entity) that are *clearly related* to each other.
...
- relationship_strength: a numeric score indicating strength of the relationship between the source entity and target entity
- relationship_keywords: one or more high-level key words that summarize the overarching nature of the relationship, focusing on concepts or themes rather than specific details
Format each relationship as ("relationship"{tuple_delimiter}<source_entity>{tuple_delimiter}<target_entity>{tuple_delimiter}<relationship_description>{tuple_delimiter}<relationship_keywords>{tuple_delimiter}<relationship_strength>)

3. Identify high-level key words that summarize the main concepts, themes, or topics of the entire text. These should capture the overarching ideas present in the document.
Format the content-level key words as ("content_keywords"{tuple_delimiter}<high_level_keywords>)

...

5. When finished, output {completion_delimiter}
Example 1:

Entity_types: [person, technology, mission, organization, location]
Text:
...
################
Output:
("entity"{tuple_delimiter}"Taylor"{tuple_delimiter}"person"{tuple_delimiter}"Taylor is portrayed with authoritarian certainty and shows a moment of reverence towards a device, indicating a change in perspective."){record_delimiter}
("entity"{tuple_delimiter}"Jordan"{tuple_delimiter}"person"{tuple_delimiter}"Jordan shares a commitment to discovery and has a significant interaction with Taylor regarding a device."){record_delimiter}
...
("relationship"{tuple_delimiter}"Taylor"{tuple_delimiter}"Jordan"{tuple_delimiter}"Taylor and Jordan interact directly regarding the device, leading to a moment of mutual respect and an uneasy truce."{tuple_delimiter}"conflict resolution, mutual respect"{tuple_delimiter}8){record_delimiter}
...
("content_keywords"{tuple_delimiter}"power dynamics, ideological conflict, discovery, rebellion"){completion_delimiter}

Entities are stored as below. Later, entities with the same name are merged (descriptions are merged too, and summarized by the LLM if too long); source_id covers all source chunk ids. The embedding is computed over dp["entity_name"] + dp["description"].

dict(
    entity_name=entity_name,
    entity_type=entity_type,
    description=entity_description,
    source_id=entity_source_id,
)

Relationships are stored as below, where edge_keywords and weight are the relationship_keywords and relationship_strength the LLM generated earlier. Weights are summed when merging. The embedding is computed over dp["keywords"] + dp["src_id"] + dp["tgt_id"] + dp["description"].

dict(
    src_id=source,
    tgt_id=target,
    weight=weight,
    description=edge_description,
    keywords=edge_keywords,
    source_id=edge_source_id,
    metadata={"created_at": time.time()},
)

Entities and relationships are then stored as the nodes and edges of a graph.

Retrieval

Use an LLM to extract high-level and low-level keywords from the user query.

Given the query, list both high-level and low-level keywords. High-level keywords focus on overarching concepts or themes, while low-level keywords focus on specific entities, details, or concrete terms.
Example 1:

Query: "How does international trade influence global economic stability?"
################
Output:
{
  "high_level_keywords": ["International trade", "Global economic stability", "Economic impact"],
  "low_level_keywords": ["Trade agreements", "Tariffs", "Currency exchange", "Imports", "Exports"]
}
if query_param.mode == "local":
    entities_context, relations_context, text_units_context = await _get_node_data(
        ll_keywords,
        knowledge_graph_inst,
        entities_vdb,
        text_chunks_db,
        query_param,
    )
elif query_param.mode == "global":
    entities_context, relations_context, text_units_context = await _get_edge_data(
        hl_keywords,
        knowledge_graph_inst,
        relationships_vdb,
        text_chunks_db,
        query_param,
    )

First, the so-called local retrieval:

  1. Join the low-level keywords into a single string (yes, although they are extracted as a list, the list structure carries no meaning), e.g. Trade agreements, Tariffs, Currency exchange, Imports, Exports; the code calls this query (the _get_node_data function in lightrag/operate.py).
  2. Use query to retrieve the top-k entities from the entity vector store.
  3. _find_most_related_text_unit_from_entities. Collect all edges (relationships) of the retrieved entities, sort all chunks by how many of those edges they appear in (relation_counts) in descending order, and keep the leading chunks up to the token limit.
  4. _find_most_related_edges_from_entities. Collect all edges (relationships) of the retrieved entities, sort them in descending order by the tuple (entity node degree, called rank in the code; edge weight, i.e. relationship_strength), and keep the descriptions of the leading relationships up to the token limit.
  5. Finally, concatenate the entities, relationships, and chunk info in CSV format and hand it to the LLM for reasoning.
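The ordering idea in step 3 can be sketched in a few lines. pick_chunks is illustrative, not LightRAG's actual code, and len stands in for a real token counter:

```python
def pick_chunks(chunks, relation_counts, max_tokens, count_tokens=len):
    """Sort chunks by how many retrieved-entity edges point at them,
    then keep a prefix that fits the token budget (a sketch of the
    ordering in _find_most_related_text_unit_from_entities)."""
    ranked = sorted(chunks, key=lambda c: relation_counts.get(c["id"], 0),
                    reverse=True)
    kept, used = [], 0
    for c in ranked:
        t = count_tokens(c["content"])
        if used + t > max_tokens:
            break
        kept.append(c)
        used += t
    return kept

chunks = [{"id": "a", "content": "x" * 30},
          {"id": "b", "content": "y" * 50},
          {"id": "c", "content": "z" * 40}]
counts = {"a": 1, "b": 5, "c": 3}
print([c["id"] for c in pick_chunks(chunks, counts, max_tokens=95)])  # ['b', 'c']
```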

entities_context

id,entity,type,description,rank
0,"""A CHRISTMAS CAROL""","""EVENT""","""A Christmas Carol is a literary event, being a classic story written by Charles Dickens and published in various editions.""",12

relations_context

id,source,target,description,keywords,weight,rank,created_at
0,"""A CHRISTMAS CAROL""","""CHARLES DICKENS""","""Charles Dickens is the author of 'A Christmas Carol,' making him the creator of this literary work.""","""authorship, literary creation""",10.0,13,UNKNOWN

text_units_context (chunks)

id,content
0,"The Project Gutenberg eBook of A Christmas Carol..."

Now the global retrieval; it is similar to the above, so only in brief:

  1. Join the high-level keywords into a string.
  2. Retrieve edges (relationships).
  3. _find_related_text_unit_from_relationships
  4. _find_most_related_entities_from_relationships
  5. Final concatenation

Reading the Claude Code Source - Web Search & Web Fetch

By Shiina
April 5, 2026, 08:00

WebSearch calls the server-side search tool; WebFetch fetches the URL locally, converts the HTML to markdown, then hands it to a small model to distill according to a prompt.

WebSearch

Schema

const inputSchema = lazySchema(() =>
  z.strictObject({
    query: z.string().min(2).describe('The search query to use'),
    allowed_domains: z
      .array(z.string())
      .optional()
      .describe('Only include search results from these domains'),
    blocked_domains: z
      .array(z.string())
      .optional()
      .describe('Never include search results from these domains'),
  }),
)

Passing both allowed_domains and blocked_domains at the same time is forbidden.

'Error: Cannot specify both allowed_domains and blocked_domains in the same request'

The max_uses parameter limits the number of searches performed. If Claude attempts more searches than allowed, the web_search_tool_result is an error with the max_uses_exceeded error code.

function makeToolSchema(input: Input): BetaWebSearchTool20250305 {
  return {
    type: 'web_search_20250305',
    name: 'web_search',
    allowed_domains: input.allowed_domains,
    blocked_domains: input.blocked_domains,
    max_uses: 8, // Hardcoded to 8 searches maximum
  }
}

Judging from the result-parsing comment, the design allows multiple search rounds within a single WebSearch call, capped at max_uses rounds.

  // The result is a sequence of these blocks:
  // - text to start -- always?
  // [
  //    - server_tool_use
  //    - web_search_tool_result
  //    - text and citation blocks intermingled
  //  ]+  (this block repeated for each search)

Prompt

- Allows Claude to search the web and use the results to inform responses
- Provides up-to-date information for current events and recent data
- Returns search result information formatted as search result blocks, including links as markdown hyperlinks
- Use this tool for accessing information beyond Claude's knowledge cutoff
- Searches are performed automatically within a single API call

The citation format required after answering:

CRITICAL REQUIREMENT - You MUST follow this:
  - After answering the user's question, you MUST include a "Sources:" section at the end of your response
  - In the Sources section, list all relevant URLs from the search results as markdown hyperlinks: [Title](URL)
  - This is MANDATORY - never skip including sources in your response
  - Example format:

    [Your answer here]

    Sources:
    - [Source Title 1](https://example.com/1)
    - [Source Title 2](https://example.com/2)
Usage notes:
  - Domain filtering is supported to include or block specific websites
  - Web search is only available in the US

Time constraints are not expressed through a parameter like a time filter / recency knob; instead the prompt steers the model to add the year to the search query (e.g. "XXX docs 2026").

IMPORTANT - Use the correct year in search queries:
  - The current month is ${currentMonthYear}. You MUST use this year when searching for recent information, documentation, or current events.
  - Example: If the user asks for "latest React docs", search for "React documentation" with the current year, NOT last year

After the links are stitched back into the tool result, there is another reminder:

'\nREMINDER: You MUST include the sources above in your response to the user using markdown hyperlinks.'

WebFetch

const inputSchema = lazySchema(() =>
  z.strictObject({
    url: z.string().url().describe('The URL to fetch content from'),
    prompt: z.string().describe('The prompt to run on the fetched content'),
  }),
)
const outputSchema = lazySchema(() =>
  z.object({
    bytes: z.number().describe('Size of the fetched content in bytes'),
    code: z.number().describe('HTTP response code'),
    codeText: z.string().describe('HTTP response code text'),
    result: z
      .string()
      .describe('Processed result from applying the prompt to the content'),
    durationMs: z
      .number()
      .describe('Time taken to fetch and process the content'),
    url: z.string().describe('The URL that was fetched'),
  }),
)

Fetch the page, convert it to markdown, and pass a prompt to guide a small model's information extraction.

MCP is preferred when available (for example, Zhipu's coding plan provides its own web fetch MCP); for GitHub, the gh CLI is preferred.

IMPORTANT: WebFetch WILL FAIL for authenticated or private URLs. Before using this tool, check if the URL points to an authenticated service (e.g. Google Docs, Confluence, Jira, GitHub). If so, look for a specialized MCP tool that provides authenticated access.

- Fetches content from a specified URL and processes it using an AI model
- Takes a URL and a prompt as input
- Fetches the URL content, converts HTML to markdown
- Processes the content with the prompt using a small, fast model
- Returns the model's response about the content
- Use this tool when you need to retrieve and analyze web content

Usage notes:
  - IMPORTANT: If an MCP-provided web fetch tool is available, prefer using that tool instead of this one, as it may have fewer restrictions.
  - The URL must be a fully-formed valid URL
  - HTTP URLs will be automatically upgraded to HTTPS
  - The prompt should describe what information you want to extract from the page
  - This tool is read-only and does not modify any files
  - Results may be summarized if the content is very large
  - Includes a self-cleaning 15-minute cache for faster responses when repeatedly accessing the same URL
  - When a URL redirects to a different host, the tool will inform you and provide the redirect URL in a special format. You should then make a new WebFetch request with the redirect URL to fetch the content.
  - For GitHub URLs, prefer using the gh CLI via Bash instead (e.g., gh pr view, gh issue view, gh api).

Permissions and security policy

There is a layer of "preapproved domains": src\tools\WebFetchTool\preapproved.ts:5 is a host allowlist built into the source; a matching host is allowed directly.

// For legal and security concerns, we typically only allow Web Fetch to access
// domains that the user has provided in some form. However, we make an
// exception for a list of preapproved domains that are code-related.
//
// SECURITY WARNING: These preapproved domains are ONLY for WebFetch (GET requests only).
// The sandbox system deliberately does NOT inherit this list for network restrictions,
// as arbitrary network access (POST, uploads, etc.) to these domains could enable
// data exfiltration. Some domains like huggingface.co, kaggle.com, and nuget.org
// allow file uploads and would be dangerous for unrestricted network access.
//
// See test/utils/sandbox/webfetch-preapproved-separation.test.ts for verification
// that sandbox network restrictions require explicit user permission rules.

export const PREAPPROVED_HOSTS = new Set([
  // Anthropic
  'platform.claude.com',
  'code.claude.com',
  'modelcontextprotocol.io',
  'github.com/anthropics',
  'agentskills.io',

  // Top Programming Languages
  'docs.python.org', // Python
  'en.cppreference.com', // C/C++ reference
  
  ...
  
  // Other Essential Tools
  'git-scm.com', // Git
  'nginx.org', // Nginx
  'httpd.apache.org', // Apache HTTP Server
])

Fetch pipeline

Caching

// Cache with 15-minute TTL and 50MB size limit
// LRUCache handles automatic expiration and eviction

Domain preflight cache

// Separate cache for preflight domain checks. URL_CACHE is URL-keyed, so
// fetching two paths on the same domain triggers two identical preflight
// HTTP round-trips to api.anthropic.com. This hostname-keyed cache avoids
// that. Only 'allowed' is cached — blocked/failed re-check on next attempt.
  • Page content is cached per URL
  • The security preflight is cached per hostname
  • And only 'allowed' is cached, never failure states, since blocked / failed may be transient and a retry might turn out differently.
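The "cache only allowed" idea as a sketch. PreflightCache is an illustrative reconstruction, not the actual LRUCache-based implementation:

```python
import time

class PreflightCache:
    """Hostname-keyed preflight cache that stores only 'allowed' verdicts;
    blocked/failed results are re-checked on the next attempt (a sketch)."""
    def __init__(self, ttl_seconds=15 * 60):
        self.ttl = ttl_seconds
        self._allowed = {}  # hostname -> expiry timestamp

    def check(self, hostname, do_preflight, now=None):
        now = time.time() if now is None else now
        exp = self._allowed.get(hostname)
        if exp is not None and exp > now:
            return "allowed"  # served from cache, no HTTP round-trip
        verdict = do_preflight(hostname)
        if verdict == "allowed":
            self._allowed[hostname] = now + self.ttl
        return verdict

calls = []
def preflight(host):
    calls.append(host)
    return "allowed" if host == "docs.python.org" else "blocked"

cache = PreflightCache()
cache.check("docs.python.org", preflight, now=0)
cache.check("docs.python.org", preflight, now=60)  # cache hit, no new call
cache.check("evil.example", preflight, now=0)
cache.check("evil.example", preflight, now=60)     # blocked: re-checked
print(len(calls))  # 3
```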

Resource limits:

// "Implement resource consumption controls because setting limits on CPU,
// memory, and network usage for the Web Fetch tool can prevent a single
// request or user from overwhelming the system."
const MAX_HTTP_CONTENT_LENGTH = 10 * 1024 * 1024

const FETCH_TIMEOUT_MS = 60_000
const DOMAIN_CHECK_TIMEOUT_MS = 10_000
const MAX_REDIRECTS = 10
export const MAX_MARKDOWN_LENGTH = 100_000

Redirects are handled manually.

 * "Do not automatically follow redirects because following redirects could
 * allow for an attacker to exploit an open redirect vulnerability in a
 * trusted domain to force a user to make a request to a malicious domain
 * unknowingly"

Only very limited redirects are followed automatically:

  • same-origin path / query changes
  • adding or removing a www. prefix
  • any other cross-host redirect is not followed automatically

If the target really is a different host, WebFetchTool doesn't silently follow; it returns a special result that explicitly tells the model to "call again with the new URL and prompt".
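The redirect policy can be sketched as a predicate. This is an illustrative approximation of the rules above, not the actual implementation:

```python
from urllib.parse import urlparse

def may_auto_follow(original: str, redirect: str) -> bool:
    """Auto-follow only same-origin redirects or a www. prefix toggle;
    any other cross-host redirect is surfaced to the model instead."""
    a, b = urlparse(original), urlparse(redirect)
    if b.scheme not in ("http", "https"):
        return False
    if a.hostname == b.hostname:
        return True  # same host: path/query changes are fine

    def strip_www(host):
        return host.removeprefix("www.") if host else host

    return strip_www(a.hostname) == strip_www(b.hostname)

print(may_auto_follow("https://example.com/a", "https://example.com/b"))    # True
print(may_auto_follow("https://example.com/", "https://www.example.com/"))  # True
print(may_auto_follow("https://example.com/", "https://evil.example.net/")) # False
```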

    // Check if we got a redirect to a different host
    if ('type' in response && response.type === 'redirect') {
      const statusText =
        response.statusCode === 301
          ? 'Moved Permanently'
          : response.statusCode === 308
            ? 'Permanent Redirect'
            : response.statusCode === 307
              ? 'Temporary Redirect'
              : 'Found'

      const message = `REDIRECT DETECTED: The URL redirects to a different host.

Original URL: ${response.originalUrl}
Redirect URL: ${response.redirectUrl}
Status: ${response.statusCode} ${statusText}

To complete your request, I need to fetch content from the redirected URL. Please use WebFetch again with these parameters:
- url: "${response.redirectUrl}"
- prompt: "${prompt}"`

Content handling

Once the content is fetched, HTML and non-HTML are handled separately. The off-the-shelf turndown library converts HTML to markdown.

// Lazy singleton — defers the turndown → @mixmark-io/domino import (~1.4MB
// retained heap) until the first HTML fetch, and reuses one instance across
// calls
if (contentType.includes('text/html')) {
  markdownContent = (await getTurndownService()).turndown(htmlContent)
}

Binary content isn't simply thrown away either.

// Binary content: save raw bytes to disk with a proper extension so Claude
// can inspect the file later. We still fall through to the utf-8 decode +
// Haiku path below — for PDFs in particular the decoded string has enough
// ASCII structure (/Title, text streams) that Haiku can summarize it

Speedups

WebFetch has five main speedup layers.

The first layer is the URL cache and the domain preflight cache, covered above.

The second layer is "trusted content passthrough":

  • the URL belongs to a preapproved domain
  • the content-type is text/markdown
  • and the length is under MAX_MARKDOWN_LENGTH

Then the markdown is returned verbatim, without going through the secondary model. In other words, for the most common case, official documentation sites, it skips a Haiku call entirely.
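That gate can be sketched as follows. This is an illustrative reconstruction; the real check lives inside WebFetchTool:

```python
MAX_MARKDOWN_LENGTH = 100_000
PREAPPROVED_HOSTS = {"docs.python.org", "git-scm.com"}  # excerpt of the real list

def can_skip_secondary_model(hostname, content_type, content) -> bool:
    """Trusted-content passthrough: preapproved host + markdown
    content-type + under the length cap means the raw markdown is
    returned directly, saving a Haiku call."""
    return (hostname in PREAPPROVED_HOSTS
            and "text/markdown" in content_type
            and len(content) < MAX_MARKDOWN_LENGTH)

print(can_skip_secondary_model("docs.python.org", "text/markdown; charset=utf-8", "# hi"))  # True
print(can_skip_secondary_model("blog.example.com", "text/markdown", "# hi"))                # False
```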

The third layer is content truncation:

// Truncate content to avoid "Prompt is too long" errors from the secondary model

This looks like mere error prevention, but it is really latency control too: don't feed the secondary model an overlong page.

The fourth layer is that the secondary model itself is a fast one (Haiku).

- Processes the content with the prompt using a small, fast model

The fifth layer is that the prompt is also split by site trust level: for preapproved documentation sites the instructions are fairly loose, while for ordinary sites they are more conservative. The prompt parameter here is the prompt passed when calling the web fetch tool.

export function makeSecondaryModelPrompt(
  markdownContent: string,
  prompt: string,
  isPreapprovedDomain: boolean,
): string {
  const guidelines = isPreapprovedDomain
    ? `Provide a concise response based on the content above. Include relevant details, code examples, and documentation excerpts as needed.`
    : `Provide a concise response based only on the content above. In your response:
 - Enforce a strict 125-character maximum for quotes from any source document. Open Source Software is ok as long as we respect the license.
 - Use quotation marks for exact language from articles; any language outside of the quotation should never be word-for-word the same.
 - You are not a lawyer and never comment on the legality of your own prompts and responses.
 - Never produce or reproduce exact song lyrics.`

  return `
Web page content:
---
${markdownContent}
---

${prompt}

${guidelines}
`
}

Reading the Claude Code Source - The memory Mechanism

By Shiina
April 3, 2026, 08:00

It splits into Session Memory and Auto Memory (cross-session).

Session Memory

Each session maintains a summary.md file, updated in the background by a forked subagent (whose permissions are restricted to editing only this summary.md) once the trigger conditions are met (tokens added since the last update, number of tool calls, etc.).

/**
 * Session Memory automatically maintains a markdown file with notes about the current conversation.
 * It runs periodically in the background using a forked subagent to extract key information
 * without interrupting the main conversation flow.
 */
/**
 * Configuration for session memory extraction thresholds
 */
export type SessionMemoryConfig = {
  /** Minimum context window tokens before initializing session memory.
   * Uses the same token counting as autocompact (input + output + cache tokens)
   * to ensure consistent behavior between the two features. */
  minimumMessageTokensToInit: number
  /** Minimum context window growth (in tokens) between session memory updates.
   * Uses the same token counting as autocompact (tokenCountWithEstimation)
   * to measure actual context growth, not cumulative API usage. */
  minimumTokensBetweenUpdate: number
  /** Number of tool calls between session memory updates */
  toolCallsBetweenUpdates: number
}
export const DEFAULT_SESSION_MEMORY_CONFIG: SessionMemoryConfig = {
  minimumMessageTokensToInit: 10000,
  minimumTokensBetweenUpdate: 5000,
  toolCallsBetweenUpdates: 3,
}
  // Trigger extraction when:
  // 1. Both thresholds are met (tokens AND tool calls), OR
  // 2. No tool calls in last turn AND token threshold is met
  //    (to ensure we extract at natural conversation breaks)
  //
  // IMPORTANT: The token threshold (minimumTokensBetweenUpdate) is ALWAYS required.
  // Even if the tool call threshold is met, extraction won't happen until the
  // token threshold is also satisfied. This prevents excessive extractions.
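The trigger rule in the comment can be written as a predicate. A sketch using the default thresholds above; the function and parameter names are illustrative:

```python
def should_extract(tokens_since_update, tool_calls_since_update,
                   had_tool_calls_last_turn,
                   min_tokens=5_000, min_tool_calls=3) -> bool:
    """Session-memory trigger: the token threshold is ALWAYS required;
    on top of it, either enough tool calls happened, or the last turn
    had none (a natural conversation break)."""
    if tokens_since_update < min_tokens:
        return False
    return (tool_calls_since_update >= min_tool_calls
            or not had_tool_calls_last_turn)

print(should_extract(6000, 3, True))   # True: both thresholds met
print(should_extract(6000, 0, False))  # True: natural break
print(should_extract(2000, 9, True))   # False: token floor not met
```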
/**
 * Returns the session memory directory path for the current session with trailing separator.
 * Path format: {projectDir}/{sessionId}/session-memory/
 */
export function getSessionMemoryDir(): string {
  return join(getProjectDir(getCwd()), getSessionId(), 'session-memory') + sep
}

/**
 * Returns the session memory file path for the current session.
 * Path format: {projectDir}/{sessionId}/session-memory/summary.md
 */
export function getSessionMemoryPath(): string {
  return join(getSessionMemoryDir(), 'summary.md')
}

Prompt template:

# Session Title
_A short and distinctive 5-10 word descriptive title for the session. Super info dense, no filler_

# Current State
_What is actively being worked on right now? Pending tasks not yet completed. Immediate next steps._

# Task specification
_What did the user ask to build? Any design decisions or other explanatory context_

Followed by:

  • Files and Functions
  • Workflow
  • Errors & Corrections
  • Codebase and System Documentation
  • Learnings
  • Key results
  • Worklog
- The file must maintain its exact structure with all sections, headers, and italic descriptions intact
-- NEVER modify, delete, or add section headers
-- NEVER modify or delete the italic _section description_ lines
-- ONLY update the actual content that appears BELOW the italic _section descriptions_

The update itself is performed by a forked subagent.

  // Run session memory extraction using runForkedAgent for prompt caching
  // runForkedAgent creates an isolated context to prevent mutation of parent state
  // Pass setupContext.readFileState so the forked agent can edit the memory file
/**
 * Creates a canUseTool function that only allows Edit for the exact memory file.
 */
if (
  tool.name === FILE_EDIT_TOOL_NAME &&
  typeof input === 'object' &&
  input !== null &&
  'file_path' in input
) {
  const filePath = input.file_path
  if (typeof filePath === 'string' && filePath === memoryPath) {
    return { behavior: 'allow' as const, updatedInput: input }
  }
}

Purpose: preparing material for autocompact in advance

Once the autocompact conditions are met, session memory compaction is tried first.

  // EXPERIMENT: Try session memory compaction first
  const sessionMemoryResult = await trySessionMemoryCompaction(
    messages,
    toolUseContext.agentId,
    recompactionInfo.autoCompactThreshold,
  )
/**
 * Try to use session memory for compaction instead of traditional compaction.
 * Returns null if session memory compaction cannot be used.
 *
 * Handles two scenarios:
 * 1. Normal case: lastSummarizedMessageId is set, keep only messages after that ID
 * 2. Resumed session: lastSummarizedMessageId is not set but session memory has content,
 *    keep all messages but use session memory as the summary
 */

保留最近一段原始消息

/**
 * Calculate the starting index for messages to keep after compaction.
 * Starts from lastSummarizedMessageId, then expands backwards to meet minimums:
 * - At least config.minTokens tokens
 * - At least config.minTextBlockMessages messages with text blocks
 * Stops expanding if config.maxTokens is reached.
 * Also ensures tool_use/tool_result pairs are not split.
 */
export const DEFAULT_SM_COMPACT_CONFIG: SessionMemoryCompactConfig = {
  minTokens: 10_000,
  minTextBlockMessages: 5,
  maxTokens: 40_000,
}
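结合上面的注释和默认配置, 可以用 Python 粗略示意这个向后扩展保留窗口的过程. 消息结构里的 tokens / has_text 字段是为演示虚构的简化:

```python
# 简化示意: 从 lastSummarizedMessageId 对应的下标出发, 向前扩展保留窗口,
# 直到满足最小 token 数与最小文本消息数, 或触及 token 上限.

DEFAULT_SM_COMPACT_CONFIG = {
    "min_tokens": 10_000,
    "min_text_block_messages": 5,
    "max_tokens": 40_000,
}

def calc_keep_start(messages, last_summarized_idx, config=DEFAULT_SM_COMPACT_CONFIG):
    """返回 compaction 后保留消息的起始下标."""
    start = last_summarized_idx
    tokens = sum(m["tokens"] for m in messages[start:])
    text_msgs = sum(1 for m in messages[start:] if m["has_text"])
    # 向前扩展直到两个最小值都满足, 但不超过 max_tokens
    while start > 0 and (
        tokens < config["min_tokens"]
        or text_msgs < config["min_text_block_messages"]
    ):
        candidate = messages[start - 1]
        if tokens + candidate["tokens"] > config["max_tokens"]:
            break
        start -= 1
        tokens += candidate["tokens"]
        if candidate["has_text"]:
            text_msgs += 1
    return start
```

比如 10 条各 3000 tokens 的文本消息, 从下标 8 开始向前扩展, 会停在下标 5 (凑满 5 条文本消息的同时 token 数也过了 10K).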

避免切断 tool use 和 tool result 配对.

/**
 * Adjust the start index to ensure we don't split tool_use/tool_result pairs
 * or thinking blocks that share the same message.id with kept assistant messages.
 *
 * If ANY message we're keeping contains tool_result blocks, we need to
 * include the preceding assistant message(s) that contain the matching tool_use blocks.
 * API error: orphan tool_result references non-existent tool_use
 */
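配对保护可以示意如下 (消息结构为演示虚构, 真实实现还要处理共享 message.id 的 thinking block 等情况):

```python
# 简化示意: 若保留窗口内的 user message 含有 tool_result, 其配对的 tool_use
# 必须也被保留, 否则 API 会报 orphan tool_result 错误.

def adjust_for_tool_pairs(messages, start):
    """把 start 向前移动, 直到保留区间内每个 tool_result 都有配对的 tool_use."""
    while True:
        kept_use_ids = set()
        needed_ids = set()
        for m in messages[start:]:
            kept_use_ids.update(m.get("tool_use_ids", ()))
            needed_ids.update(m.get("tool_result_ids", ()))
        orphans = needed_ids - kept_use_ids
        if not orphans or start == 0:
            return start
        start -= 1  # 多带上一条前置消息, 再检查一次
```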

预算控制, 避免 summary.md 本身太大.

const MAX_SECTION_LENGTH = 2000
const MAX_TOTAL_SESSION_MEMORY_TOKENS = 12000

更新 prompt 里明确提醒

- Keep each section under ~${MAX_SECTION_LENGTH} tokens/words
- IMPORTANT: Always update "Current State" to reflect the most recent work

如果总量已经超了, 追加压缩提醒

CRITICAL: The session memory file is currently ~${totalTokens} tokens, which exceeds the maximum of ${MAX_TOTAL_SESSION_MEMORY_TOKENS} tokens. You MUST condense the file to fit within this budget.

进入 compact 流程时, 也会再做一次截断保护

/**
 * Truncate session memory sections that exceed the per-section token limit.
 * Used when inserting session memory into compact messages to prevent
 * oversized session memory from consuming the entire post-compact token budget.
 */
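截断逻辑可以示意如下 (源码按 token 计, 这里用字符数近似; 截断标记文本为演示虚构):

```python
MAX_SECTION_LENGTH = 2000  # 与源码同名常量对应, 此处按字符数近似

def truncate_sections(markdown: str, limit: int = MAX_SECTION_LENGTH) -> str:
    """把每个 '# ' section 的正文截断到至多 limit 个字符."""
    out_lines = []
    body_len = 0
    for line in markdown.splitlines():
        if line.startswith("# "):
            out_lines.append(line)
            body_len = 0
            continue
        if body_len >= limit:
            continue  # 超预算 section 的剩余正文直接丢弃
        take = line[: limit - body_len]
        if take != line:
            take += " [...truncated]"
            body_len = limit
        else:
            body_len += len(line)
        out_lines.append(take)
    return "\n".join(out_lines)
```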

另一个用途: 给 skillify 提供会话背景

把当前 session 炼成 skill.

## Your Session Context

Here is the session memory summary:
<session_memory>

</session_memory>
const sessionMemory =
  (await getSessionMemoryContent()) ?? 'No session memory available.'

然后它再把本次 session 的 user messages 补进去

const userMessages = extractUserMessages(
  getMessagesAfterCompactBoundary(context.messages),
)

Auto Memory

需要打开 feature, 见 文档. 跨 session, 支持召回.

/**
 * Whether auto-memory features are enabled (memdir, agent memory, past session search).
 */

MEMORY.md 是入口索引

const AUTO_MEM_DIRNAME = 'memory'
const AUTO_MEM_ENTRYPOINT_NAME = 'MEMORY.md'

默认目录解析

/**
 * Returns the auto-memory directory path.
 *
 * Resolution order:
 *   1. CLAUDE_COWORK_MEMORY_PATH_OVERRIDE env var
 *   2. autoMemoryDirectory in settings.json
 *   3. <memoryBase>/projects/<sanitized-git-root>/memory/
 */

这里的 MEMORY.md 负责做入口索引, durable memory 写进各自的 topic file.

`MEMORY.md` is an index, not a memory
each entry should be one line, under ~150 characters
Saving a memory is a two-step process:

**Step 1** — write the memory to its own file
**Step 2** — add a pointer to that file in `MEMORY.md`
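两步保存可以示意如下 (save_memory 函数名, topic file 命名和指针行格式均为演示假设, 非源码原样):

```python
import os
import tempfile

def save_memory(memory_dir: str, topic: str, content: str, pointer: str) -> str:
    """两步保存: (1) 把 memory 写进自己的 topic file;
    (2) 往 MEMORY.md 追加一行不超过 150 字符的指针."""
    os.makedirs(memory_dir, exist_ok=True)
    topic_path = os.path.join(memory_dir, f"{topic}.md")
    with open(topic_path, "w", encoding="utf-8") as f:
        f.write(content)
    with open(os.path.join(memory_dir, "MEMORY.md"), "a", encoding="utf-8") as f:
        f.write(f"- {pointer[:150]} -> {topic}.md\n")
    return topic_path

# 用法演示
demo_dir = tempfile.mkdtemp()
save_memory(demo_dir, "python-style", "User prefers black formatting.", "user prefers black")
index = open(os.path.join(demo_dir, "MEMORY.md"), encoding="utf-8").read()
```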
/**
 * Extracts durable memories from the current session transcript
 * and writes them to the auto-memory directory (~/.claude/projects/<path>/memory/).
 *
 * It runs once at the end of each complete query loop
 */

MEMORY.md 会被自动注入上下文

" (user's auto-memory, persists across conversations)"

支持按 query 检索 (通过读 topic files 的 frontmatter) 相关 memories

/**
 * Find memory files relevant to a query by scanning memory file headers
 * and asking Sonnet to select the most relevant ones.
 *
 * Returns absolute file paths + mtime of the most relevant memories
 * (up to 5). Excludes MEMORY.md (already loaded in system prompt).
 */
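召回流程可以示意如下. 真实实现是把各 topic file 的 frontmatter 交给 Sonnet 挑选, 这里用朴素的关键词重合度做本地近似 (函数名与打分方式均为演示假设):

```python
def parse_frontmatter(text: str) -> str:
    """提取开头两个 '---' 之间的 frontmatter 块."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return ""
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            return "\n".join(lines[1:i])
    return ""

def rank_memories(files: dict, query: str, top_k: int = 5):
    """files: {path: full_text}. 按 query 与 frontmatter 的词重合度排序;
    源码里这一步是调 Sonnet 做选择, 且同样排除 MEMORY.md."""
    q_words = set(query.lower().split())
    scored = []
    for path, text in files.items():
        if path.endswith("MEMORY.md"):
            continue  # MEMORY.md 已经注入 system prompt, 不再召回
        fm_words = set(parse_frontmatter(text).lower().split())
        scored.append((len(q_words & fm_words), path))
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [p for s, p in scored[:top_k] if s > 0]
```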

读 Claude Code 源码 - 上下文压缩策略

作者 Shiina
2026年4月1日 08:00

若干层压缩.

压缩 tool result

在 message 中, assistant message 包含模型回答以及 tool_use 等, 而 user message 包含用户输入以及 tool_result 等. 如果当前 user message group (把 assistant message 作为边界, 两个 assistant message 之间所有的 user messages 合并起来算) 中可压缩的 tool result 总字符数超过 MAX_TOOL_RESULTS_PER_MESSAGE_CHARS, 就迭代地把最长的 tool_result 替换成 preview (包含原始 tool result 的本地持久化路径, 以及开头若干字符的预览), 直到总长度小于上限.

对 tool use id, 内部维护变量 seenIds (set, 表示已经见过的 tool use id) 和 replacements (dict, 记录被压缩的 tool use id 到 preview 的映射). 之后每次发送 LLM API 调用前, 对已经处理过的 tool use id 都沿用之前的处理结果 (压缩过的重新替换成同样的 preview, 没压缩过的依旧不压缩), 避免破坏 prompt cache.
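这个 "一旦见过就冻结" 的机制可以示意如下 (阈值, preview 格式与数据结构均为演示简化):

```python
def enforce_budget(results, state, limit=90):
    """results: 有序的 [(tool_use_id, text)]. 返回替换后的文本列表.
    state: {"seen": set, "replacements": dict}, 跨轮次复用并被原地修改."""
    out = dict(results)
    fresh = [(i, t) for i, t in results if i not in state["seen"]]
    # 先重放已冻结的替换 (每轮逐字节一致, 保住 prompt cache)
    for i in out:
        if i in state["replacements"]:
            out[i] = state["replacements"][i]
    # 只有 fresh 的结果可能被新替换, 从最长的开始
    fresh.sort(key=lambda p: len(p[1]), reverse=True)
    for i, t in fresh:
        if sum(len(v) for v in out.values()) <= limit:
            break
        preview = f"[saved to disk; preview: {t[:5]}...]"
        state["replacements"][i] = preview
        out[i] = preview
    state["seen"].update(i for i, _ in results)
    return [out[i] for i, _ in results]
```

第二次用同样的输入调用时, fresh 为空, 只会重放缓存里的 preview, 输出与第一次完全一致.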

/**
 * Enforce the per-message budget on aggregate tool result size.
 *
 * For each user message whose tool_result blocks together exceed the
 * per-message limit (see getPerMessageBudgetLimit), the largest FRESH
 * (never-before-seen) results in THAT message are persisted to disk and
 * replaced with previews.
 * Messages are evaluated independently — a 150K result in one message and
 * a 150K result in another are both under budget and untouched.
 *
 * State is tracked by tool_use_id in `state`. Once a result is seen its
 * fate is frozen: previously-replaced results get the same replacement
 * re-applied every turn from the cached preview string (zero I/O,
 * byte-identical), and previously-unreplaced results are never replaced
 * later (would break prompt cache).
 *
 * Each turn adds at most one new user message with tool_result blocks,
 * so the per-message loop typically does the budget check at most once;
 * all prior messages just re-apply cached replacements.
 */
/**
 * Default maximum aggregate size in characters for tool_result blocks within
 * a SINGLE user message (one turn's batch of parallel tool results). When a
 * message's blocks together exceed this, the largest blocks in that message
 * are persisted to disk and replaced with previews until under budget.
 * Messages are evaluated independently — a 150K result in one turn and a
 * 150K result in the next are both untouched.
 *
 * This prevents N parallel tools from each hitting the per-tool max and
 * collectively producing e.g. 10 × 40K = 400K in one turn's user message.
 */
export const MAX_TOOL_RESULTS_PER_MESSAGE_CHARS = 200_000
/**
 * Build a message for large tool results with preview
 */
export function buildLargeToolResultMessage(
  result: PersistedToolResult,
): string {
  let message = `${PERSISTED_OUTPUT_TAG}\n`
  message += `Output too large (${formatFileSize(result.originalSize)}). Full output saved to: ${result.filepath}\n\n`
  message += `Preview (first ${formatFileSize(PREVIEW_SIZE_BYTES)}):\n`
  message += result.preview
  message += result.hasMore ? '\n...\n' : '\n'
  message += PERSISTED_OUTPUT_CLOSING_TAG
  return message
}
/**
 * Extract candidate tool_result blocks grouped by API-level user message.
 *
 * normalizeMessagesForAPI merges consecutive user messages into one
 * (Bedrock compat; 1P does the same server-side), so parallel tool
 * results that arrive as N separate user messages in our state become
 * ONE user message on the wire. The budget must group the same way or
 * it would see N under-budget messages instead of one over-budget
 * message and fail to enforce exactly when it matters most.
 *
 * A "group" is a maximal run of user messages NOT separated by an
 * assistant message. Only assistant messages create wire-level
 * boundaries — normalizeMessagesForAPI filters out progress entirely
 * and merges attachment / system(local_command) INTO adjacent user
 * blocks, so those types do NOT break groups here either.
 *
 * This matters for abort-during-parallel-tools paths: agent_progress
 * messages (non-ephemeral, persisted in REPL state) can interleave
 * between fresh tool_result messages. If we flushed on progress, those
 * tool_results would split into under-budget groups, slip through
 * unreplaced, get frozen, then be merged by normalizeMessagesForAPI
 * into one over-budget wire message — defeating the feature.
 *
 * Only groups with at least one eligible candidate are returned.
 */
function collectCandidatesByMessage(
  messages: Message[],
): ToolResultCandidate[][]
...
/**
 * Extract candidate tool_result blocks from a single user message: blocks
 * that are non-empty, non-image, and not already compacted by tag (i.e. by
 * the per-tool limit, or an earlier iteration of this same query call).
 * Returns [] for messages with no eligible blocks.
 */
function collectCandidatesFromMessage(message: Message): ToolResultCandidate[] ...
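分组逻辑可以示意如下: 只有 assistant message 形成边界, progress / attachment 等类型既不加入也不打断分组 (消息结构为演示虚构):

```python
def group_user_messages(messages):
    """把相邻的 user messages 归为一组; 仅 'assistant' 关闭当前组."""
    groups, current = [], []
    for m in messages:
        if m["role"] == "assistant":
            if current:
                groups.append(current)
                current = []
        elif m["role"] == "user":
            current.append(m)
        # progress / attachment 等其他类型: 既不加入也不打断分组
    if current:
        groups.append(current)
    return groups
```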

后面有 snip compact 环节, 但因为代码缺失从略.

Micro-compact

  • 如果发送请求时距离上一个请求过了太久, 服务端的 prompt cache 早就失效了, 干脆把旧的 tool results 清理掉 (替换成 TIME_BASED_MC_CLEARED_MESSAGE) 再发请求 (保留最近 keepRecent 个, 其他压缩).
  • 如果 prompt cache 还在, 就用 Anthropic 特有的 context editing 接口压缩 tool results.
  // Time-based trigger runs first and short-circuits. If the gap since the
  // last assistant message exceeds the threshold, the server cache has expired
  // and the full prefix will be rewritten regardless — so content-clear old
  // tool results now, before the request, to shrink what gets rewritten.
  // Cached MC (cache-editing) is skipped when this fires: editing assumes a
  // warm cache, and we just established it's cold.
  const compactableIds = collectCompactableToolIds(messages)

  // Floor at 1: slice(-0) returns the full array (paradoxically keeps
  // everything), and clearing ALL results leaves the model with zero working
  // context. Neither degenerate is sensible — always keep at least the last.
  const keepRecent = Math.max(1, config.keepRecent)
  const keepSet = new Set(compactableIds.slice(-keepRecent))
  const clearSet = new Set(compactableIds.filter(id => !keepSet.has(id)))
// Inline from utils/toolResultStorage.ts — importing that file pulls in
// sessionStorage → utils/messages → services/api/errors, completing a
// circular-deps loop back through this file via promptCacheBreakDetection.
// Drift is caught by a test asserting equality with the source-of-truth.
export const TIME_BASED_MC_CLEARED_MESSAGE = '[Old tool result content cleared]'

// Only compact these tools
const COMPACTABLE_TOOLS = new Set<string>([
  FILE_READ_TOOL_NAME,
  ...SHELL_TOOL_NAMES,
  GREP_TOOL_NAME,
  GLOB_TOOL_NAME,
  WEB_SEARCH_TOOL_NAME,
  WEB_FETCH_TOOL_NAME,
  FILE_EDIT_TOOL_NAME,
  FILE_WRITE_TOOL_NAME,
])
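time-based 清理可以示意如下 (注意 keepRecent 下限取 1, 对应源码注释里 slice(-0) 会保留全部的坑; 数据结构为演示简化):

```python
TIME_BASED_MC_CLEARED_MESSAGE = "[Old tool result content cleared]"

def time_based_clear(tool_results, keep_recent):
    """tool_results: 有序的 [(tool_use_id, text)]. 返回清理后的副本:
    仅保留最近 keep_recent 个, 其余替换为占位文本."""
    keep_recent = max(1, keep_recent)  # 下限为 1: 全清会让模型失去工作上下文
    keep = {i for i, _ in tool_results[-keep_recent:]}
    return [
        (i, t if i in keep else TIME_BASED_MC_CLEARED_MESSAGE)
        for i, t in tool_results
    ]
```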
/**
 * Cached microcompact path - uses cache editing API to remove tool results
 * without invalidating the cached prefix.
 *
 * Key differences from regular microcompact:
 * - Does NOT modify local message content (cache_reference and cache_edits are added at API layer)
 * - Uses count-based trigger/keep thresholds from GrowthBook config
 * - Takes precedence over regular microcompact (no disk persistence)
 * - Tracks tool results and queues cache edits for the API layer
 */

之后有 context collapse 环节, 因为代码缺失从略.

Auto-compact

如果上下文大于阈值, 会先 trySessionMemoryCompaction (TODO: 还没看), 如果失败或不适用再构造 prompt 专门调一次 llm 总结 (会先尝试 fork agent 以吃到现有的 prompt cache). 连续失败 3 次则同一个 session 内不再尝试 autocompact.

  // When prompt cache sharing is enabled, use forked agent to reuse the
  // main conversation's cached prefix (system prompt, tools, context messages).
  // Falls back to regular streaming path on failure.
export async function runForkedAgent(...):{
...
  const initialMessages: Message[] = [...forkContextMessages, ...promptMessages]
...
// Reserve this many tokens for output during compaction
// Based on p99.99 of compact summary output being 17,387 tokens.
const MAX_OUTPUT_TOKENS_FOR_SUMMARY = 20_000

export function getEffectiveContextWindowSize(model: string): number {
  const reservedTokensForSummary = Math.min(
    getMaxOutputTokensForModel(model),
    MAX_OUTPUT_TOKENS_FOR_SUMMARY,
  )
  let contextWindow = getContextWindowForModel(model, getSdkBetas())
  ...
  return contextWindow - reservedTokensForSummary
}

export const AUTOCOMPACT_BUFFER_TOKENS = 13_000

  const autocompactThreshold =
    effectiveContextWindow - AUTOCOMPACT_BUFFER_TOKENS
export function getCompactPrompt(customInstructions?: string): string {
  let prompt = NO_TOOLS_PREAMBLE + BASE_COMPACT_PROMPT
  if (customInstructions && customInstructions.trim() !== '') {
    prompt += `\n\nAdditional Instructions:\n${customInstructions}`
  }
  prompt += NO_TOOLS_TRAILER
  return prompt
}
// Stop trying autocompact after this many consecutive failures.
// BQ 2026-03-10: 1,279 sessions had 50+ consecutive failures (up to 3,272)
// in a single session, wasting ~250K API calls/day globally.
const MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES = 3
    // Increment consecutive failure count for circuit breaker.
    // The caller threads this through autoCompactTracking so the
    // next query loop iteration can skip futile retry attempts.
    const prevFailures = tracking?.consecutiveFailures ?? 0
    const nextFailures = prevFailures + 1
    if (nextFailures >= MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES) {
      logForDebugging(
        `autocompact: circuit breaker tripped after ${nextFailures} consecutive failures — skipping future attempts this session`,
        { level: 'warn' },
      )
    }
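熔断逻辑可以示意如下 (函数名与 tracking 结构为演示简化, 真实实现把它穿在 autoCompactTracking 里跨 query loop 传递):

```python
MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES = 3

def should_attempt_autocompact(tracking) -> bool:
    """连续失败次数达到阈值后, 本 session 不再尝试 autocompact."""
    failures = (tracking or {}).get("consecutiveFailures", 0)
    return failures < MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES

def record_result(tracking, ok: bool):
    """成功则清零计数, 失败则累加; 返回新的 tracking."""
    failures = 0 if ok else (tracking or {}).get("consecutiveFailures", 0) + 1
    return {"consecutiveFailures": failures}
```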
// Aggressive no-tools preamble. The cache-sharing fork path inherits the
// parent's full tool set (required for cache-key match), and on Sonnet 4.6+
// adaptive-thinking models the model sometimes attempts a tool call despite
// the weaker trailer instruction. With maxTurns: 1, a denied tool call means
// no text output → falls through to the streaming fallback (2.79% on 4.6 vs
// 0.01% on 4.5). Putting this FIRST and making it explicit about rejection
// consequences prevents the wasted turn.
const NO_TOOLS_PREAMBLE = `CRITICAL: Respond with TEXT ONLY. Do NOT call any tools.

- Do NOT use Read, Bash, Grep, Glob, Edit, Write, or ANY other tool.
- You already have all the context you need in the conversation above.
- Tool calls will be REJECTED and will waste your only turn — you will fail the task.
- Your entire response must be plain text: an <analysis> block followed by a <summary> block.

`

const NO_TOOLS_TRAILER =
  '\n\nREMINDER: Do NOT call any tools. Respond with plain text only — ' +
  'an <analysis> block followed by a <summary> block. ' +
  'Tool calls will be rejected and you will fail the task.'
  

const BASE_COMPACT_PROMPT = `Your task is to create a detailed summary of the conversation so far, paying close attention to the user's explicit requests and your previous actions.
This summary should be thorough in capturing technical details, code patterns, and architectural decisions that would be essential for continuing development work without losing context.

${DETAILED_ANALYSIS_INSTRUCTION_BASE}

Your summary should include the following sections:

1. Primary Request and Intent: Capture all of the user's explicit requests and intents in detail
2. Key Technical Concepts: List all important technical concepts, technologies, and frameworks discussed.
3. Files and Code Sections: Enumerate specific files and code sections examined, modified, or created. Pay special attention to the most recent messages and include full code snippets where applicable and include a summary of why this file read or edit is important.
4. Errors and fixes: List all errors that you ran into, and how you fixed them. Pay special attention to specific user feedback that you received, especially if the user told you to do something differently.
5. Problem Solving: Document problems solved and any ongoing troubleshooting efforts.
6. All user messages: List ALL user messages that are not tool results. These are critical for understanding the users' feedback and changing intent.
7. Pending Tasks: Outline any pending tasks that you have explicitly been asked to work on.
8. Current Work: Describe in detail precisely what was being worked on immediately before this summary request, paying special attention to the most recent messages from both user and assistant. Include file names and code snippets where applicable.
9. Optional Next Step: List the next step that you will take that is related to the most recent work you were doing. IMPORTANT: ensure that this step is DIRECTLY in line with the user's most recent explicit requests, and the task you were working on immediately before this summary request. If your last task was concluded, then only list next steps if they are explicitly in line with the users request. Do not start on tangential requests or really old requests that were already completed without confirming with the user first.
                       If there is a next step, include direct quotes from the most recent conversation showing exactly what task you were working on and where you left off. This should be verbatim to ensure there's no drift in task interpretation.

Here's an example of how your output should be structured:

<example>
<analysis>
[Your thought process, ensuring all points are covered thoroughly and accurately]
</analysis>

<summary>
1. Primary Request and Intent:
   [Detailed description]

2. Key Technical Concepts:
   - [Concept 1]
   - [Concept 2]
   - [...]

3. Files and Code Sections:
   - [File Name 1]
      - [Summary of why this file is important]
      - [Summary of the changes made to this file, if any]
      - [Important Code Snippet]
   - [File Name 2]
      - [Important Code Snippet]
   - [...]

4. Errors and fixes:
    - [Detailed description of error 1]:
      - [How you fixed the error]
      - [User feedback on the error if any]
    - [...]

5. Problem Solving:
   [Description of solved problems and ongoing troubleshooting]

6. All user messages: 
    - [Detailed non tool use user message]
    - [...]

7. Pending Tasks:
   - [Task 1]
   - [Task 2]
   - [...]

8. Current Work:
   [Precise description of current work]

9. Optional Next Step:
   [Optional Next step to take]

</summary>
</example>

Please provide your summary based on the conversation so far, following this structure and ensuring precision and thoroughness in your response. 

There may be additional summarization instructions provided in the included context. If so, remember to follow these instructions when creating the above summary. Examples of instructions include:
<example>
## Compact Instructions
When summarizing the conversation focus on typescript code changes and also remember the mistakes you made and how you fixed them.
</example>

<example>
# Summary instructions
When you are using compact - please focus on test output and code changes. Include file reads verbatim.
</example>
`

Agent 实践杂录

作者 Shiina
2025年10月17日 08:00

2025 年大家都忙着搞 agent. 下面分类是随便分的.

Tools/Prompts

Context-Engineering

Compression/Memory/Planning

Skills

Agent Harness

现在趋势是自己不做 index (分块 + 向量化 + 向量数据库), 直接让 LLM grep 或者 web search. 比如 Claude, Cline, manus 都是如此.

Claude Code

Traps

Deep Research

AI Coding

Training: Agentic RL

RAG 简要回顾

作者 Shiina
2025年10月7日 08:00

2025 年大家都忙着开发 agent, 这里简要回顾一下 RAG.

RAG 基本操作

  • Offline: 文件解析, 文本切片, embedding (以前通常用 bge)
  • 对 query embedding 后做召回 (通常就算个 cos, chunk 量大时用向量数据库牺牲一定精度加速召回)
  • Rerank (通常是 bge-reranker)

这套早在 2023 年就玩烂了.

Advanced RAG

Offline

文本切片优化

  • 按照语义切分: 想法是, 先得到句子 embedding, 若相邻句子 embedding 距离较大 (比如可以统计分位数取阈值), 则认为语义差别大, 在这里切分.
  • 按结构切分: 比如根据 markdown 的标题层级, 图表, 代码等, 保证有意义的结构不被切开. 这里可以把 chunk 所属的标题放在其 metadata 里或者直接拼在 chunk 开头; 或者用 LLM 总结 chunk 生成一个 heading 拼上去.
  • 还有很多雕花级别的操作, 可以参考这些 2023 年的 RAG 比赛, B 站上也有答辩视频.
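按语义切分可以示意如下. embedding 此处直接作为输入传入 (实际来自 embedding 模型), 分位数阈值的取法也只是一种示意:

```python
import math

def cos_dist(a, b):
    """余弦距离 = 1 - 余弦相似度."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1 - dot / (na * nb)

def semantic_split(sentences, embeddings, pct=0.9):
    """相邻句子 embedding 的距离达到 pct 分位数阈值时, 在该处断开."""
    if len(sentences) <= 1:
        return [sentences]
    dists = [cos_dist(a, b) for a, b in zip(embeddings, embeddings[1:])]
    threshold = sorted(dists)[min(len(dists) - 1, int(pct * len(dists)))]
    chunks, cur = [], [sentences[0]]
    for sent, d in zip(sentences[1:], dists):
        if d >= threshold:  # 语义跳变, 断开
            chunks.append(cur)
            cur = []
        cur.append(sent)
    chunks.append(cur)
    return chunks
```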

Embedding 优化

给每个 chunk 生成更多 “维度” 的 embedding. 比如对当前 chunk 做个总结得到 embedding, 或者把当前 chunk 对应的 window 更大的 chunk 或者段落以及章节层级拿来做 embedding (甚至是层次化的 embedding 以及召回). 命中 embedding 之后可以连带地在当前 chunk 前后扩展一定 window 或者段落带出更完整连贯的上下文.

Online

Query 处理

  • Query 分类 (意图识别/路由等)
  • 生成更多维度 embedding: 比如 HyDE (Hypothetical Document Embedding), 根据 query 生成伪文档再去召回, 把 qa 匹配变成 aa 匹配. 类似地, 离线时可以对每个 chunk 生成可能的 query, 把 qa 匹配变成 qq 匹配.

拼接上下文

  • 扩大窗口 (之前讲过了, 带出当前 chunk 对应的 window 更大的 chunk)
  • 顺序 (如果 chunk 来自同一篇文档, 按文中出现的顺序排序, 离得近可以补充一些 gap 等让段落更连贯)
  • 根据层级 (之前讲过了, 带出当前 chunk 对应的章节)
  • 压缩 (还是靠 LLM 搞)

评估

  • 召回评估
    • 效果指标: recall@k, precision@k, mAP, mrr 等, 可以参考 这里
    • 性能指标: 平均响应时间, QPS 承载能力, 可用性/节点故障恢复时间
    • 成本指标: 单位向量存储成本, 单位检索成本
    • 在线评估: 检索结果点击率 (CTR), 停留时间 (查看检索结果的时间), 二次检索率 (看了结果后再次检索的比例, 越低越好), 用户满意度评分
  • 生成评估
    • 效果指标: 事实准确率 (回答与检索信息一致), 幻觉率 (回答包含检索信息外内容的比例), 格式符合度, 用户满意度
    • 性能指标: 首 token 时间, QPS, 可用性
    • 成本指标: 单位请求成本 (GPU 资源成本), GPU 利用率
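其中 recall@k 和 MRR 的计算可以示意如下:

```python
def recall_at_k(retrieved, relevant, k):
    """retrieved: 按打分排序的 doc id; relevant: 标注的相关 doc id 集合."""
    if not relevant:
        return 0.0
    hit = len(set(retrieved[:k]) & set(relevant))
    return hit / len(relevant)

def mrr(queries):
    """queries: [(retrieved_ids, relevant_set)]. 首个命中排名倒数的均值."""
    total = 0.0
    for retrieved, relevant in queries:
        for rank, doc in enumerate(retrieved, start=1):
            if doc in relevant:
                total += 1 / rank
                break
    return total / len(queries)
```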

Graph RAG

参考 LightRAG 以及微软的 GraphRAG, 宣传中 Graph RAG 能做这两件事情 (1) 回答全局问题, 比如总结全书; (2) 回答多跳问题.

其中第一点我的理解是, Graph RAG 相当于做了层级 (图的层级聚类) 的摘要, 越往上层级就是摘要的摘要, 所以所谓的能解决全局问题其实是提前通过摘要的摘要把回答准备好了.

至于第二点, 我的理解是不如 agentic RAG. Graph RAG 企图通过图关系, 一步 (虽然后续也有工作是多步迭代式召回) 把多跳关系找全, 很难做好. 构建图谱就不是 trivial 的事情, 光是定义什么东西算个结点都不容易, 实体 (结点) 链接与消歧也不容易. 而召回时需要利用图谱的边, 实际上需要 “预先知道要利用到这类边”, 构图时才能构出来. 构图过程的计算量和存储需求都很大, 后续更新也很难做. 图的构建说到底还是看 LLM 本身能力. 而 agentic rag 允许多次检索, 同样是依赖 LLM 本身能力, 但不需要预先对知识库构建图谱 (所以能直接用上 web search 等更通用的能力). 行动机制上更像人, 也更容易 scaling. 另外可以见 你为什么要用 GraphRAG?

那图谱到底有什么用? 我的理解是可以通过图谱构建 agent 训练数据, 比如 web sailor.

Agentic RAG

其实就是让 LLM 自己去做判断, 比如: 召回文档是否相关? 够不够解决问题? 这个回答有没有乱编? 等等. 一个比较典型的应用是 deep research, 具体就要开另一篇博客了.

其他

备用资料

用 Pydantic 自动生成 LLM Tool Schema

作者 Shiina
2025年9月14日 08:00

简单小工具.

定义 tool 参数后, 不引入其他库, 仅用 Pydantic 自动生成符合 OpenAI 规范的 Tool Schema. 想法很简单, 把 Pydantic 的 model_json_schema 生成的 JSON Schema 处理成 OpenAI 规范即可.

好处是 (1) 不用引入或依赖其他乱七八糟的库; (2) 不用手动额外维护一套工具描述; (3) 能利用 Pydantic 的一些功能, 从 JSON string load 之后自动校验参数, 自动转换类型等.
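第 (3) 点可以用一个小例子演示 (DemoArgs 是随手定义的演示模型): LLM 返回的 arguments 是 JSON string, 直接交给 Pydantic v2 的 model_validate_json, 非严格模式下会自动做类型转换.

```python
from typing import Literal

from pydantic import BaseModel, Field

class DemoArgs(BaseModel):
    """演示模型: 自动校验 + 类型转换 + 默认值."""
    count: int = Field(description="数量")
    unit: Literal["km", "mi"] = "km"

# LLM 给出的 "3" 是字符串, 非严格模式下会被自动转换为 int 3;
# 缺失的 unit 落到默认值 "km"
args = DemoArgs.model_validate_json('{"count": "3"}')
```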

基础示例

比如

class GetWeatherArgs(BaseModel):
    """Retrieves current weather for the given location."""
    location: str = Field(description="City and country e.g. Bogotá, Colombia")
    units: Literal["celsius", "fahrenheit"] = Field(description="Units the temperature will be returned in.")


def get_weather(args: GetWeatherArgs):
    """实际的工具处理逻辑"""
    pass


get_weather_tool = create_tool_from_pydantic(GetWeatherArgs)
print(json.dumps(get_weather_tool, ensure_ascii=False, indent=2))
{
  "type": "function",
  "function": {
    "name": "get_weather",
    "description": "Retrieves current weather for the given location.",
    "parameters": {
      "type": "object",
      "properties": {
        "location": {
          "type": "string",
          "description": "City and country e.g. Bogotá, Colombia"
        },
        "units": {
          "type": "string",
          "description": "Units the temperature will be returned in.",
          "enum": [
            "celsius",
            "fahrenheit"
          ]
        }
      },
      "required": [
        "location",
        "units"
      ]
    }
  }
}

完整代码

import datetime
import json
import re
import textwrap
from enum import StrEnum
from typing import Type, Literal, Optional, List, Any

import pydantic
from pydantic import BaseModel, Field


def _clean_text(text: str) -> str:
    """清理多行字符串的缩进和行尾空格。"""
    return textwrap.dedent(text).strip()


def _process_property(prop_schema: dict, defs: dict) -> dict:
    """递归地处理单个属性的 Schema,将其转换为 Tool 参数格式。"""
    # 1. 处理 Optional[T],在 Pydantic v2 中表现为 anyOf 包含 'null'
    if 'anyOf' in prop_schema:
        # 找到非 null 的那个 schema 定义
        non_null_schema = next((s for s in prop_schema['anyOf'] if s.get('type') != 'null'), None)
        if non_null_schema:
            # 递归处理,但保留外层的 description
            processed_schema = _process_property(non_null_schema, defs)
            if 'description' in prop_schema:
                processed_schema['description'] = _clean_text(prop_schema['description'])
            return processed_schema
        else:  # 理论上不应该只有 null
            return {}

    # 2. 处理嵌套对象 ($ref)
    if '$ref' in prop_schema:
        ref_name = prop_schema['$ref'].split('/')[-1]
        nested_schema = defs.get(ref_name)
        if nested_schema:
            # 对于嵌套对象,我们再次调用主转换函数
            return pydantic_to_tool_schema(nested_schema, defs)

    # 3. 处理基本类型和数组
    result = {}
    prop_type = prop_schema.get('type')

    if prop_type:
        result['type'] = prop_type
    if 'description' in prop_schema:
        result['description'] = _clean_text(prop_schema['description'])
    if 'enum' in prop_schema:
        result['enum'] = prop_schema['enum']

    # 3a. 处理数组 (List[T])
    if prop_type == 'array' and 'items' in prop_schema:
        # 递归处理数组元素的类型
        result['items'] = _process_property(prop_schema['items'], defs)

    return result


def pydantic_to_tool_schema(schema: dict, defs: dict = None) -> dict:
    """将 Pydantic 的 JSON Schema 转换为 Tool 的 parameters 部分。"""
    if defs is None:
        defs = schema.get('$defs', {})

    tool_params = {
        "type": "object",
        "properties": {},
        "required": schema.get("required", []),
    }

    # 顶层描述 (来自类的 docstring)
    if 'description' in schema:
        tool_params['description'] = _clean_text(schema['description'])

    properties = schema.get("properties", {})
    for name, prop_schema in properties.items():
        tool_params["properties"][name] = _process_property(prop_schema, defs)

    return tool_params


def create_tool_from_pydantic(pydantic_model: Type[BaseModel]) -> dict:
    """
    根据 Pydantic 模型自动创建一个符合 OpenAI 规范的 Tool 定义。

    - 自动从模型类名推断函数名 (例如 GetWeatherArgs -> get_weather)。
    - 自动使用模型的 docstring 作为工具的描述。
    """
    # 1. 从模型类名推断函数名
    model_name = pydantic_model.__name__
    class_name = model_name.removesuffix('Args')
    # 将驼峰命名 (CamelCase) 转换为下划线命名 (snake_case)
    function_name = re.sub(r'(?<!^)(?=[A-Z])', '_', class_name).lower()

    # 2. 生成 Pydantic Schema 并转换为 Tool Schema
    pydantic_schema = pydantic_model.model_json_schema()
    tool_schema = pydantic_to_tool_schema(pydantic_schema)

    description = tool_schema.pop("description", "")  # 描述移动到外层

    # 3. 构建并返回完整的 Tool 定义
    return {
        "type": "function",
        "function": {
            "name": function_name,
            "description": description,
            "parameters": tool_schema,
        },
    }


class GetWeatherArgs(BaseModel):
    """Retrieves current weather for the given location."""
    location: str = Field(description="City and country e.g. Bogotá, Colombia")
    units: Literal["celsius", "fahrenheit"] = Field(description="Units the temperature will be returned in.")


def get_weather(args: GetWeatherArgs):
    """实际的工具处理逻辑"""
    pass


get_weather_tool = create_tool_from_pydantic(GetWeatherArgs)
print(json.dumps(get_weather_tool, ensure_ascii=False, indent=2))

复杂点的例子

可以定义嵌套模型, 枚举类型, 添加自定义校验逻辑等. 下面的 SearchFilesArgs 模型演示了如何处理文件搜索场景, 它包含了对文件类型 (FileType 枚举) 和创建时间 (嵌套的 TimeRange 模型) 的筛选.

我们还定义了一个 LLMProofBaseModel 基类, 能自动处理来自 LLM 的 'null' 字符串输入. 嵌套的 TimeRange 模型中的校验器 check_dates 也展示了如何在数据模型层面封装业务规则.

# --- 接上一段代码 ---

class LLMProofBaseModel(BaseModel):
    """自动将所有字段中值为字符串 'null' 的输入转换为 None"""
    @pydantic.field_validator('*', mode='before')
    @classmethod
    def _clean_null_str(cls, v: Any) -> Any:
        if isinstance(v, str) and v.lower() == 'null':
            return None
        return v


class TimeRange(LLMProofBaseModel):
    """这个 docstring 不会用到"""
    start_date: Optional[datetime.date] = Field(None, description="开始日期 (YYYY-MM-DD)")
    end_date: Optional[datetime.date] = Field(None, description="结束日期 (YYYY-MM-DD)")
    random_field: Optional[str] = Field(None, description='演示用')

    @pydantic.model_validator(mode='after')
    def check_dates(self) -> 'TimeRange':
        if self.start_date and self.end_date and self.start_date > self.end_date:
            # 抛出错误或者其他处理方式
            self.end_date = self.start_date
        return self


class FileType(StrEnum):
    PDF = "pdf"
    PPT = "ppt"


class SearchFilesArgs(LLMProofBaseModel):
    """
    搜索文件

    多行示例
    - xx
    - yy
    """
    query: str = Field(description="根据用户问题提炼出的核心搜索查询语句")
    file_types: Optional[List[Literal[*FileType]]] = Field(None, description="文件类型")
    time_range: Optional[TimeRange] = Field(None, description="文件创建时间范围")


search_file_tool = create_tool_from_pydantic(SearchFilesArgs)

tools = [
    get_weather_tool,
    search_file_tool,
]
print(json.dumps(tools, ensure_ascii=False, indent=2))

args1 = GetWeatherArgs.model_validate({"location": "Bogotá, Colombia", "units": "celsius"})
args2 = SearchFilesArgs.model_validate(
    {
        "query": "年报", "file_types": ["pdf"],
        "time_range": {"start_date": "2025-01-01", "end_date": "2024-01-01", "random_field": "null"},
    }
)
[
  {
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Retrieves current weather for the given location.",
      "parameters": {
        "type": "object",
        "properties": {
          "location": {
            "type": "string",
            "description": "City and country e.g. Bogotá, Colombia"
          },
          "units": {
            "type": "string",
            "description": "Units the temperature will be returned in.",
            "enum": [
              "celsius",
              "fahrenheit"
            ]
          }
        },
        "required": [
          "location",
          "units"
        ]
      }
    }
  },
  {
    "type": "function",
    "function": {
      "name": "search_files",
      "description": "搜索文件\n\n多行示例\n- xx\n- yy",
      "parameters": {
        "type": "object",
        "properties": {
          "query": {
            "type": "string",
            "description": "根据用户问题提炼出的核心搜索查询语句"
          },
          "file_types": {
            "type": "array",
            "items": {
              "type": "string",
              "enum": [
                "pdf",
                "ppt"
              ]
            },
            "description": "文件类型"
          },
          "time_range": {
            "type": "object",
            "properties": {
              "start_date": {
                "type": "string",
                "description": "开始日期 (YYYY-MM-DD)"
              },
              "end_date": {
                "type": "string",
                "description": "结束日期 (YYYY-MM-DD)"
              },
              "random_field": {
                "type": "string",
                "description": "演示用"
              }
            },
            "required": [],
            "description": "文件创建时间范围"
          }
        },
        "required": [
          "query"
        ]
      }
    }
  }
]

A Guide for Customizing Difficulty in Tactics Ogre One Vision Mod

作者 Shiina
2023年12月8日 08:00

This guide will walk you through using cheat codes to tailor—and particularly increase—the challenge level in Tactics Ogre: Let Us Cling Together (PSP), with a focus on the One Vision mod.

Important Note: The first line of any .ini file for cheats is ignored. Always leave it blank.

Quality-of-Life

_C1 Always Show Trajectory
_L 0x20011774 0x00000000

_C1 Shop Anywhere On World Map
_L 0x20081398 0x00000000
_L 0x20083278 0x00000000

_C1 Warren Report Advanced Features
//Unlock World Tarot and Chaos Frame
_L 0x702FD60A 0x00000018

_C1 Incapacitated Warp Away
_L 0x2006EB68 0x12020014

Note on “Incapacitated Warp Away”: While convenient, this can backfire by causing instant defeat if Denam falls. On the flip side, watching monsters use shiftstones when incapacitated is oddly entertaining.

Moderate Difficulty Increases

_C1 No Tarot Stat Bonus
_L 0x20025C28 0x34060000
_L 0x20025C34 0x34060000

_C1 Enemy Has Everything x99 (Consumables)
_L 0x0055330F 0x00000063
_L 0x40553310 0x001E0001
_L 0x63636363 0x00000000
_L 0x10553388 0x00006363
Better Enemy Gear (Expanded Code)
_C1 Enemies Have access to all gear
_L 0X00457AE6 0x00000002
_L 0X00457BD6 0x00000002
_L 0X00457CC6 0x00000002
_L 0X00457DB6 0x00000002
_L 0X00457EA6 0x00000002
_L 0X00457F96 0x00000002
_L 0X00458086 0x00000002
_L 0X00458176 0x00000002
_L 0X004581EE 0x00000002
_L 0X00458266 0x00000002
_L 0X004582DE 0x00000002
_L 0X00458356 0x00000002
_L 0X004583CE 0x00000002
_L 0X00458446 0x00000002
_L 0X004584BE 0x00000002
_L 0X00458536 0x00000002
_L 0X004585AE 0x00000002
_L 0X00458626 0x00000002
_L 0X0045869E 0x00000002
_L 0X0045878E 0x00000000
_L 0X0045887E 0x00000000
_L 0X0045896E 0x00000000
_L 0X00458A5E 0x00000000
_L 0X00458B4E 0x00000000
_L 0X00458C3E 0x00000000
_L 0X00458D2E 0x00000000
_L 0X00458E1E 0x00000000
_L 0X00458E96 0x00000000
_L 0X00458F0E 0x00000000
_L 0X00458F86 0x00000000
_L 0X00458FFE 0x00000000
_L 0X00459076 0x00000000
_L 0X004590EE 0x00000000
_L 0X00459166 0x00000000
_L 0X004591DE 0x00000000
_L 0X00459256 0x00000000
_L 0X00459346 0x00000000
_L 0X00459436 0x00000000
_L 0X00459526 0x00000000
_L 0X00459616 0x00000000
_L 0X00459706 0x00000000
_L 0X004597F6 0x00000000
_L 0X004598E6 0x00000000
_L 0X004599D6 0x00000000
_L 0X00459A4E 0x00000000
_L 0X00459AC6 0x00000000
_L 0X00459B3E 0x00000000
_L 0X00459BB6 0x00000000
_L 0X00459C2E 0x00000000
_L 0X00459CA6 0x00000000
_L 0X00459D1E 0x00000000
_L 0X00459E0E 0x00000000
_L 0X00459E86 0x00000000
_L 0X00459EFE 0x00000000
_L 0X00459F76 0x00000000
_L 0X00459FEE 0x00000000
_L 0X0045A066 0x00000000
_L 0X0045A0DE 0x00000000
_L 0X0045A156 0x00000000
_L 0X0045A2BE 0x00000002
_L 0X0045A3AE 0x00000002
_L 0X0045A49E 0x00000002
_L 0X0045A58E 0x00000002
_L 0X0045A67E 0x00000002
_L 0X0045A76E 0x00000002
_L 0X0045A85E 0x00000002
_L 0X0045A94E 0x00000002
_L 0X0045A9C6 0x00000002
_L 0X0045AA3E 0x00000002
_L 0X0045AAB6 0x00000002
_L 0X0045AB2E 0x00000002
_L 0X0045ABA6 0x00000002
_L 0X0045AC1E 0x00000002
_L 0X0045AC96 0x00000002
_L 0X0045AD0E 0x00000002
_L 0X0045AD86 0x00000002
_L 0X0045ADFE 0x00000002
_L 0X0045AE76 0x00000002
_L 0X0045AF66 0x00000000
_L 0X0045B056 0x00000000
_L 0X0045B146 0x00000002
_L 0X0045B236 0x00000002
_L 0X0045B326 0x00000000
_L 0X0045B416 0x00000000
_L 0X0045B506 0x00000002
_L 0X0045B5F6 0x00000002
_L 0X0045B66E 0x00000000
_L 0X0045B6E6 0x00000000
_L 0X0045B75E 0x00000002
_L 0X0045B7D6 0x00000002
_L 0X0045B84E 0x00000002
_L 0X0045B8C6 0x00000002
_L 0X0045B93E 0x00000002
_L 0X0045B9B6 0x00000000
_L 0X0045BA2E 0x00000002
_L 0X0045BAA6 0x00000002
_L 0X0045BB1E 0x00000000
_L 0X0045BC0E 0x00000002
_L 0X0045BCFE 0x00000002
_L 0X0045BDEE 0x00000002
_L 0X0045BEDE 0x00000002
_L 0X0045BFCE 0x00000002
_L 0X0045C0BE 0x00000002
_L 0X0045C1AE 0x00000002
_L 0X0045C29E 0x00000002
_L 0X0045C316 0x00000002
_L 0X0045C38E 0x00000002
_L 0X0045C406 0x00000002
_L 0X0045C47E 0x00000002
_L 0X0045C4F6 0x00000002
_L 0X0045C56E 0x00000002
_L 0X0045C5E6 0x00000002
_L 0X0045C65E 0x00000002
_L 0X0045C6D6 0x00000002
_L 0X0045C74E 0x00000002
_L 0X0045C7C6 0x00000002
_L 0X0045C8B6 0x00000000
_L 0X0045C92E 0x00000010
_L 0X0045C9A6 0x00000000
_L 0X0045CA96 0x00000000
_L 0X0045CB86 0x00000002
_L 0X0045CC76 0x00000000
_L 0X0045CD66 0x00000002
_L 0X0045CE56 0x00000000
_L 0X0045CF46 0x00000002
_L 0X0045CFBE 0x00000000
_L 0X0045D036 0x00000000
_L 0X0045D0AE 0x00000002
_L 0X0045D126 0x00000002
_L 0X0045D19E 0x00000000
_L 0X0045D216 0x00000000
_L 0X0045D28E 0x00000002
_L 0X0045D306 0x00000000
_L 0X0045D37E 0x00000002
_L 0X0045D3F6 0x00000002
_L 0X0045D46E 0x00000002
_L 0X0045D55E 0x00000000
_L 0X0045D64E 0x00000000
_L 0X0045D73E 0x00000000
_L 0X0045D82E 0x00000000
_L 0X0045D91E 0x00000000
_L 0X0045DA0E 0x00000000
_L 0X0045DAFE 0x00000000
_L 0X0045DBEE 0x00000000
_L 0X0045DC66 0x00000000
_L 0X0045DCDE 0x00000000
_L 0X0045DD56 0x00000000
_L 0X0045DDCE 0x00000000
_L 0X0045DE46 0x00000000
_L 0X0045DEBE 0x00000000
_L 0X0045DF36 0x00000000
_L 0X0045DFAE 0x00000000
_L 0X0045E026 0x00000000
_L 0X0045E09E 0x00000000
_L 0X0045E116 0x00000000
_L 0X0045E206 0x00000002
_L 0X0045E2F6 0x00000002
_L 0X0045E3E6 0x00000002
_L 0X0045E4D6 0x00000002
_L 0X0045E5C6 0x00000002
_L 0X0045E6B6 0x00000002
_L 0X0045E7A6 0x00000002
_L 0X0045E896 0x00000002
_L 0X0045E90E 0x00000002
_L 0X0045E986 0x00000002
_L 0X0045E9FE 0x00000002
_L 0X0045EA76 0x00000002
_L 0X0045EAEE 0x00000002
_L 0X0045EB66 0x00000002
_L 0X0045EBDE 0x00000002
_L 0X0045EC56 0x00000002
_L 0X0045ECCE 0x00000002
_L 0X0045ED46 0x00000002
_L 0X0045EDBE 0x00000002
_L 0X0045EEAE 0x00000002
_L 0X0045EF9E 0x00000000
_L 0X0045F08E 0x00000002
_L 0X0045F17E 0x00000002
_L 0X0045F26E 0x00000000
_L 0X0045F35E 0x00000002
_L 0X0045F3D6 0x00000000
_L 0X0045F44E 0x00000000
_L 0X0045F4C6 0x00000000
_L 0X0045F53E 0x00000000
_L 0X0045F5B6 0x00000000
_L 0X0045F62E 0x00000000
_L 0X0045F6A6 0x00000000
_L 0X0045F71E 0x00000000
_L 0X0045F796 0x00000000
_L 0X0045F80E 0x00000000
_L 0X0045F886 0x00000000
_L 0X0045F8FE 0x00000002
_L 0X0045F976 0x00000002
_L 0X0045FA66 0x00000002
_L 0X0045FB56 0x00000002
_L 0X0045FBCE 0x00000002
_L 0X0045FC46 0x00000002
_L 0X0045FCBE 0x00000002
_L 0X0045FD36 0x00000002
_L 0X0045FDAE 0x00000000
_L 0X0045FE26 0x00000002
_L 0X004602D6 0x00000002
_L 0X0046034E 0x00000000
_L 0X004603C6 0x00000000
_L 0X0046043E 0x00000002
_L 0X004604B6 0x00000002
_L 0X0046052E 0x00000002
_L 0X004605A6 0x00000002
_L 0X0046061E 0x00000002
_L 0X00460696 0x00000002
_L 0X0046070E 0x00000002
_L 0X00460786 0x00000002
_L 0X00460876 0x00000000
_L 0X004608EE 0x00000000
_L 0X00460966 0x00000000
_L 0X004609DE 0x00000000
_L 0X00460A56 0x00000000
_L 0X00460ACE 0x00000000
_L 0X00460B46 0x00000000
_L 0X00460BBE 0x00000000
_L 0X00460C36 0x00000000
_L 0X00460D26 0x00000021
_L 0X00460E16 0x00000023
_L 0X00460F06 0x00000023
_L 0X00460FF6 0x00000021
_L 0X004610E6 0x00000023
_L 0X004611D6 0x00000023
_L 0X004612C6 0x00000021
_L 0X0046133E 0x00000023
_L 0X004613B6 0x00000023
_L 0X0046142E 0x00000023
_L 0X004614A6 0x00000023
_L 0X0046151E 0x00000023
_L 0X00461596 0x00000023
_L 0X0046160E 0x00000023
_L 0X00461686 0x00000023
_L 0X004616FE 0x00000023
_L 0X00461776 0x00000023
_L 0X004617EE 0x00000023
_L 0X00461866 0x00000023
_L 0X004618DE 0x00000023
_L 0X004619CE 0x00000000
_L 0X00461ABE 0x00000002
_L 0X00461BAE 0x00000002
_L 0X00461C9E 0x00000000
_L 0X00461D8E 0x00000002
_L 0X00461E7E 0x00000002
_L 0X00461F6E 0x00000002
_L 0X0046205E 0x00000000
_L 0X004620D6 0x00000002
_L 0X0046214E 0x00000002
_L 0X004621C6 0x00000002
_L 0X0046223E 0x00000002
_L 0X004622B6 0x00000002
_L 0X0046232E 0x00000002
_L 0X004623A6 0x00000002
_L 0X0046241E 0x00000002
_L 0X00462496 0x00000002
_L 0X0046250E 0x00000002
_L 0X00462586 0x00000002
_L 0X004625FE 0x00000000
_L 0X00462676 0x00000000
_L 0X004626EE 0x00000002
_L 0X00462766 0x00000002
_L 0X004627DE 0x00000002
_L 0X00462856 0x00000002
_L 0X004628CE 0x00000000
_L 0X00462946 0x00000002
_L 0X004629BE 0x00000000
_L 0X00462A36 0x00000002
_L 0X00462DF6 0x00000000
_L 0X0046331E 0x00000008
_L 0X00463396 0x00000008
_L 0X00463486 0x00000000
_L 0X00463576 0x00000000
_L 0X00463666 0x00000000
_L 0X00463756 0x00000000
_L 0X00463846 0x00000000
_L 0X00463936 0x00000000
_L 0X00463A26 0x00000000
_L 0X00463B16 0x00000000
_L 0X00463B8E 0x00000000
_L 0X00463C06 0x00000000
_L 0X00463C7E 0x00000000
_L 0X00463CF6 0x00000000
_L 0X00463D6E 0x00000000
_L 0X00463DE6 0x00000000
_L 0X00463E5E 0x00000000
_L 0X00463ED6 0x00000000
_L 0X00463F4E 0x00000000
_L 0X00463FC6 0x00000000
_L 0X0046403E 0x00000000
_L 0X004640B6 0x00000000
_L 0X0046412E 0x00000000
_L 0X004641A6 0x00000000
_L 0X00464296 0x00000000
_L 0X00464386 0x00000000
_L 0X00464476 0x00000000
_L 0X00464566 0x00000000
_L 0X00464656 0x00000000
_L 0X004646CE 0x00000000
_L 0X00464746 0x00000000
_L 0X004647BE 0x00000000
_L 0X00464836 0x00000000
_L 0X004648AE 0x00000000
_L 0X00464926 0x00000000
_L 0X00464A16 0x00000000
_L 0X00464B06 0x00000000
_L 0X00464BF6 0x00000000
_L 0X00464CE6 0x00000000
_L 0X00464DD6 0x00000000
_L 0X00464EC6 0x00000000
_L 0X00464FB6 0x00000000
_L 0X004650A6 0x00000000
_L 0X0046511E 0x00000000
_L 0X00465196 0x00000000
_L 0X0046520E 0x00000000
_L 0X00465286 0x00000000
_L 0X004652FE 0x00000000
_L 0X00465376 0x00000000
_L 0X004653EE 0x00000000
_L 0X00465466 0x00000000
_L 0X004654DE 0x00000000
_L 0X00465556 0x00000000
_L 0X004655CE 0x00000000
_L 0X00465646 0x00000000
_L 0X004656BE 0x00000000
_L 0X00465736 0x00000000
_L 0X004657AE 0x00000000
_L 0X00465826 0x00000000
_L 0X0046589E 0x00000000
_L 0X00465916 0x00000000
_L 0X0046598E 0x00000000
_L 0X00465A06 0x00000000
_L 0X00465A7E 0x00000000
_L 0X00465AF6 0x00000000
_L 0X00465B6E 0x00000000
_L 0X00465BE6 0x00000000
_L 0X00465C5E 0x00000000
_L 0X00465CD6 0x00000000
_L 0X00465D4E 0x00000000
_L 0X00465DC6 0x00000000
_L 0X00465E3E 0x00000010
_L 0X00465F2E 0x00000000
_L 0X0046601E 0x00000000
_L 0X0046610E 0x00000000
_L 0X004661FE 0x00000000
_L 0X004662EE 0x00000000
_L 0X004663DE 0x00000000
_L 0X004664CE 0x00000000
_L 0X00466546 0x00000000
_L 0X004665BE 0x00000000
_L 0X00466636 0x00000000
_L 0X004666AE 0x00000000
_L 0X00466726 0x00000000
_L 0X0046679E 0x00000000
_L 0X00466816 0x00000000
_L 0X0046688E 0x00000000
_L 0X00466906 0x00000000
_L 0X0046697E 0x00000000
_L 0X00466A6E 0x00000000
_L 0X00466B5E 0x00000000
_L 0X00466C4E 0x00000000
_L 0X00466D3E 0x00000000
_L 0X00466E2E 0x00000000
_L 0X00466EA6 0x00000000
_L 0X00466F1E 0x00000000
_L 0X00466F96 0x00000000
_L 0X0046700E 0x00000000
_L 0X00467086 0x00000000
_L 0X004670FE 0x00000000
_L 0X00467176 0x00000000
_L 0X004671EE 0x00000000
_L 0X00467266 0x00000000
_L 0X004672DE 0x00000000
_L 0X00467356 0x00000000
_L 0X004673CE 0x00000008
_L 0X00467446 0x00000000

Advanced Customization

For players who want fine-tuned control over difficulty, advanced cheats allow arbitrary adjustments. These are best applied in-battle via the following workflow:

  1. Generate Cheats: Use scripts (like this generator) to create tailored cheats. (Note: the tool is functional but a bit messy.)
  2. Enter Battle: Activate cheats and start the battle.
  3. Save & Exit: Confirm cheats are active, save mid-battle, then quit the game.
  4. Deactivate Cheats: Disable the relevant code before relaunching.
  5. Reload: Resume your saved battle with the cheats “baked in.”

There used to be some useful codes on Tapatalk, but those pages are unavailable now (you may still find some here).

Example: Skipping Tedious Battles

Set all enemy HP to 1 and grant infinite actions to breeze through repetitive fights (e.g. dungeons):

def cur_hp(left, right, hp=1):
    '''
    Example: cur_hp(11, 11+12, 1)

    sets the current HP of the 11th-22nd units on the field
    (12 in total) to 1, so each dies in one hit.

    Useful when, say, you have a party of 10 units and there
    are 12 enemies: you can sweep them out easily. Handy for
    skipping boring, repetitive maps.

    Note that if a boss is set to 1 HP, he still won't be
    killed; and if units are set to 0 HP, they will never die,
    even though they show 0 HP.
    '''
    print(f'_C0 No.{left}-{right-1} units on field HP {hp}')
    for i in range(left-1, right-1):
        # each unit occupies a 0x520-byte block starting at 0x10540140
        loc = hex(0x10540140 + 0x520*i)
        print(f'_L {loc} 0x0000000{hp}')

cur_hp(11, 11+12, 1)
_C0 Action Infinite
_L 0x00E3132C 0x00000000
_L 0x00E3134C 0x00000000
_L 0x00E3136C 0x00000000

Example: Adjusting Skill Ranks

Boost enemy prowess while limiting your own:

def skill_rank(left, right, rank=8):
    '''
    Example: skill_rank(11, 11+12, 6)

    sets all skill ranks of the 11th-22nd units on the field
    (12 in total) to rank 6; for your own characters the effect
    persists after the battle.

    Useful when, say, you have a party of 10 units and there
    are 12 enemies: all enemy skill ranks will be set to 6.

    Note that guests are ordered differently: e.g. with a party
    of 10 units plus a guest, the guest may not be the 11th unit.
    '''
    print(f'_C0 No.{left}-{right-1} units on field all skills rank {rank}')
    for i in range(left-1, right-1):
        for j in range(10):
            loc = hex(0x005401D8 + 0x520*i + 0x6*j)
            # hex() drops leading zeros; pad the address back to 8 digits
            loc = loc[:2] + '00' + loc[2:]
            print(f'_L {loc} 0x0000000{rank}')

skill_rank(11, 11+12, 6)

Additional Challenge Ideas (Work in Progress)

While Tactics Ogre and the One Vision mod aren’t inherently designed for extreme difficulty, if you’re looking to add some extra spice and challenge, here are some general tips to consider:

  • “Boss Last” Challenge: Force yourself to defeat all minor enemies before taking down the boss. This can be surprisingly difficult (since you have to keep the boss alive!) but adds an interesting strategic layer.
  • Self-Imposed Bans: Identify and avoid using “overpowered” units, classes, or abilities as you see fit to prevent trivializing encounters.
  • Skill Slot Limitations: Challenge yourself by limiting the number of equipped skills (e.g., only four instead of the usual ten).
  • Level and Skill Disadvantage: Deliberately set your own characters to lower levels and skill ranks compared to your enemies.

These rules, whether self-imposed or enforced through cheats, won’t just increase difficulty; they’ll also encourage more varied team compositions and deeper strategic thinking. Enjoy your customized Tactics Ogre experience!

Two Simple SQLite Questions That Stump All Kinds of LLMs

Author: Shiina
May 5, 2025, 08:00

The problem statement and an example prompt are given below.


You are a SQLite expert. Please solve the two problems below.

  1. Write a SQLite query that, given "now", returns the unix timestamp of today's midnight in local time. Note: "local" means the system timezone of the machine executing the SQL, and "today" means today's date in that local timezone.

Example: if now is '2025-05-05 04:00:00+08:00', return '2025-05-05 00:00:00+08:00'. (Assume the local timezone is UTC+8.)

  2. Write a SQLite query that, given "now", returns the date of last week's Monday. Assume Monday starts the week, and work entirely in UTC (no timezone handling needed).

Example: if now is '2025-05-05', a Monday, return '2025-04-28'. If now is '2025-05-04', a Sunday, return '2025-04-21'.

Reference code is given below; please fill in the corresponding SQL.

import sqlite3
import time


def f(sql):
    conn = sqlite3.connect(':memory:')
    cursor = conn.cursor()
    cursor.execute(sql)
    result = cursor.fetchone()[0]
    if isinstance(result, int):
        print(time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(result)))
    elif isinstance(result, str):
        print(result)
    conn.close()


# now has been replaced with concrete timestamps
sqls = [
    # today's midnight in local time
    "SELECT unixepoch('2025-05-05 04:00:00+08:00', ...)",
    
    # last week's Monday
    "SELECT DATE('2025-05-05', ...)",
    "SELECT DATE('2025-05-04', ...)",
]
for sql in sqls:
    f(sql)

You may want to try it yourself before looking at the answers.

I asked around among LLMs: 4o, without browsing and in non-reasoning mode, can get both fully right (with some probability; with reasoning it actually fails). DeepSeek-R1 can get them right (with some probability). Most other LLMs, including Gemini 2.5 Pro, did not get everything right.

Reference Answer

4o's answer, from here.

Below are the complete SQL fill-ins, one per problem.

1. Unix timestamp of today's midnight in local time

We want to convert '2025-05-05 04:00:00+08:00' to today's midnight in local time, then to a unix timestamp.

SQLite's unixepoch() function returns a UTC-based timestamp, so we first convert the time to the local day's midnight, then take unixepoch():

SELECT unixepoch('2025-05-05 04:00:00+08:00', 'localtime', 'start of day', 'utc')

Explanation:

  • 'localtime': convert the input time to local time.
  • 'start of day': take midnight of that local day.
  • 'utc': convert back to UTC, so that unixepoch() yields the correct unix timestamp.

2. Date of last week's Monday (UTC only)

SQLite's strftime('%w', date) returns the day of the week as a number (0=Sunday, 1=Monday, …, 6=Saturday). We can use it to compute an offset relative to the current date.

Formula: current date − ((day of week + 6) % 7 + 7) days

So the final expression is:

SELECT DATE('2025-05-05', '-' || ((strftime('%w', '2025-05-05') + 6) % 7 + 7) || ' days')

For '2025-05-05' (Monday, %w=1):

  • (1 + 6) % 7 + 7 = 0 + 7 = 7 → returns last Monday: 2025-04-28

For '2025-05-04' (Sunday, %w=0):

  • (0 + 6) % 7 + 7 = 6 + 7 = 13 → returns last Monday: 2025-04-21

✅ The completed code:

sqls = [
    # today's midnight in local time
    "SELECT unixepoch('2025-05-05 04:00:00+08:00', 'localtime', 'start of day', 'utc')",
    
    # last week's Monday
    "SELECT DATE('2025-05-05', '-' || ((strftime('%w', '2025-05-05') + 6) % 7 + 7) || ' days')",
    "SELECT DATE('2025-05-04', '-' || ((strftime('%w', '2025-05-04') + 6) % 7 + 7) || ' days')",
]

Would you like me to extend this into a general function supporting a dynamic now parameter?

Answer Notes

Problem 1

For today's midnight my solution is the same as 4o's. The counterintuitive part is the naming of SQLite's 'localtime' and 'utc' modifiers. The typical mistake is omitting the final 'utc' modifier.

The official documentation says:

The “localtime” modifier assumes the time-value to its left is in Universal Coordinated Time (UTC) and adjusts that time value so that it is in localtime.

The “utc” modifier is the opposite of “localtime”. “utc” assumes that the time-value to its left is in the local timezone and adjusts that time-value to be in UTC.

Once this passage is supplied in the prompt, a good share of LLMs get it right.

SQLite's date-time modifiers are applied cumulatively from left to right. Here 'localtime' is in effect simply +8 hours, and 'utc' is −8 hours.

These modifiers can be stacked repeatedly, for example

SELECT unixepoch('2025-05-05 04:00:00+08:00', 'localtime', 'localtime')

yields '2025-05-05 20:00:00+08:00'.
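
The normalization that makes the 'utc'/'localtime' dance necessary can be checked deterministically with Python's sqlite3 (no timezone dependence, since only the explicit offset in the literal is involved):

```python
import sqlite3

conn = sqlite3.connect(':memory:')

def q(sql):
    return conn.execute(sql).fetchone()[0]

# A literal with an explicit offset is normalized to UTC before any
# modifiers apply: 04:00 on May 5 in UTC+8 is 20:00 on May 4 in UTC.
assert q("SELECT datetime('2025-05-05 04:00:00+08:00')") == '2025-05-04 20:00:00'

# So 'start of day' without a preceding 'localtime' truncates the UTC
# value and lands on May 4 -- the wrong calendar day for a UTC+8 user.
assert q("SELECT datetime('2025-05-05 04:00:00+08:00', 'start of day')") == '2025-05-04 00:00:00'
```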

Problem 2

I used a different implementation from 4o's:

SELECT DATE('2025-05-05', 'weekday 0', '+1 day', '-14 days')

The trap: if the date on the left already falls on weekday N, the modifier does nothing. Every LLM that reached for the weekday modifier fell into this, and the LLMs that answered wrong kept answering wrong even after being given the passage below.

The “weekday” modifier advances the date forward, if necessary, to the next date where the weekday number is N. Sunday is 0, Monday is 1, and so forth. If the date is already on the desired weekday, the “weekday” modifier leaves the date unchanged.

A typical wrong answer:

DATE('2025-05-05', 'weekday 1', '-14 days')
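
The pitfall is easy to reproduce with Python's sqlite3; a quick check showing that 'weekday 1' is a no-op when the date is already a Monday, which is exactly what breaks the wrong answer:

```python
import sqlite3

conn = sqlite3.connect(':memory:')

def q(sql):
    return conn.execute(sql).fetchone()[0]

# 'weekday 1' advances to the NEXT Monday -- unless the date already
# is a Monday, in which case it leaves the date unchanged.
assert q("SELECT DATE('2025-05-05', 'weekday 1')") == '2025-05-05'  # Monday: unchanged
assert q("SELECT DATE('2025-05-04', 'weekday 1')") == '2025-05-05'  # Sunday: advances

# Hence the typical wrong answer is correct for a Sunday but off by a
# week for a Monday (wanted 2025-04-28):
assert q("SELECT DATE('2025-05-05', 'weekday 1', '-14 days')") == '2025-04-21'

# 'weekday 0' followed by '+1 day' pins the following Monday
# unambiguously, making '-14 days' land on last week's Monday:
assert q("SELECT DATE('2025-05-05', 'weekday 0', '+1 day', '-14 days')") == '2025-04-28'
assert q("SELECT DATE('2025-05-04', 'weekday 0', '+1 day', '-14 days')") == '2025-04-21'
```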

LoRA Variants

Author: Shiina
August 18, 2024, 08:00

LoRA

Well known by now, so I'll skip the basics (see here).

We hypothesize that the change in weights during model adaptation also has a low “intrinsic rank”.

We limit our study to only adapting the attention weights for downstream tasks and freeze the MLP modules

QLoRA paper: “We find that the most critical LoRA hyperparameter is how many LoRA adapters are used in total and that LoRA on all linear transformer block layers is required to match full finetuning performance.”

At initialization, one of A or B is zero, guaranteeing that adding AB leaves the initial output identical to the original; the other is nonzero, guaranteeing the gradient is not identically zero during optimization.

Note that LoRA does not save compute; it only drastically reduces the parameters the optimizer has to store. See here and here.
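
A tiny numpy sketch of the two points above, with illustrative sizes (B initialized to zero, A nonzero):

```python
import numpy as np

d, k, r = 512, 512, 8
rng = np.random.default_rng(0)

W = rng.normal(size=(d, k))          # frozen pretrained weight
A = rng.normal(size=(r, k))          # nonzero init
B = np.zeros((d, r))                 # zero init

# 1) B = 0 makes the adapted weight identical to W at initialization
assert np.allclose(W + B @ A, W)

# 2) The saving is in trainable (and optimizer-state) parameters, not
#    in compute: W still participates in every forward pass.
full_params = d * k                  # what full fine-tuning updates
lora_params = r * (d + k)            # what LoRA updates
print(full_params, lora_params)      # 262144 vs 8192 -- a 32x reduction
```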

GaLore

Gradient Low-Rank Projection (GaLore), a training strategy that allows full-parameter learning but is more memory efficient than common low-rank adaptation methods such as LoRA.

Our key idea is to leverage the slow changing low-rank structure of the gradient of the weight matrix, rather than trying to approximate the weight matrix itself as low rank.

Usable for both fine-tuning and pretraining. The catch: with full-parameter fine-tuning, deploying multiple tasks becomes inconvenient.

for weight in model.parameters():
    grad = weight.grad
    # original space -> compact space
    lor_grad = project(grad)
    # update by Adam, Adafactor, etc.
    lor_update = update(lor_grad)
    # compact space -> original space
    full_update = project_back(lor_update)
    weight.data += full_update
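
The projection step can be made concrete with numpy; a minimal single-matrix sketch using plain SGD in place of Adam (the function and variable names here are illustrative, not the paper's code):

```python
import numpy as np

def galore_step(W, G, r, lr=1e-2):
    """One GaLore-style update: optimize in a rank-r gradient subspace.

    G plays the role of the (negative) gradient, so the update is added.
    """
    # SVD of the gradient; keep the top-r singular directions
    U, S, Vt = np.linalg.svd(G, full_matrices=False)
    P, Q = U[:, :r], Vt[:r, :].T            # P: m x r, Q: n x r
    lor_grad = P.T @ G @ Q                  # project: original -> r x r compact space
    lor_update = lr * lor_grad              # plain SGD stands in for Adam/Adafactor
    return W + P @ lor_update @ Q.T         # project back and apply

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 32))
G = rng.normal(size=(64, 32))
W_new = galore_step(W, G, r=4)
# The applied update has rank at most r
assert np.linalg.matrix_rank(W_new - W) <= 4
```

In the paper, P and Q are recomputed only periodically because the gradient subspace changes slowly; recomputing the SVD every step, as above, is just for illustration.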

At time step $t$, $G_t \in \mathbb R^{m\times n}$ is the negative gradient matrix of weight $W_t$. The regular update is

\[W_T = W_0 + \eta \sum_{t=0}^{T-1}\tilde G_t = W_0 + \eta \sum_{t=0}^{T-1}\rho_t(G_t),\]

where $\eta$ is the learning rate, and $\rho_t$ is an entry-wise stateful gradient regularizer (e.g., Adam).

In GaLore, the $\tilde G_t$ in update becomes

\[\tilde G_t = P_t \rho_t(P_t'G_tQ_t)Q_t',\]

where $P_t \in \mathbb R^{m\times r}$ and $Q_t \in \mathbb R^{n\times r}$. They are derived from SVD:

\[\begin{align*} G_t &= USV' \approx \sum_{i=1}^r s_i u_i v_i', \\ P_t &= (u_1, \dots, u_r) , \quad Q_t = (v_1, \dots, v_r). \end{align*}\]

See also:

LoRA+

In LoRA, B's learning rate should be larger than A's.

Simple and easy to use.

DoRA

Our intuitions are two-fold. Firstly, we believe that limiting LoRA to concentrate exclusively on directional adaptation while also allowing the magnitude component to be tunable simplifies the task compared to the original approach, where LoRA is required to learn adjustments in both magnitude and direction. Secondly, the process of optimizing directional updates is made more stable through weight decomposition, which we delve into more thoroughly in Section 4.2.

The first point doesn't feel well justified to me; I haven't read the second closely yet.

The remaining variants are rather dull, so I'll skip them.

LLM-based Text2SQL

Author: Shiina
December 25, 2023, 08:00

Gao, D., Wang, H., Li, Y., Sun, X., Qian, Y., Ding, B., & Zhou, J. (2023). Text-to-sql empowered by large language models: A benchmark evaluation. arXiv preprint arXiv:2308.15363.

Personal summary: an experimental report on prompt engineering with LLMs on Text2SQL datasets. On the two datasets evaluated, it is the best among open-source solutions. The proposed prompt scheme, DAIL-SQL, blends several existing RAG techniques.

Datasets

  • Spider is a large-scale complex and cross-domain semantic parsing and text-to-SQL dataset annotated by 11 Yale students. It consists of 10,181 questions and 5,693 unique complex SQL queries on 200 databases with multiple tables covering 138 different domains.

In fact, judging from the provided Data Examples, even the EXTRA HARD samples involve databases and SQL that are quite simple compared to real-world workloads.

[Extra Hard] What is the average life expectancy in the countries where English is not the official language?

SELECT AVG(life_expectancy)
FROM country
WHERE name NOT IN 
   (SELECT T1.name
    FROM country AS T1 JOIN
    country_language AS T2
    ON T1.code = T2.country_code
    WHERE T2.language = "English"
      AND T2.is_official = "T")

  • BIRD (BIg Bench for LaRge-scale Database Grounded Text-to-SQL Evaluation) contains over 12,751 unique question-SQL pairs, 95 big databases with a total size of 33.4 GB. It also covers more than 37 professional domains, such as blockchain, hockey, healthcare and education, etc.

Evaluation Metrics

  • Execution Accuracy. This metric goes by many names: whether the execution result of the generated SQL matches that of the reference SQL.
  • Exact Set Match. Decompose the SQL into clauses, and each clause into a set of tokens. Using sets sidesteps ordering issues, e.g. SELECT col1, col2 is equivalent to SELECT col2, col1. See here for details.
  • Valid Efficiency Score. The execution result must first match the reference; efficiency is then evaluated on top of that.
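
A toy illustration of the set idea behind Exact Set Match, for the SELECT clause only (real implementations parse the full SQL; `select_set` is just an illustrative helper):

```python
def select_set(sql):
    """Extract the SELECT-clause columns as an order-insensitive set."""
    cols = sql.split('SELECT')[1].split('FROM')[0]
    return {c.strip() for c in cols.split(',')}

# Column order doesn't matter...
assert select_set('SELECT col1, col2 FROM t') == select_set('SELECT col2, col1 FROM t')
# ...but a different column set does
assert select_set('SELECT col1 FROM t') != select_set('SELECT col1, col3 FROM t')
```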

Prompts

Question Representation

  • Basic Prompt. Give the schemas of the relevant tables, then the Q&A, prompting the model to complete from A: SELECT. No instruction.
Table continents, columns = [ContId, Continent]
Table countries, columns = [CountryId, CountryName, Continent]
Q: How many continents are there?
A: SELECT
  • Text Representation Prompt. Adds instructions on top of the Basic Prompt.
Given the following database schema:
continents: ContId, Continent
countries: CountryId, CountryName, Continent

Answer the following: How many continents are there?
SELECT
  • OpenAI Demonstration Prompt. Frame everything as SQL for the model to complete, with the instructions in comments.
### Complete sqlite SQL query only and with no explanation
### SQLite SQL tables, with their properties:
#
# continents (ContId, Continent)
# countries (CountryId, CountryName, Continent)
#
### How many continents are there?
SELECT
  • Code Representation Prompt.
/* Given the following database schema: */
CREATE TABLE continents (
    ContId int primary key,
    Continent text,
    foreign key (ContId) references countries (Continent)
);

CREATE TABLE countries (
    CountryId int primary key,
    CountryName text,
    Continent int,
    foreign key (Continent) references continents (ContId)
);

/* Answer the following: How many continents are there? */
SELECT
  • Alpaca SFT Prompt.
Below is an instruction that describes a task, paired
with an input that provides further context. Write a
response that appropriately completes the request.

### Instruction:
Write a SQL query to answer the question "How many continents are there?"

### Input:
continents (ContId, Continent)
countries (CountryId, CountryName, Continent)

### Response:
SELECT

In-Context Learning

Consider k-shot: select k examples from the training set (question-SQL pairs) and put them into the prompt.

  • Random.
  • Question Similarity Selection. kNN by question similarity.
  • Masked Question Similarity Selection. Mask out domain-specific table names, column names, values, etc. in the question, then kNN.
  • Query Similarity Selection. It employs a preliminary model to generate SQL query $s’$ using target question and database, where this generated $s’$ can be regarded as an approximation of target SQL query $s^\ast$. Then it encodes queries from examples into binary discrete syntax vectors according to their keywords. After that, it chooses $k$ examples by considering both similarity to the approximated query $s’$ and diversity among selected examples.
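
Masked Question Similarity Selection can be sketched in a few lines; here masking is a simple schema-word replacement and similarity is Jaccard over token sets, a toy stand-in for the paper's embedding distance (all names are illustrative):

```python
def mask(question, schema_words):
    """Replace domain-specific words with a placeholder token."""
    return ['<mask>' if t.lower() in schema_words else t
            for t in question.lower().replace('?', '').split()]

def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

def top_k(target, candidates, schema_words, k=1):
    """Rank candidate questions by similarity of their masked forms."""
    masked_t = mask(target, schema_words)
    scored = sorted(candidates,
                    key=lambda q: jaccard(masked_t, mask(q, schema_words)),
                    reverse=True)
    return scored[:k]

schema_words = {'continents', 'countries', 'farms', 'authors'}
examples = ['How many farms are there?', 'List all names of authors']
# After masking, the counting question matches despite a different table
print(top_k('How many continents are there?', examples, schema_words, k=1))
```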

Example Organization

  • Full-Information Organization.
/* Given the following database schema: */
${DATABASE_SCHEMA}
/* Answer the following: How many authors are there? */
SELECT COUNT(*) FROM authors

/* Given the following database schema: */
${DATABASE_SCHEMA}
/* Answer the following: How many farms are there? */
SELECT COUNT(*) FROM farm

${TARGET_QUESTION}
  • SQL-Only Organization.
/* Some SQL examples are provided based on similar problems: */
SELECT COUNT(*) FROM authors

SELECT COUNT(*) FROM farm

${TARGET_QUESTION}

DAIL-SQL

The method proposed in this paper; it stitches together all of the methods above.

  • Represent the question with the Code Representation Prompt.
  • Selection. Consider both questions and queries to select candidates. Specifically, DAIL Selection first masks domain-specific words in both target question $q$ and example questions $q_i$ in the candidate set. It then ranks the candidate examples based on the Euclidean distance between the embeddings of masked $q$ and $q_i$. Simultaneously, it calculates the query similarity between the pre-predicted SQL query $s’$ and $s_i$ in the candidate set. Finally, the selection criterion prioritizes the sorted candidates by question similarity with a query similarity greater than a predefined threshold. In this way, the selected top $k$ examples have good similarity with both question and query.
  • Organization. Preserve the mapping information between questions and SQL queries and also improve the token efficiency. Stitches the two organizations together while spending fewer tokens.
/* Some example questions and corresponding SQL queries are provided based on similar problems: */

/* Answer the following: How many authors are there? */
SELECT COUNT(*) FROM authors

/* Answer the following: How many farms are there? */
SELECT COUNT(*) FROM farm

${TARGET_QUESTION}

Finally, with this prompt scheme GPT-4 reaches SOTA. Fine-tuning was done on small open-source LLMs, since there was no budget to tune GPT-4.

Fine-tuning with the zero-shot prompt, the fine-tuned zero-shot results far exceed the pre-fine-tuning few-shot results; but after fine-tuning, few-shot prompting brings no gain and may even hurt.

Others
