How OpenClaw (Clawdbot) “Remembers Everything”

井九

Published 2026-01-31 08:53:34



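Clawdbot is an open-source personal AI assistant (MIT License) created by Peter Steinberger. At the time of writing it has more than 32,600 stars on GitHub. Unlike ChatGPT or Claude, which run in the cloud, Clawdbot runs locally on your own machine and plugs into the chat platforms you already use, such as Discord, WhatsApp, and Telegram.

Within just a few days the project was renamed from Clawdbot to Moltbot, and today it was renamed again to OpenClaw.

Instead of relying on a cloud-hosted, company-controlled memory system, it keeps everything local and hands full ownership of context and capabilities back to the user.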

How Context is Built

Before getting to memory, it helps to see what the model actually receives on each request:

[0] System Prompt (static + conditional instructions)
[1] Project Context (bootstrap files such as AGENTS.md, SOUL.md)
[2] Conversation History (messages, tool calls, compaction summaries)
[3] Current Message (the current user input)

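The System Prompt defines the agent's capabilities and available tools. The part most relevant to memory is the Project Context: user-editable Markdown files that are injected into every request.

These files live in the agent's workspace alongside the memory files, which keeps the agent's entire configuration transparent and editable.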
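The four layers listed above can be pictured as a simple ordered assembly. The following is a minimal sketch, not Clawdbot's actual code: `ContextInputs`, `build_context`, and the default file list are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class ContextInputs:
    """Hypothetical container for everything that feeds one request."""
    system_prompt: str
    workspace: Path
    history: list[str] = field(default_factory=list)
    # Assumed bootstrap files; the article names AGENTS.md and SOUL.md as examples.
    bootstrap_files: tuple[str, ...] = ("AGENTS.md", "SOUL.md")


def build_context(inputs: ContextInputs, current_message: str) -> list[str]:
    """Return the request context as an ordered list of text segments."""
    segments = [inputs.system_prompt]              # [0] system prompt
    for name in inputs.bootstrap_files:            # [1] project context
        path = inputs.workspace / name
        if path.is_file():                         # missing files are skipped
            segments.append(f"# {name}\n{path.read_text(encoding='utf-8')}")
    segments.extend(inputs.history)                # [2] conversation history
    segments.append(current_message)               # [3] current message
    return segments
```

Because the project context is re-read from disk on every call, any edit the user makes to those Markdown files shows up in the next request.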

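Context vs Memory: Context ≠ Memory

Understanding the difference between context and memory is the key to understanding Clawdbot.

Context

Everything the model can see within a single request:

Context = System Prompt + Conversation History + Tool Results + Attachments

Properties of context:

Ephemeral: exists only for the current request
Bounded: limited by the model's context window (for example, 200K tokens)
Expensive: every token affects API cost and response latency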

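Memory

Long-term information stored on disk:

Memory = MEMORY.md + memory/*.md + Session Transcripts

Properties of memory:

Persistent: survives restarts, days, and months
Unbounded: can keep growing with no fixed limit
Cheap: storage incurs no API cost
Searchable: supports semantic retrieval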

The Memory Tools

The agent accesses memory through two dedicated tools.

1️⃣ memory_search

Purpose: find relevant content across all memory files (a mandatory recall step)

{
  "name": "memory_search",
  "description": "Mandatory recall step: semantically search MEMORY.md + memory/*.md before answering questions about prior work, decisions, dates, people, preferences, or todos",
  "parameters": {
    "query": "What did we decide about the API?",
    "maxResults": 6,
    "minScore": 0.35
  }
}

Result:

{
  "results": [
    {
      "path": "memory/2026-01-20.md",
      "startLine": 45,
      "endLine": 52,
      "score": 0.87,
      "snippet": "## API Discussion\nDecided to use REST over GraphQL for simplicity...",
      "source": "memory"
    }
  ],
  "provider": "openai",
  "model": "text-embedding-3-small"
}
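To make the `maxResults` and `minScore` parameters concrete, here is a minimal sketch of how a result list might be trimmed before it is returned. `SearchHit` and `select_hits` are hypothetical names, and the exact order in which Clawdbot applies the two limits is an assumption.

```python
from typing import TypedDict


class SearchHit(TypedDict):
    """Mirrors the fields shown in the sample memory_search result."""
    path: str
    startLine: int
    endLine: int
    score: float
    snippet: str


def select_hits(hits: list[SearchHit], max_results: int = 6,
                min_score: float = 0.35) -> list[SearchHit]:
    """Drop weak matches, then keep the best `max_results` by score."""
    kept = [h for h in hits if h["score"] >= min_score]
    kept.sort(key=lambda h: h["score"], reverse=True)
    return kept[:max_results]
```

A low `minScore` favors recall and a small `maxResults` keeps the tool output cheap in context tokens.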

2️⃣ memory_get

Purpose: read specific lines exactly, once memory_search has located the content

{
  "name": "memory_get",
  "description": "Read specific lines from a memory file after memory_search",
  "parameters": {
    "path": "memory/2026-01-20.md",
    "from": 45,
    "lines": 15
  }
}

Result:

{
  "path": "memory/2026-01-20.md",
  "text": "## API Discussion\n\nMet with the team to discuss API architecture.\n\n### Decision\nWe chose REST over GraphQL for the following reasons:\n1. Simpler to implement\n2. Better caching\n3. Team familiarity\n\n### Endpoints\n- GET /users\n- POST /auth/login\n- GET /projects/:id"
}
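Since memory is plain Markdown, a tool like memory_get can be little more than a bounded file read. The sketch below is an illustration under assumptions, not the real implementation: it treats `from` as a 1-based line number and adds a workspace check that the article does not describe.

```python
from pathlib import Path


def memory_get(workspace: Path, rel_path: str, from_line: int, lines: int) -> dict[str, str]:
    """Read `lines` lines starting at 1-based `from_line` from a memory file."""
    root = workspace.resolve()
    target = (root / rel_path).resolve()
    # Assumed safety check: refuse paths that resolve outside the workspace.
    if not target.is_relative_to(root):
        raise ValueError("path escapes the workspace")
    all_lines = target.read_text(encoding="utf-8").splitlines()
    start = max(from_line, 1) - 1
    return {"path": rel_path, "text": "\n".join(all_lines[start:start + lines])}
```

Reading only a window of lines keeps the context small: the search step returns a short snippet, and the follow-up read pulls just enough surrounding text to answer.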

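Writing to Memory

Clawdbot has no dedicated memory_write tool.

The agent uses the general-purpose file write / edit tools, exactly as it would for any other file. Because memory is just Markdown, you can also edit it by hand (the system re-indexes automatically).

Where to write is decided by the prompt instructions in AGENTS.md.

Memory is also written automatically in these situations:

Pre-compaction memory flush
Session end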

Memory Storage

Clawdbot's core idea is:

Memory = plain Markdown files in the agent's workspace

Two-Layer Memory System

~/clawd/
├── MEMORY.md              # Layer 2: curated long-term memory
└── memory/
    ├── 2026-01-26.md      # Layer 1: today's log
    ├── 2026-01-25.md
    ├── 2026-01-24.md
    └── ...

Layer 1: Daily Logs

memory/YYYY-MM-DD.md is an append-only record of the day.

# 2026-01-26

## 10:30 AM - API Discussion
Discussed REST vs GraphQL with user. Decision: use REST for simplicity.
Key endpoints: /users, /auth, /projects.

## 2:15 PM - Deployment
Deployed v2.3.0 to production. No issues.

## 4:00 PM - User Preference
User mentioned they prefer TypeScript over JavaScript.
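An append-only daily log like the one above needs nothing more than a file opened in append mode. This is a minimal sketch: in Clawdbot the agent writes through its generic file tools, so `append_daily_note` is a hypothetical helper that only mirrors the layout of the sample entry.

```python
from datetime import datetime
from pathlib import Path


def append_daily_note(workspace: Path, title: str, body: str, now: datetime) -> Path:
    """Append one timestamped entry to memory/YYYY-MM-DD.md, creating it if needed."""
    log = workspace / "memory" / f"{now:%Y-%m-%d}.md"
    log.parent.mkdir(parents=True, exist_ok=True)
    is_new = not log.exists()
    # Build a "2:15 PM" style clock string without locale-dependent formatting.
    clock = f"{now.hour % 12 or 12}:{now:%M} {'AM' if now.hour < 12 else 'PM'}"
    with log.open("a", encoding="utf-8") as fh:   # "a" never rewrites earlier entries
        if is_new:
            fh.write(f"# {now:%Y-%m-%d}\n")
        fh.write(f"\n## {clock} - {title}\n{body}\n")
    return log
```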

Layer 2: Long-term Memory (MEMORY.md)

Holds information that is important, stable, and valuable over the long term.

# Long-term Memory

## User Preferences
- Prefers TypeScript over JavaScript
- Likes concise explanations
- Working on project "Acme Dashboard"

## Important Decisions
- 2026-01-15: Chose PostgreSQL for database
- 2026-01-20: Adopted REST over GraphQL
- 2026-01-26: Using Tailwind CSS for styling

## Key Contacts
- Alice (alice@acme.com) - Design lead
- Bob (bob@acme.com) - Backend engineer

How Does the Agent “Know” to Read Memory?

AGENTS.md contains explicit instructions:

## Every Session

Before doing anything else:
1. Read SOUL.md - this is who you are
2. Read USER.md - this is who you are helping
3. Read memory/YYYY-MM-DD.md (today and yesterday) for recent context
4. If in MAIN SESSION (direct chat with your human), also read MEMORY.md

Don't ask permission, just do it.
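These instructions are carried out by the model itself, not by code, but the resulting read list is easy to state precisely. The sketch below only illustrates which files the prompt asks for; `startup_files` is a hypothetical helper.

```python
from datetime import date, timedelta


def startup_files(today: date, main_session: bool) -> list[str]:
    """List the workspace files the AGENTS.md prompt tells the agent to read."""
    files = ["SOUL.md", "USER.md"]
    for day in (today, today - timedelta(days=1)):   # today and yesterday
        files.append(f"memory/{day.isoformat()}.md")
    if main_session:                                 # MEMORY.md only in direct chats
        files.append("MEMORY.md")
    return files
```

Limiting the daily logs to two days keeps startup cost bounded. Older entries remain reachable through memory_search.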

How Memory Gets Indexed

When a memory file is saved, the following happens in the background:

1. File saved
   ~/clawd/memory/2026-01-26.md
        ↓
2. File watcher
   Chokidar watches MEMORY.md + memory/**/*.md
   1.5-second debounce
        ↓
3. Chunking
   ~400 tokens per chunk, 80-token overlap
        ↓
4. Embedding
   each chunk → embedding provider → vector
        ↓
5. Storage
   ~/.clawdbot/memory/<agentId>.sqlite
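The chunking step can be sketched as a sliding window. This is an illustration under assumptions: `chunk_tokens` is a hypothetical function, and it takes a ready-made token list, whereas the real pipeline would use a model tokenizer and may respect line boundaries.

```python
def chunk_tokens(tokens: list[str], size: int = 400, overlap: int = 80) -> list[list[str]]:
    """Split tokens into windows of `size`, each sharing `overlap` tokens with the previous one."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap                 # how far the window advances each time
    chunks: list[list[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):   # this window already reached the end
            break
    return chunks
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, which helps retrieval at the cost of some duplicated embedding work.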

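The SQLite database contains:

chunks: text chunk metadata
chunks_vec: vectors (sqlite-vec)
chunks_fts: full-text index (FTS5)
embedding_cache: avoids re-embedding the same content

👉 No external vector database is needed; everything happens inside SQLite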
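The role of embedding_cache can be illustrated with a table keyed by a content hash, so unchanged chunks never hit the embedding provider twice. This is a minimal sketch: the column layout, the SHA-256 key, and the JSON encoding are assumptions, and only the table name comes from the article.

```python
import hashlib
import json
import sqlite3
from typing import Callable


def open_cache(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a database with a hypothetical embedding_cache layout."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS embedding_cache ("
        " model TEXT NOT NULL, hash TEXT NOT NULL, embedding TEXT NOT NULL,"
        " PRIMARY KEY (model, hash))"
    )
    return db


def cached_embedding(db: sqlite3.Connection, model: str, text: str,
                     embed: Callable[[str], list[float]]) -> list[float]:
    """Return the cached vector for `text`, calling `embed` only on a cache miss."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    row = db.execute(
        "SELECT embedding FROM embedding_cache WHERE model = ? AND hash = ?",
        (model, digest),
    ).fetchone()
    if row is not None:
        return json.loads(row[0])
    vector = embed(text)                  # the only place the provider is called
    db.execute(
        "INSERT INTO embedding_cache (model, hash, embedding) VALUES (?, ?, ?)",
        (model, digest, json.dumps(vector)),
    )
    db.commit()
    return vector
```

Keying on the model name as well as the hash means that switching embedding models invalidates old vectors instead of silently mixing them.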

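How Memory Is Searched (Hybrid Search)

Clawdbot runs two searches side by side:

Vector search (semantic)
BM25 keyword search

Final score:

finalScore = (0.7 * vectorScore) + (0.3 * textScore)

Why 70 / 30?

Semantic similarity is the primary signal
Keyword search catches exact items such as names, IDs, and dates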
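The weighted merge can be written in a few lines. The sketch below assumes both scores are already normalized to the range 0 to 1 and that a chunk missing from one result set scores 0 there. `hybrid_rank` and the chunk IDs are hypothetical.

```python
def hybrid_rank(vector_scores: dict[str, float], text_scores: dict[str, float],
                vector_weight: float = 0.7, text_weight: float = 0.3) -> list[tuple[str, float]]:
    """Combine per-chunk vector and keyword scores into one ranked list."""
    ids = vector_scores.keys() | text_scores.keys()   # union of both result sets
    scored = [
        (cid, vector_weight * vector_scores.get(cid, 0.0)
              + text_weight * text_scores.get(cid, 0.0))
        for cid in ids
    ]
    # Highest score first; ties are broken by chunk id for a stable order.
    return sorted(scored, key=lambda item: (-item[1], item[0]))
```

With these weights, a chunk that matches only on keywords can reach at most 0.3, so it surfaces only when nothing is semantically closer. That fits the supporting role the article gives keyword search.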

Multi-Agent Memory

Each agent has a fully independent memory space:

~/.clawdbot/memory/
├── main.sqlite
└── work.sqlite

~/clawd/        # main agent
~/clawd-work/   # work agent

Markdown is the source of truth
SQLite is a derived index
By default, agents cannot access each other's memory

This makes it a good fit for separating a personal agent from a work agent.

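Compaction

When the context approaches the model's limit, Clawdbot will:

Summarize older conversation turns
Keep the most recent messages
Write the summary to disk

The compaction result is persisted, so it is not lost.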
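The summarize-and-keep-recent idea can be sketched as follows. This is an illustration under assumptions: `compact` is a hypothetical function, the summarizer is passed in as a plain callable (in practice it would be a model call), and persisting the summary to disk is left out.

```python
from typing import Callable


def compact(history: list[str], keep_recent: int,
            summarize: Callable[[list[str]], str]) -> list[str]:
    """Replace all but the last `keep_recent` messages with a single summary entry."""
    if keep_recent < 1:
        raise ValueError("keep_recent must be at least 1")
    if len(history) <= keep_recent:
        return list(history)              # nothing old enough to compact
    old, recent = history[:-keep_recent], history[-keep_recent:]
    return [f"[summary] {summarize(old)}"] + recent
```

The summary stands in for the older turns in every later request, so anything it omits is no longer visible to the model unless it was also written to memory.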

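Pre-Compaction Memory Flush

To keep compaction from losing information:

Shortly before the limit is reached
A silent memory flush is triggered
Important facts are written to memory files
The user notices nothing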

{
  agents: {
    defaults: {
      compaction: {
        reserveTokensFloor: 20000,
        memoryFlush: {
          enabled: true,
          softThresholdTokens: 4000,
          systemPrompt: "Session nearing compaction. Store durable memories now.",
          prompt: "Write lasting notes to memory/YYYY-MM-DD.md; reply NO_REPLY if nothing to store."
        }
      }
    }
  }
}
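One plausible reading of these settings is that the flush fires once the session's token estimate comes within `softThresholdTokens` of the reserved floor. The sketch below encodes that reading as an assumption, not as confirmed behavior. `should_flush_memory` and its `already_flushed` flag are hypothetical.

```python
def should_flush_memory(session_tokens: int, context_window: int,
                        reserve_tokens_floor: int = 20_000,
                        soft_threshold_tokens: int = 4_000,
                        already_flushed: bool = False) -> bool:
    """Decide whether to run the silent memory flush before compaction."""
    if already_flushed:                   # assumed: at most one flush per compaction cycle
        return False
    # Assumed trigger point: window minus the reserved floor minus the soft threshold.
    flush_at = context_window - reserve_tokens_floor - soft_threshold_tokens
    return session_tokens >= flush_at
```

Under this reading, a 200K-token window with the values shown would trigger the flush at 176,000 tokens, which leaves room for the flush turn itself before compaction begins.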