🧠 阿头学 · 💬 Discussion Topic

Harness, Memory, and the Real Bottleneck of the Context Window

The value of this piece is not that it proposes a complete solution, but that it pins down the most easily underestimated bottleneck in agent products: the competitive frontier will shift from model calls to how the harness manages context, memory, and long-horizon search.

2026-04-15 · Original link ↗

Key takeaways

  • The harness is a context scheduler: the place where the model can actually do work is still the context window, and the harness's job is not to wrap a UI, but to decide which objects enter the context, and when.
  • Memory becomes an externalized asset: an agent's interaction traces pile up quickly, and the value lies not in any single conversation, but in whether those traces can be distilled into retrievable, reusable long-term memory.
  • Search becomes a foundational capability: once agents run over long horizons, data volume will far exceed what humans can curate by hand, so the system must be able to find the right fragments in a large body of historical experience.
  • Open ecosystems matter: the author stresses data ownership, because whoever owns the traces has the better chance of training better memory and context-management systems.

Relevance to us

  • For Neta, this is a reminder not to build only a single-shot generation experience, but to turn the long-term traces of users, characters, and works into recallable context assets.
  • For ATou, what is truly worth persisting is not each agent's output, but which judgments, failures, and preferences can be reliably reused by the next task.
  • For general-purpose agent products, the harness's moat will come from context assembly, retrieval quality, and memory governance, not simply from plugging in a stronger model.

Discussion prompts

  • Which traces deserve to enter long-term memory, and which should stay only in short-term context?
  • If agents can edit their own external memory, what mechanism keeps wrong memories from compounding?
  • Should a good harness optimize context recall first, or control context contamination first?



Harness, Memory, Context Fragments, & the Bitter Lesson

this is a work in progress mental dump on interesting intersections between how we use and design a harness, implications for memory being accumulated over long timescales, and the search bitter lesson we can’t escape

this is v30+, HTML diagrams help me iteratively refine + chat to roughly “see” and alter the mental model

Harnesses & Context Fragments: a very important job of the harness is to efficiently & correctly route data within its boundaries into the context window boundary so computation can happen. the context window is a precious artifact. Harnesses make decisions on how to populate, manage, edit, and organize it so agents can do work. Each loaded object can be thought of as a Context Fragment and represents an explicit decision by the user and harness designer about what a model needs to do work at any given time.
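The routing-into-the-window idea above can be sketched as a toy scheduler: the harness scores candidate Context Fragments and greedily loads the best ones that fit a token budget. Everything here (the names, the 4-characters-per-token estimate, the greedy policy) is an illustrative assumption, not a description of any real harness.

```python
# Toy sketch of a harness populating a context window from Context Fragments.
# All names and the scoring/packing policy are hypothetical.
from dataclasses import dataclass


@dataclass
class ContextFragment:
    source: str      # where the object came from (file, memory, tool output)
    text: str
    priority: float  # the harness designer's judgment of relevance


def estimate_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: roughly 4 characters per token.
    return max(1, len(text) // 4)


def assemble_context(fragments: list[ContextFragment], budget: int) -> str:
    """Greedily load the highest-priority fragments that fit the window."""
    chosen, used = [], 0
    for frag in sorted(fragments, key=lambda f: f.priority, reverse=True):
        cost = estimate_tokens(frag.text)
        if used + cost <= budget:
            chosen.append(frag)
            used += cost
    # Each loaded object is an explicit decision about what the model
    # needs in order to do work right now.
    return "\n\n".join(f"[{f.source}]\n{f.text}" for f in chosen)
```

The greedy-by-priority policy is only one possible packing strategy; the point is that every fragment that makes the cut is a deliberate scheduling decision, not an accident of ordering.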

many ideas on externalizing objects + loading them into the context window were pioneered and are very well described by @a1zhang with RLMs

Experiential Memory: we’re in the very early days of deploying agents and agents produce massive amounts of data in every interaction they have. this is akin to humans doing things and remembering things they did.

however agent memory has a massive advantage as it can be accumulated across all agents, which are easily forked and duplicated (unlike humans). @dwarkesh_sp does a good job talking about this massive benefit of artificial systems

memory can be treated as an externalized object. the harness is tasked with doing good contextualized retrieval which means pulling in the right data from accumulated memories across all agent interactions
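A minimal sketch of what "contextualized retrieval" over externalized memory could look like, assuming a naive keyword-overlap-plus-recency score. A real harness would presumably use embeddings or a learned retriever, so treat every name and number here as hypothetical.

```python
# Hypothetical sketch: pull the most relevant records out of a memory
# store accumulated across many agents. The scoring is a deliberately
# naive placeholder for a real retrieval stack.
from dataclasses import dataclass


@dataclass
class MemoryRecord:
    agent_id: str
    timestamp: float  # seconds since epoch
    text: str


def score(record: MemoryRecord, query: str, now: float) -> float:
    # Keyword overlap dominates; a small recency bonus breaks ties.
    overlap = len(set(query.lower().split()) & set(record.text.lower().split()))
    age_days = (now - record.timestamp) / 86_400
    recency = 1.0 / (1.0 + age_days)  # newer memories score slightly higher
    return overlap + 0.1 * recency


def retrieve(store: list[MemoryRecord], query: str, now: float, k: int = 3) -> list[MemoryRecord]:
    """Return the k best-scoring memories, regardless of which agent wrote them."""
    return sorted(store, key=lambda r: score(r, query, now), reverse=True)[:k]
```

Note that the store is shared across `agent_id`s: the fork-and-duplicate advantage mentioned above shows up as retrieval that is indifferent to which agent originally produced the memory.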

Search & The Bitter Lesson: As we deploy agents in our world over year timescales, there is going to be a hyper-exponential increase in the amount of data produced by those agents. We should want to:

  1. Own that data for ourselves. Open ecosystems are important here.
  2. Use that data.

This means that we’ll have to search over, distill, and organize massive amounts of data. Our brain is exceptional at doing this. Both contextually using prior experience and mostly committing the right stuff to memory with enough intentional practice.

Our current infrastructure systems and algorithms will be put to the test and often break as we get used to this new data regime

some open questions:

  • how do we efficiently distill experiences (Traces) into higher level memory primitives that capture the important parts? How do we do this over ultra long time horizons?

  • How much of the future is Search just-in-time vs Search that gets integrated into model weights?

  • How do we make models much better at self-managing their context window? How do we reduce error rates in recursively allowing agents to operate over external objects?
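For the first open question, one deliberately naive sketch of trace distillation: promote an observation into a higher-level memory primitive only when it recurs across enough independent traces. Real systems would likely use a model to summarize; the frequency threshold here just makes the shape of the operation concrete, and every name is illustrative.

```python
# Naive trace distillation: observations that recur across many traces
# get promoted into durable "lessons"; one-off noise is dropped.
from collections import Counter


def distill(traces: list[list[str]], min_support: int = 2) -> list[str]:
    """Promote observations seen in at least `min_support` distinct traces."""
    counts = Counter()
    for trace in traces:
        counts.update(set(trace))  # count each observation once per trace
    return [obs for obs, n in counts.most_common() if n >= min_support]
```

The threshold is a stand-in for the real open problem: deciding which parts of an experience are "the important parts" over ultra long horizons.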

i’ll be expanding on, altering, and adjusting these mental models, but they feel to me like an important subset of the questions that matter for designing agents practically

📋 Discussion Archive

Discussion in progress…