Finisky Garden

The Hivemind of Language Models

By finisky
April 17, 2026 08:36
<p>Ask GPT-4 to recommend an underrated sci-fi film. It says <em>Moon</em>. Ask Claude the same question — also <em>Moon</em>. Try Gemini — <em>Moon</em> again. A NeurIPS 2025 Best Paper, <a href="https://arxiv.org/abs/2510.22954">Artificial Hivemind: The Open-Ended Homogeneity of Language Models (and Beyond)</a>, quantifies this phenomenon at scale: different language models give strikingly similar answers to open-ended questions.</p>
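A toy way to quantify this kind of answer homogeneity (illustrative only; this is not the metric used in the Artificial Hivemind paper, and the model answers below are hypothetical): given each model's answers to the same open-ended prompts, measure how often any two models agree exactly.

```python
# Toy homogeneity measure: fraction of (model pair, prompt) cases
# where two models gave the identical answer. Hypothetical data.
from itertools import combinations

answers = {  # hypothetical responses to open-ended prompts
    "model_a": ["Moon", "Primer", "Moon"],
    "model_b": ["Moon", "Primer", "Coherence"],
    "model_c": ["Moon", "Primer", "Moon"],
}

def pairwise_agreement(ans):
    """Fraction of (model pair, prompt) cases with identical answers."""
    pairs = list(combinations(ans, 2))
    n_prompts = len(next(iter(ans.values())))
    agree = sum(ans[a][i] == ans[b][i]
                for a, b in pairs for i in range(n_prompts))
    return agree / (len(pairs) * n_prompts)

print(pairwise_agreement(answers))  # 7 of 9 comparisons agree
```

A real evaluation would need semantic matching rather than string equality, but even this crude count makes the hivemind effect visible.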

From RAG to Knowledge Compilation

By finisky
April 16, 2026 11:43
<p>RAG re-retrieves, re-assembles, and re-reasons on every query. Ask something that requires synthesizing five documents and the model has to find all five, stitch them together, and derive the answer from scratch. Ask ten times, retrieve ten times. Nothing accumulates.</p> <p>Karpathy recently posted a gist called LLM Wiki proposing a different approach: instead of retrieving at query time, have the LLM pre-compile knowledge into a structured wiki and query the compiled result.</p>
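The compile-once / query-many idea can be sketched in a few lines. This is a minimal illustration, not Karpathy's actual gist: the function names are hypothetical, and the keyword routing stands in for what would really be LLM-written wiki entries.

```python
# Minimal sketch of "compile once, query many" vs. per-query retrieval.
# All names here are illustrative; none come from the LLM Wiki gist.

documents = {
    "doc1": "The service uses Postgres 15.",
    "doc2": "Backups run nightly at 02:00 UTC.",
    "doc3": "The cache layer is Redis.",
}

def compile_wiki(docs):
    """One-time pass: group raw documents under topical wiki entries.
    In a real system an LLM would synthesize these entries; here
    simple keyword routing stands in for that step."""
    wiki = {}
    topics = {"storage": ["postgres", "redis"], "operations": ["backup"]}
    for doc_id, text in docs.items():
        for topic, keywords in topics.items():
            if any(k in text.lower() for k in keywords):
                wiki.setdefault(topic, []).append(text)
    return wiki

wiki = compile_wiki(documents)  # the cost is paid once, up front

def query(topic):
    """Query time: a single lookup into the compiled artifact.
    No re-retrieval, no re-assembly; the synthesis accumulates."""
    return " ".join(wiki.get(topic, []))

print(query("storage"))
```

The point of the structure: the expensive synthesis moves from query time into a build step, so asking ten times costs one compilation, not ten retrievals.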

From RAG to Knowledge Compilation

By finisky
April 16, 2026 08:39
<p>RAG re-retrieves, re-assembles, and re-reasons on every query. Ask a question that requires synthesizing five documents, and the model has to find all five from scratch each time, stitch them together, and then give you the answer. Ask ten times, retrieve ten times. Nothing accumulates.</p> <p>Karpathy recently posted a gist called LLM Wiki proposing a different idea: instead of having the model retrieve on the fly, have it pre-compile knowledge into a structured wiki and query the compiled result directly.</p>

Theoretical Ceiling of Vector Retrieval

By finisky
April 15, 2026 17:35
<p>Dense retrieval has become the default first stage in RAG pipelines. Encode documents into vectors, encode queries into vectors, compute cosine similarity, done. But a basic question rarely gets asked: for a d-dimensional embedding, how many distinct top-k retrieval results can it actually represent?</p> <p>An ICLR 2026 paper from Google DeepMind and JHU, <em>"On the Theoretical Limitations of Embedding-Based Retrieval"</em>, gives a mathematical answer: not enough. Not even close.</p>
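A small demo conveys the flavor of this limit (this is a toy illustration, not the paper's construction): under inner-product top-1 retrieval, a document embedding lying strictly inside the convex hull of the others can never be returned, no matter what the query vector is. Some retrieval outcomes are simply unrepresentable at a given dimension d.

```python
# Toy illustration (not the paper's construction): with inner-product
# top-1 retrieval, a document vector strictly inside the convex hull
# of the others can never be the argmax, for any query vector.
import numpy as np

rng = np.random.default_rng(0)

docs = np.array([
    [1.0, 0.0],
    [0.0, 1.0],
    [-1.0, -1.0],
    [0.1, 0.1],   # strictly inside the hull of the other three
])

hits = set()
for _ in range(10_000):
    q = rng.normal(size=2)              # random query direction
    hits.add(int(np.argmax(docs @ q)))  # top-1 under inner product

print(sorted(hits))  # index 3 never appears
```

Only the hull vertices are ever reachable as top-1; the interior point is dead weight in the index. The paper generalizes this kind of counting argument to top-k sets and shows the gap grows far faster than embedding dimension.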

Theoretical Ceiling of Vector Retrieval

By finisky
April 15, 2026 17:31
<p>Dense retrieval has become a near-default component of RAG in recent years. Encode each document into a vector, encode the query into a vector, compute cosine similarity, done. But a basic question is rarely examined seriously: how many distinct top-k retrieval results can a d-dimensional vector actually represent?</p> <p>An ICLR 2026 paper from Google DeepMind and JHU, <em>"On the Theoretical Limitations of Embedding-Based Retrieval"</em>, gives a mathematical answer: not enough. Not even close.</p>

Unexpected Perks of Talking to AI

By finisky
April 14, 2026 08:52
<p>Justin Sun (the crypto guy) recently dropped a hot take: "It's 2026 already — if you can talk to AI, stop talking to humans." He also said something about deleting contacts born before 1990 and WeChat being for old people. Classic Justin Sun. Take it with a grain of salt.</p> <p>But strip away the absurd parts, and "talk to AI more" is something I actually agree with as a heavy user. Here's why, and where it falls apart.</p>

How Claude Dreams: Background Memory Defragmentation

By finisky
April 13, 2026 19:40
<p>There's a module inside Claude Code called autoDream. Its prompt title reads "Dream: Memory Consolidation."</p> <p>This isn't a metaphor. Claude Code actually spins up a background sub-agent that reviews transcripts from past sessions, consolidates scattered memories — merging, deduplicating, correcting — and writes them back to disk. The whole thing is invisible unless you dig into the background task list.</p>
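A hypothetical sketch of what such a consolidation pass might look like (none of these names come from Claude Code's source; the merge logic below stands in for what would really be LLM calls over session transcripts):

```python
# Hypothetical sketch of a background memory-consolidation pass in the
# spirit of what the post describes. The names and logic are
# illustrative, not taken from Claude Code's implementation.

def consolidate(memories):
    """Merge memory entries key by key, keeping the newest value for
    each key: a stand-in for a dedup-and-correct pass."""
    latest = {}
    for entry in memories:                     # chronological order
        latest[entry["key"]] = entry["value"]  # later sessions win
    return [{"key": k, "value": v} for k, v in latest.items()]

session_memories = [
    {"key": "build_cmd", "value": "make"},
    {"key": "test_cmd", "value": "pytest"},
    {"key": "build_cmd", "value": "make -j8"},  # corrected later
]

consolidated = consolidate(session_memories)
# After the pass: one entry per key, with the later correction kept.
```

The design point is that consolidation runs asynchronously, so the interactive session never pays for the cleanup: the "dream" happens off the critical path and only the compacted memories get written back.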

How Claude Dreams: Background Memory Defragmentation

By finisky
April 13, 2026 19:35
<p>Claude Code has an internal module called autoDream. Its prompt title is "Dream: Memory Consolidation".</p> <p>This is not a metaphor. Claude Code really does launch a background sub-agent that reviews transcripts from past sessions, consolidates scattered memories (merging, deduplicating, correcting), and writes them back to disk. The whole process is invisible unless you deliberately dig into the background task list.</p>

AI and Employment: A 200-Year-Old Debate

By finisky
April 11, 2026 10:45
<p>Tech job boards in 2025 are schizophrenic. Traditional software engineering roles are shrinking. "AI"-prefixed positions are expanding. Same company, same quarter — cutting junior devs and project managers on one side, opening Agent orchestration engineers and AI application architects on the other.</p>

Three Evolutions of Agent Engineering

By finisky
April 9, 2026 00:19
<p>Last June, Shopify CEO Tobi Lutke posted that he preferred "context engineering" over "prompt engineering." Karpathy retweeted with a +1. Simon Willison wrote a blog post saying the term might actually stick. Phil Schmid published a full definition. Half the AI community switched terminology within a week.</p> <p>By early 2026, Phil Schmid introduced another term: Agent Harness. It didn't generate the same buzz, but anyone building Coding Agents quietly nodded along.</p>

Three Evolutions of Agent Engineering

By finisky
April 9, 2026 00:15
<p>Last June, Shopify CEO Tobi Lutke tweeted that he preferred the term context engineering over prompt engineering. Karpathy retweeted with a +1. Simon Willison wrote a blog post saying the term might actually stick. Phil Schmid published a full definition. Half the AI community switched terminology within a week.</p> <p>By early 2026, Phil Schmid floated another new term: Agent Harness. This round was quieter than the last, but nearly everyone building Coding Agents quietly nodded along.</p>

Context Management in Claude Code vs OpenClaw

By finisky
April 7, 2026 23:49
<p>After OpenClaw crossed 350K stars, a narrative started forming in the community: since both run on Opus 4.6 under the hood, the open-source option should be on par with Claude Code. Anyone who has actually used both probably shares the same observation — in long sessions, OpenClaw starts losing context, forgetting files it already read, redoing work it already did. Claude Code does too, but noticeably later, and it recovers much better.</p> <p>Same model, different experience. Why?</p>

Context Management in Claude Code vs OpenClaw

By finisky
April 7, 2026 22:15
<p>After OpenClaw crossed 350K stars, a narrative emerged in the community: both run on Opus 4.6 under the hood, so the open-source option should match Claude Code. Anyone who has actually used both probably shares the same impression: deep into a long session, OpenClaw starts losing context, forgetting files it already read, and redoing work it already did. Claude Code does too, but noticeably later, and it recovers far better.</p> <p>Same model, different experience. Where is the gap?</p>

Foundation Models Plateau, Applications Take Off

By finisky
April 7, 2026 00:21
<p>Cursor's parent company Anysphere has about 150 employees. In November 2025, its <a href="https://e.vnexpress.net/news/tech/personalities/4-mit-graduates-who-built-the-popular-ai-coding-tool-cursor-become-billionaires-4965462.html">ARR crossed $1 billion</a>. OpenAI, as of early 2026, has <a href="https://www.ft.com/content/7ffea5b4-e8bc-47cd-adb4-257f84c8028b">4,500 employees</a>. Its 2025 revenue was $13.1 billion, but according to Fortune, it <a href="https://fortune.com/2025/11/12/openai-cash-burn-rate-annual-losses-2028-profitable-2030-financial-documents/">lost roughly $9 billion</a> and doesn't expect to turn profitable until 2028.</p> <p>An application company that trains zero models is outproducing, per capita, the company that trains them. This is the most telling signal in AI for 2025.</p>

Foundation Models Plateau, Applications Take Off

By finisky
April 7, 2026 00:16
<p>Cursor's parent company Anysphere has roughly 150 employees. In November 2025, its <a href="https://e.vnexpress.net/news/tech/personalities/4-mit-graduates-who-built-the-popular-ai-coding-tool-cursor-become-billionaires-4965462.html">annual revenue crossed $1 billion</a>. OpenAI, as of early 2026, has <a href="https://www.ft.com/content/7ffea5b4-e8bc-47cd-adb4-257f84c8028b">4,500 employees</a>; its 2025 revenue was $13.1 billion, but according to Fortune it <a href="https://fortune.com/2025/11/12/openai-cash-burn-rate-annual-losses-2028-profitable-2030-financial-documents/">lost roughly $9 billion</a> and expects to keep losing money until 2028.</p> <p>An application company that trains no models is outproducing, per capita, the company that does train them. These numbers are the most telling signal in AI for 2025.</p>

On-Demand Tool Loading in Claude Code

By finisky
April 5, 2026 21:33
<p>When coding with Claude Code, you probably never notice one thing: it registers more than 40 tools, but when you ask it to read a file or change a few lines of code, it uses only three or four. The definitions of the remaining thirty-plus tools run about 500 tokens each; stuffing them all into the context means a fixed overhead of well over 10,000 tokens. You just want to change one line of CSS, yet you pay for WebSearch, NotebookEdit, CronCreate, and other tools you will never touch.</p>
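The arithmetic above suggests an obvious fix, sketched below under stated assumptions: the mechanism and tag names are hypothetical, not Claude Code's actual implementation, and the 500-token figure is the post's rough estimate. The idea is to register every tool but only serialize into the prompt the definitions relevant to the current task.

```python
# Hypothetical sketch of on-demand tool loading. The tag routing and
# names are illustrative; the real Claude Code mechanism is not public
# in this form.

TOKENS_PER_TOOL_DEF = 500  # rough per-tool definition cost from the post

ALL_TOOLS = {  # tool name -> task tags it is relevant for
    "Read": {"code"}, "Edit": {"code"}, "Bash": {"code"},
    "WebSearch": {"research"}, "NotebookEdit": {"notebook"},
    "CronCreate": {"scheduling"},
    # ... imagine ~40 of these in total
}

def tools_for(task_tags):
    """Select only the tool definitions matching the current task."""
    return sorted(name for name, tags in ALL_TOOLS.items()
                  if tags & task_tags)

def context_cost(tool_names):
    """Fixed prompt overhead for the loaded tool definitions."""
    return len(tool_names) * TOKENS_PER_TOOL_DEF

loaded = tools_for({"code"})        # e.g. a one-line CSS change
print(loaded, context_cost(loaded)) # 3 tools at 1500 tokens, not all 6 at 3000
```

Scaled to the post's numbers, loading 4 of 40 tools turns a ~20,000-token fixed cost into ~2,000 tokens per request; the open question is how the router decides which tools a task needs before the model has seen it.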

How OpenClaw Hit 350K Stars in 4 Months

By finisky
April 6, 2026 00:16
<p>In late November 2025, an open-source project called OpenClaw went live on GitHub. Four and a half months later, it had 350K stars, 70K forks, 81 releases, and sponsorships from OpenAI, NVIDIA, and Vercel. For comparison: Open WebUI took two and a half years to reach 130K stars; NextChat took three years to hit 88K. Growth like OpenClaw's is rare in GitHub's history.</p> <p>It isn't a new model, a training framework, or even a "technical breakthrough" in the traditional sense. It's a personal AI assistant that runs on your own machine and talks to you through the chat apps you already use: WhatsApp, Telegram, Slack, Discord, WeChat, Feishu, iMessage, Matrix, and over 25 platforms in total, all connected to a single backend.</p> <p>Why did it break out of the developer bubble?</p>

How OpenClaw Hit 350K Stars in 4 Months

By finisky
April 6, 2026 00:12
<p>In late November 2025, an open-source project called OpenClaw went live on GitHub. Four and a half months later it had 350K stars, 70K forks, and 81 releases, with OpenAI, NVIDIA, and Vercel as sponsors. For comparison: Open WebUI took two and a half years to reach 130K stars; NextChat took three years to hit 88K. Growth at OpenClaw's pace is rare in GitHub's history.</p> <p>It is not a new model, not a training framework, and not even a "technical breakthrough" in the traditional sense. It is a personal AI assistant that runs on your own machine and talks to you through the chat apps you already use: WhatsApp, Telegram, Slack, Discord, WeChat, Feishu, iMessage, Matrix, over 25 platforms in total, all connected to a single backend.</p> <p>This post looks at why it broke out of the developer bubble.</p>