
A Great Debate Inside the C++ Community: Are the New Features a "Productivity Revolution" or "Stacked Complexity"?

By bigwhite
April 15, 2026, 08:27

Permalink – https://tonybai.com/2026/04/15/cpp-community-debate-productivity-revolution-vs-complexity

Hi everyone, I'm Tony Bai.

If you compare programming languages to tools, Go is a minimalist scalpel: precise and restrained. Rust is an exoskeleton suit packed with smart sensors: strict and safe.

And C++? It is more like an overweight, compound Swiss Army knife that has had extra parts bolted onto it for the past forty years.

At first it had only a blade and a fork; later it gained a saw, scissors, and pliers; later still, someone even crammed in a microscope and a laser pointer. In developers' eyes it is an all-purpose weapon that can solve every problem in the world, but also a behemoth so heavy you can barely hold it steady, one that might slice your finger at any moment.

But just a few days ago, on r/cpp, a top-tier community of nearly 100,000 C++ developers, a post titled "Is modern C++ actually making us more productive… or just more complex?" set off a deep community-wide debate.

The original poster asked a soul-searching question:

"C++20/23 brought us Ranges, Coroutines, Concepts, Modules… These new features really are cool, and I use them. But I keep wondering: are we scaring away newcomers with all this while watching old codebases stay frozen on C++98 forever? For productivity, is modern C++ a revolution, or just another layer of complexity stacked onto an already complex beast?"

The post hit squarely on the deepest doubt in every C++ developer's heart. Within a single day it attracted hundreds of comments full of hard-won experience and reflection.

Today, let's replay this first-rate community debate and see what struggles, division, and reflection hide behind this Swiss Army knife's relentless piling-on of parts.

A Divided Community: the "Parallel Universes" of C++98 Holdouts, C++17 Mainstays, and C++23 Pioneers

Reading the debate, I could almost see three sharply separated parallel universes of the C++ community.

Universe One: Forever C++98/11, or "It runs, so don't touch it!"

The most upvoted camp in the comments was full of reverence for, and resignation toward, existing code.

One developer vented:

"I've been forced onto old standards in so many projects, for so many reasons, that I've stopped bothering to keep up with the latest features. I suspect many professional settings are like this: we write 'caveman C++' because it feels safe (read: familiar) and convenient."

Another developer directly quoted Matt Godbolt's famous line: "Backwards compatibility is C++'s superpower."

"Forget about refactoring; that only breaks everything. Production code that has run for 20 years without a bug is priceless. Don't touch it!"

Some have it even worse: because a chip vendor's compiler supports only C++98, or for "legal reasons," one project was locked for 7 years onto a toolchain that was already 3 years old.

In this universe, C++20's new features might as well be Martian technology.

Universe Two: Embracing C++20/23, or "Once you taste it, you can't go back!"

In sharp contrast to the holdouts are the pioneers already reaping the rewards of the new standards.

One developer said excitedly:

"Since I started writing network IO code with coroutines, I can never go back to the old callback hell!"

Another was full of praise for C++23's std::println:

"I can't live without C++23, entirely because of println. I don't even know what other C++23 features I'm using, but that one alone is fantastic."
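For anyone who has not tried it: std::println lives in C++23's <print> header and gives you type-checked, format-string output in a single line. A minimal illustration:

#include <print>

int main() {
    // The format string and argument types are checked at compile time.
    std::println("Hello, {}! {} + {} = {}", "C++23", 1, 2, 1 + 2);
}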

For these developers, every new feature of modern C++ is a liberation of productivity. Like kids with new toys, they are excitedly exploring the compositional magic of Ranges and the cleaner error messages that Concepts bring.
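For readers still on older standards, here is a small taste of that compositional magic: a lazy C++20 ranges pipeline (with C++23's std::println for output):

#include <print>
#include <ranges>
#include <vector>

int main() {
    std::vector<int> v{1, 2, 3, 4, 5, 6};

    // Views compose lazily: keep the even numbers, then square them.
    auto evens_squared = v
        | std::views::filter([](int x) { return x % 2 == 0; })
        | std::views::transform([](int x) { return x * x; });

    for (int x : evens_squared)
        std::println("{}", x);   // prints 4, 16, 36
}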

Universe Three: the Love-Hate Middle Camp, "Half Heaven, Half Hell"

This is probably the truest portrait of the majority of C++ developers.

As the original poster said, the new features really are cool, but they also bring enormous cognitive load and decision-making costs.

One developer's comment collected 82 upvotes:

"Most of us use only a small fraction of the C++ language's features. It's a chicken-and-egg problem: here is a new feature, but I don't know how or why to use it; or, my code has a pain point that a new feature could probably solve, but I don't know which one."

This dilemma of choice is precisely the price of C++'s "freedom."

The Underlying Conflict: C++'s "Bazaar" Philosophy vs. the Team's "Cathedral" Dilemma

Why has C++ evolved into what it is today?

A developer in the comments offered a wonderfully apt metaphor: the "Bazaar."

"One thing I absolutely love about C++: it's a bazaar of features, and you pick the tools you think fit your project. Look at other languages: Java demands that everything be an object, Haskell demands that everything be a function. C++ gives you object orientation; you hate it? No problem, just don't use it. You like functional style? C++ supports that too."

This everything-is-optional freedom is C++'s greatest charm, and of course also its greatest curse.

Because on a team, once everyone has carried their favorite hammer home from the bazaar, the whole project turns into a construction site of clashing styles.

The original poster admitted as much:

"The freedom is real, but it also means two C++ codebases can look like two completely different languages."

When one file still uses raw pointers and manual memory management while another has moved on to std::unique_ptr and std::span; when part of the team writes callbacks with boost::asio while another part uses C++20 coroutines…

Code review becomes a nightmare.
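A contrived sketch of that nightmare: two functions that could sit in the same (hypothetical) codebase, one frozen in C++98 habits, the other written in the modern dialect:

#include <cstddef>
#include <memory>
#include <span>

// "File A", C++98 style: ownership and length live only in the caller's head.
int sum_legacy(const int* data, std::size_t n) {
    int total = 0;
    for (std::size_t i = 0; i < n; ++i)
        total += data[i];
    return total;
}

// "File B", modern style: the extent travels with the type itself.
int sum_modern(std::span<const int> data) {
    int total = 0;
    for (int v : data)
        total += v;
    return total;
}

int main() {
    int* raw = new int[3]{1, 2, 3};           // legacy caller must remember delete[]
    int a = sum_legacy(raw, 3);
    delete[] raw;

    auto owned = std::make_unique<int[]>(3);  // modern caller cannot leak
    owned[0] = 1; owned[1] = 2; owned[2] = 3;
    int b = sum_modern({owned.get(), 3});

    return a + b;
}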

Reflection: "Technical Debt" or "Moat"?

Behind this debate lie two deeper questions of software engineering philosophy.

Question One: Are the new features "icing on the cake," or "must-haves"?

Many C++ veterans argue that much of what modern C++ has added, such as Ranges and Coroutines, was proven to be a great idea decades ago in LISP. C++ is merely repaying, in an extremely slow and extremely complicated way, technical debt it incurred decades ago.

Others counter that C++'s greatness lies precisely in its ability to bring these high-level ideas, through the hardcore route of zero-cost abstraction, into production environments with the most extreme performance requirements.

Question Two: Is complexity an "enemy," or a "friend"?

One developer's comment was strikingly dialectical:

"This (the new features) is both a good thing and a bad thing. The barrier to learning really does keep rising. But these tools are genuinely useful: they let you express code in cleaner, safer, and more efficient ways."

While Go works hard at subtraction, trying to reduce developers' cognitive burden, C++ seems firmly committed to the opposite path: it trusts that developers are experts, hands all the choices and all the complexity to you, and lets you assemble your own "best subset."

It's like piloting a space shuttle with hundreds of instrument panels. For a novice it's a disaster; for a top-tier pilot, every single button means more precise control.

Where Is the Way Out? Embracing "Incremental Modernization"

Even in this seemingly unresolvable internal debate, we can still find a wise middle road.

Someone shared a real-world case of great reference value:

He successfully introduced a new feature module written in C++17 into a huge C++98 codebase. He did not refactor any old code; he simply upgraded the compiler and the build scripts. The result: the new feature brought performance gains and a leap in development efficiency, while the old code kept running as stably as ever.

Perhaps this is the right way to use modern C++: don't try to "revolutionize" old code with the new standard; instead, when writing new code, embrace the new features boldly and selectively.

Render unto C++98 what is C++98's, and unto C++23 what is C++23's. Allow "dialects" from different eras to coexist in one codebase, and use newly added modules to gradually dilute the baggage of history.

Wrap-Up: A Grand Experiment in "Freedom"

This C++ debate has no winner.

It simply proves, once again, how unique this language is: it is a democratic language. It gives you the freedom to choose everything, and it demands that you bear every consequence of your choices.

In the words of one developer:

"Rust imposes its opinions on you; C++ demands that you have your own. It's like the difference between autocracy and democracy. Most of the time, democracy is just a disorganized circus run by a cage of monkeys. But I still prefer democracy."

Perhaps, for those of us who have grown used to the "we'll carry you" style of Go and Rust, looking back now and then at C++, this ancient bazaar full of chaos and vitality, will give us a deeper understanding of the craft of software engineering.

Reference: https://www.reddit.com/r/cpp/comments/1sihs1w/is_modern_c_actually_making_us_more_productive_or


Today's discussion:

In your technical career, have you ever been stuck on some ancient "technology version," unable to move? Does C++'s everything-is-optional philosophy of freedom attract you, or frighten you?

Share your thoughts in the comments!



While Go Keeps Pursuing Minimalism, C++26 Has Added Four More "Epic" Features

By bigwhite
March 31, 2026, 07:26

Permalink – https://tonybai.com/2026/03/31/go-minimalism-vs-cpp26-epic-new-features

Hi everyone, I'm Tony Bai.

In an era when small-and-beautiful languages like Go and Zig are all the rage, ask any tech community, "What do you think of C++?"

You will most likely get a pile of snarky answers: "Too complex, don't learn it." "From hello world straight to giving up." "Build rockets in the interview, tighten screws on the job."

C++, a language born in the 1980s, seems to have long been labeled old, bloated, and thoroughly user-hostile. In the eyes of many younger developers, it is a lumbering prehistoric beast that deserves to be retired by the times.

But just the day before yesterday (March 29, 2026), far from collapsing, this prehistoric beast bared fangs sharp enough to tear open the sky.

Herb Sutter, chair of the C++ standards committee and a godfather-level figure of the C++ world, announced on his blog: the technical work on the C++26 standard is officially complete!

In an extremely excited tone, Sutter called it "the most impactful release since C++11." At the core of the release are four epic new features he dubbed the "Fab Four."

After patiently reading through all of it, only one phrase was left in my head: absolutely stunning.

While Go developers are still fiercely debating whether the language should add a ternary expression or generic methods, C++ has gone the opposite way and bolted four more cosmic-grade heavy weapons onto itself. Is this C++ sounding the horn of a counteroffensive, or the last straw on the camel's back?

Today, let's crack open C++26's four "guardians," see just how strong they are, and consider how they will shape programmers' future choice of languages.

Heavy Gun No. 1: Reflection, the Ultimate Magic of "Code That Generates Code"

Herb Sutter put reflection first among the four features and called it "the most important upgrade to C++ since the invention of templates."

What is C++ reflection? Simply put, it gives code the ability to examine itself and create itself at compile time.

Before C++26, to implement a generic JSON serialization/deserialization library, you had to write piles of repetitive template code, or use assorted ugly macros to trick the compiler.

But in C++26, you can write code like this, brimming with "divine power" (illustrative sketch):
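(A sketch in the spirit of the P2996 reflection proposal. <meta>, std::meta::data_member_spec, and std::meta::define_aggregate are names from the proposal itself, while parse_json_fields stands in for an imagined constexpr JSON parser whose definition is elided; only experimental compiler builds accept anything like this today.)

#include <meta>
#include <string_view>
#include <vector>

// Imagined constexpr helper: parses JSON text into (type, name) field
// descriptions usable at compile time. Definition elided.
struct field { std::meta::info type; std::string_view name; };
consteval std::vector<field> parse_json_fields(std::string_view json_text);

// The raw bytes of test.json, pulled in at compile time via #embed
// (adopted for C++26).
constexpr char test_json[] = {
    #embed "test.json"
    , '\0'
};

struct config;   // an incomplete type, to be completed by reflection

// A consteval block runs at compile time: for every field found in
// test.json, inject a matching data member into 'config'.
consteval {
    std::vector<std::meta::info> members;
    for (field f : parse_json_fields(test_json))
        members.push_back(std::meta::data_member_spec(f.type, {.name = f.name}));
    std::meta::define_aggregate(^^config, members);
}

// From here on, 'config' is a complete type whose members mirror
// test.json, ready for further compile-time computation.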

At compile time, this code analyzes the JSON structure from a compile-time input (test.json) and generates a brand-new type usable in compile-time computation. In Go, this requires the reflect package at runtime, at the cost of performance. C++ simply does it at static compile time, for free!

Herb Sutter described reflection as "a ten-year rocket engine for C++." It means the C++ community will soon spawn countless extremely powerful, yet extremely complex, metaprogramming libraries. C++'s learning curve will be pulled up to yet another new height.

Defense Line No. 2: Memory Safety, or "Just Recompile and Safety Follows"

If reflection pushes C++'s ceiling even further out of reach, the memory-safety improvements are a frontal assault on the core home turf of Go and Rust.

What is C++'s perennial, most-criticized pain point? Memory unsafety. Dangling pointers, reads of uninitialized variables (leading to undefined behavior)… These nightmares have haunted C++ programmers for decades.

C++26 makes an extremely tempting promise: without changing a single line of your old code, simply recompiling in C++26 mode automatically buys you a large safety upgrade!

This comes mainly from two improvements (both are illustrated in the sketch after this list):

  1. Eliminating the UB of uninitialized variables: in C++26, reading an uninitialized local variable is no longer undefined behavior. The bizarre, hard-to-diagnose crashes that have tormented countless newcomers will become history.
  2. A "hardened" standard library: Google and Apple have contributed their internally hardened standard-library implementations to C++26. When you use containers such as std::vector and std::string, extensive bounds checking is switched on automatically.
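(A minimal sketch of both points. Exact behavior depends on your vendor's C++26 mode and hardening flags: the uninitialized read becomes "erroneous behavior" with a defined value that tools may diagnose, and the out-of-bounds access becomes a guaranteed check rather than silent corruption.)

#include <vector>

int main() {
    int x;             // never initialized
    int y = x + 1;     // pre-C++26: undefined behavior, anything goes;
                       // C++26: "erroneous behavior", x holds a defined
                       // bit pattern and the read may be diagnosed

    std::vector<int> v{1, 2, 3};
    int z = v[7];      // out of bounds; with a hardened standard library
                       // this terminates with a bounds-check failure
                       // instead of silently corrupting memory
    return y + z;
}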

Herb Sutter quoted Google's internal data:

"At Google alone, this technology has already fixed more than 1,000 bugs, is expected to prevent 1,000 to 2,000 new bugs per year, and has lowered the segfault rate across production by 30%."

This practically says to Go: "That bit of safety you bought with GC? C++ can do it now too, and still at zero cost!"

Sharp Sword No. 3: Contracts, the "Legal Clauses" in Your Code

If you have written Go, you are surely sick of screens full of if param == nil { return errors.New(…) }. Defensive programming of this kind works, but it is terribly verbose.

C++26 officially introduces language-level contract programming.

You can lay down strict legal clauses for your functions, as if signing a contract:
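(A sketch using the contract syntax adopted from the P2900 proposal; compiler support is still rolling out, and whether a violation terminates the program or is ignored depends on the evaluation semantics chosen at build time.)

#include <cmath>

// pre() states what the caller must guarantee; post(r : ...) names the
// return value and states what the function promises about it.
double checked_sqrt(double x)
    pre(x >= 0.0)
    post(r : r >= 0.0)
{
    return std::sqrt(x);
}

int main() {
    double ok  = checked_sqrt(2.0);   // satisfies the contract
    double bad = checked_sqrt(-1.0);  // violates pre(x >= 0.0): in an
                                      // enforced build the program stops
                                      // here with a violation report
    return static_cast<int>(ok + bad);
}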

These pre and post conditions are "laws" that the compiler and runtime can understand and enforce. If a caller violates a precondition, the program can fail immediately during development with a clear report, instead of blowing up somewhere strange after the data has already been corrupted.

The Go community has discussed similar generic assertions, but C++26 has moved first and made contracts part of the language standard.

Engine No. 4: std::execution, C++'s Own First-Party Concurrency Model

C++20 introduced co_await coroutines, but they were only syntactic sugar; no unified scheduling framework came with them.

C++26 finally fills that gap with the official std::execution, also known as the Sender/Receiver model.

It is an extremely powerful, unified framework for asynchrony. It lets you describe, compose, and schedule complex flows of concurrent tasks in a declarative style.

Below is a code example using std::execution:

// This is an example of a custom algorithm for starting work
// without allocations. This algorithm is also available in
// <exec/start_now.hpp>. (Users that don't write custom sender
// algorithms will not need to use receivers or call connect
// or start.)
template <stdexec::sender_in<stdexec::empty_env> Sender>
struct start_now {
  start_now(Sender sndr)
    : _op(stdexec::connect(std::move(sndr), _sink_rcvr())) {
    stdexec::start(_op);
  }
private:
  // start_now is implemented in terms of this custom receiver,
  // which is used to discard Sender's results.
  struct _sink_rcvr {
    using receiver_concept = stdexec::receiver_t;
    void set_value(auto&&...) noexcept {}
    void set_error(auto&&) noexcept {}
    void set_stopped() noexcept {}
  };
  stdexec::connect_result_t<Sender, _sink_rcvr> _op;
};

int main() {
  // A run loop is a fifo queue of work and a loop to execute the
  // work. It needs to be driven by calling its .run() member fn.
  stdexec::run_loop ctx;
  auto event_loop = ctx.get_scheduler();

  // Create two tasks that cooperatively multitask.
  auto task1 = stdexec::just()
             | stdexec::then([]{ std::puts("hello from task 1! suspending..."); })
             | stdexec::continue_on(event_loop) // suspend
             | exec::repeat_n(5)
             | stdexec::then([]{ std::puts("task 1 is done!"); });

  auto task2 = stdexec::just()
             | stdexec::then([]{ std::puts("hello from task 2! suspending..."); })
             | stdexec::continue_on(event_loop) // suspend
             | exec::repeat_n(8)
             | stdexec::then([]{ std::puts("task 2 is done!"); });

  // Start both tasks. This enqueues them for execution on the run loop.
  auto op1 = start_now(stdexec::start_on(event_loop, std::move(task1)));
  auto op2 = start_now(stdexec::start_on(event_loop, std::move(task2)));

  ctx.finish(); // tell the run loop to stop when the queue is empty
  ctx.run();    // tell the run loop to start executing work in the queue
}

This can be read as C++'s ultimate answer to Go's goroutine-plus-channel model and to Rust's async/await-plus-tokio model.

For the first time, C++ developers have a language-native, first-party toolkit for writing concurrent programs that are data-race-free by construction.

Wrap-Up: A Gamble with No Way Back

Reflection, safety, contracts, concurrency. Each of C++26's four guardians would be enough to set off an earthquake in any other language.

What we see is a beast reawakened. It chose neither Go's decluttering nor Rust's single-minded obsession with safety; with utter greed, it chose: "I want it all!"

It wants extreme expressiveness and zero-cost abstraction (reflection, templates), memory safety to rival Rust (the hardened standard library), and concurrency expressiveness that yields nothing to Go (std::execution).

C++26 hands veterans unprecedentedly powerful weapons, but it also lifts an already steep learning curve to another astonishing height. "The most complex programming language in the universe" is a title well earned!

While Go developers keep arguing over whether to add a ternary expression, C++ has sprinted toward the pantheon without looking back.

Perhaps the endgame of programming languages really is not grand unification but polarization: at one pole, engineer's languages like Go that pursue extreme simplicity; at the other, languages like C++, a thorn-covered road to apotheosis built for the 1% of grandmaster developers who demand extreme performance and control.

C++26: welcome to the world of gods, and welcome to the purgatory of gods.

References

  • https://herbsutter.com/2026/03/29/c26-is-done-trip-report-march-2026-iso-c-standards-meeting-london-croydon-uk/
  • https://herbsutter.com/2025/06/21/trip-report-june-2025-iso-c-standards-meeting-sofia-bulgaria/
  • https://herbsutter.com/2024/07/02/trip-report-summer-iso-c-standards-meeting-st-louis-mo-usa/
  • https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2025/p2996r13.html
  • https://www.youtube.com/watch?v=7z9NNrRDHQU
  • https://www.youtube.com/watch?v=oitYvDe4nps

Today's discussion:

Having seen C++26's four "divine" features, do you feel excitement, or deep despair? Is C++'s big-and-complete evolutionary path the right one, or does Go's small-and-beautiful approach better represent the future?

Share your thoughts in the comments!



Monster Hunter Stories 3: Twisted Reflection Trainer

By FLiNG
March 14, 2026, 03:06

16 Options · Game Version: v1.0+ · Last Updated: 2026.03.13

Options

Num 1 – Infinite Health
Num 2 – Infinite Stamina
Num 3 – Infinite Hearts
Num 4 – Max Kinship
Num 5 – Max Kinship Level
Num 6 – Max Critical Chance
Num 7 – Infinite Exp
Num 8 – Exp Multiplier
Num 9 – Set Game Speed
Num 0 – Super Damage/One Hit Kills

Ctrl+Num 1 – Edit Copper Points
Ctrl+Num 2 – Edit Silver Points
Ctrl+Num 3 – Edit Gold Points
Ctrl+Num 4 – Obtain All Battle Support Items
Ctrl+Num 5 – Obtain All Materials
Ctrl+Num 6 – Obtain All Growth Items


From Research to Product: Customer Insights on Prompt flow

By John Chou
May 27, 2024, 21:55

Time to navigate the frontier

In the dynamic landscape of Large Language Models (LLMs), our team is once again at the cutting edge, pioneering a new venture called Prompt flow (PF). My role goes beyond the rapid, high-quality delivery of products: I need to think about which features deliver real value to our customers, and about a user experience that resonates with the essence of those features. This new challenge is a substantial one for a web front-end engineer, and it has been a focal point of my professional reflection.

As 2023 drew to a close, a fortuitous invitation from a university peer led me to explore the synergy between LLMs and conventional ML models. This exploration transformed me into an amateur researcher, granting me the privilege to scrutinize the research process through our customers’ lens, with the aim of pinpointing their pain points to better inform our product design.

Recent academic work I learned from

This paper was accepted to ACL 2024 Findings a few days ago, a great encouragement for us. Please read the full paper if you are interested in the details; I will not recap them here.

Figure 1: Harnessing LLMs as post-hoc correctors. A fixed LLM is leveraged to propose corrections to an arbitrary ML model's predictions without additional training or the need for additional datasets.

Figure 2: A high-level overview of LLMCORR, harnessing Large Language Models (LLMs) as post-hoc correctors to refine predictions made by an arbitrary Machine Learning (ML) model.

Figure 3: The LLMCORR prompt template. Additional contextual knowledge from the training and validation datasets can be included by expanding the template.

Reflections on Prompt flow

I have been deeply involved in PF since the inception of the project. Naturally, I endeavored to integrate it into our research, yet reality diverged from my original intentions. So here are a few reflections on PF from across the research journey. Please note that most of our work was done before early February 2024, so I don't mean to be a Monday-morning quarterback on some of these points.

1. Supporting and optimizing the local inference experience will win more customers' favor

Most researchers and engineers have some computational resources of their own, and from a cost-control perspective they are likely to choose open-source large language models (LLMs) for local inference work, which PF does not support. Ours was a similar story: we opted for LLMs like Llama at the start of the experiment, which meant giving up on PF.

2. Flex mode is crucial for the use of flow

Despite transitioning to OpenAI's GPT-3.5/4 models, our repository was already rich with Python utilities and Jupyter notebooks, complemented by a wealth of projects from previous research endeavors. The core competitive edge of PF aside, had flex mode been available at that juncture, it would have allowed an exploratory integration with our established GNN workflow, potentially igniting a synergistic spark.

3. Is Prompt engineering really that important?

The value of our work lies in placing the LLM in an interesting position, doing the right job within the system, so as to maximize its value. The prompts we designed and used are fairly common by today's standards: simple structure, essential knowledge, and a few shots. Prompt engineering therefore did not play the key role in this work.

It is worth mentioning that PF recently launched the Prompty feature, which provides quick access and focuses on the value of tuning prompts. This may be practical in large engineering applications, where a single prompt can run from hundreds to thousands of lines. If that scenario holds, then supporting complex Jinja Template Designer features and previewing the final rendered prompt will be of great help (just as Overleaf does).

4. What did PF get right?

When we began learning how to implement a RAG app, we naturally looked at some LangChain samples first, and then… went from beginner to giving up. My teammate chose the OpenAI Playground to use the GPT-4 Assistants feature; meanwhile, Azure OpenAI did not yet support Assistants, so I chose to build a RAG flow following the PF sample. In this scenario, a few advantages stood out:

  • Low-code is always easy to build PoC.
  • Orchestration to do batch run.
  • Tracing (Not implemented yet at that time, but definitely a keeper feature).

Of course, there are also points worth discussing. For example, the sample does not make clear that the embeddings must be generated with the same model and stored in the vector database, so you may still need to write some code, which affects any ease-of-use assessment; and the data input and output in the batch-run scenario likewise involve a lot of manual work.

As another aside, the File retrieval performance of OpenAI (OAI) Assistants was not satisfactory at the time. I wonder whether there has been significant improvement now that it has been renamed File Search.

5. What should PF focus on if it conducts Experiments?

First, there are some perennial topics, such as experiment status display and refresh, CRUD operations, and viewing logs at each step, which are essential features of any product of this kind.

When the amount of experimental data is huge, limits on metrics like RPM and TPM start to trouble users. Working out how to estimate the number of tokens and requests an experiment needs under the constraints imposed by services like OAI and Azure OpenAI (AOAI), so as to achieve automated high-concurrency scheduling, and even multi-endpoint concurrency, would be of great value to customers. In previous experiments we implemented only very basic token-counting and request-interval logic, and I believe we are not the only ones with such needs.

Last few words

It is not common to encourage engineers to delve into academic pursuits, since not everyone has the passion, the foundation, or even the time. In the era of generative AI, however, immersing oneself in scholarly articles is always a wise move!

Whether in practical application or academic experimentation, only through in-depth engagement can one truly understand and unearth the pain points of users. I believe this embodies the spirit of our current discussion.
