
Kubernetes’ Anxiety and Rebirth in the AI Wave

By Jimmy Song
April 3, 2026, 13:20

Kubernetes hasn’t been replaced by AI, but it’s being redefined by it. Anxiety is the prelude to rebirth.

After attending KubeCon EU 2026 in Amsterdam, I’ve been pondering a key question: Kubernetes isn’t obsolete, but it’s no longer “enough”; it hasn’t been replaced by AI, but it’s being redefined by AI.

Figure 1: KubeCon EU 2026 slogan: Keep Cloud Native Moving. This event had over 13,000 registrations, making it the largest KubeCon to date.

This was my third time attending KubeCon in Europe. Over the past few years, you can actually see the community’s mindset shift through the event slogans:

  • 2024 Paris: La vie en Cloud Native

    → Cloud Native has become a “way of life,” the default state

  • 2025 London: No slogan, just the 10th anniversary

    → Kubernetes reached a milestone, focusing on retrospection rather than moving forward

  • 2026 Amsterdam: Keep Cloud Native Moving

    → But the question is: where is it moving?

The absence of a slogan in 2025 was a signal in itself:

When an ecosystem starts commemorating the past instead of defining the future, it’s already at an inflection point.

This article doesn’t recap the talks, but instead distills my observations at KubeCon into insights about Kubernetes’ anxiety and rebirth in the AI wave.

The Root of Anxiety: Is Kubernetes Facing a “Crisis”?

The biggest change at KubeCon was that AI has completely replaced traditional cloud native topics. The focus shifted from service optimization and microservices management to how to deploy and manage AI workloads on Kubernetes, especially inference tasks and GPU scheduling.

Figure 2: Before KubeCon officially started, the Maintainer Summit was all about AI.

Kubernetes, as the foundational infrastructure, was once the core of the cloud native world. With the explosive growth of AI models, the question now is whether Kubernetes can still serve as a “universal” platform for everything, which has become a new source of anxiety.

The AI boom brings real challenges: Can Kubernetes’ “universality” adapt to the complexity of AI workloads?

The Focus Brought by the AI Boom

AI’s popularity has shifted the cloud native spotlight entirely to artificial intelligence. AI coding, OpenClaw, large language models, and generative models have all drawn widespread attention. AI has become the core computing demand in the real world.

This surge in demand raises the question: Can Kubernetes continue to serve as the infrastructure platform for complex tasks? Especially with issues like GPU sharing, inference model scheduling, VRAM allocation, and device attribute selection, is the traditional Kubernetes resource model sufficient?

In the past, Kubernetes handled compute, storage, and networking as foundational infrastructure. But with the rapid development of AI, its “universality” is being challenged. Particularly for inference tasks, Kubernetes’ model appears thin.

Comparing with OpenStack: Will Kubernetes Repeat History?

OpenStack once aimed to be a complete open-source cloud platform, but ultimately failed to sustain growth due to complexity and a lack of flexibility in adapting to new technologies.

Will Kubernetes follow the same path? I believe Kubernetes has different strengths: as a container and microservices orchestration platform, it’s widely adopted and has strong community and vendor support. It doesn’t try to replace all cloud provider capabilities but serves as an infrastructure control plane to help users manage resources.

Figure 3: Cloud native contributors remain active. The crowd at the KubeCon EU 2026 Maintainer Summit shows the community’s vitality.

However, as AI workloads become mainstream, Kubernetes must find a new position to avoid being replaced by “AI-optimized platforms.”

Kubernetes’ Challenge: The GPU Resource Management Gap

At KubeCon, NVIDIA announced the donation of the GPU DRA (Dynamic Resource Allocation) driver to the CNCF, marking the upstreaming of GPU resource management. GPU sharing and scheduling have become urgent issues for Kubernetes.

Traditionally, Kubernetes relied on the Device Plugin model to schedule GPUs, only supporting allocation by device count (e.g., nvidia.com/gpu: 1). But for AI inference tasks, more information is needed for resource scheduling, such as VRAM size, GPU topology, and sharing strategies. NVIDIA DRA makes GPU resource management more flexible and intelligent, gradually easing the “GPU resource crunch” in AI workloads.
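As a toy illustration of this difference (plain Python, not real Kubernetes API objects; the device inventory and field names are invented for the sketch), count-based allocation cannot see what a claim-style, attribute-based selector can:

```python
# Hypothetical sketch: count-based vs. attribute-based GPU selection.
# The inventory and field names are illustrative, not real Kubernetes APIs.

inventory = [
    {"id": "gpu-0", "model": "H100", "vram_gb": 80, "shared": False},
    {"id": "gpu-1", "model": "A100", "vram_gb": 40, "shared": True},
    {"id": "gpu-2", "model": "A100", "vram_gb": 80, "shared": True},
]

# Device Plugin era: the scheduler only sees an opaque count,
# e.g. `nvidia.com/gpu: 1` -- any device will do.
def allocate_by_count(count):
    return inventory[:count]

# DRA era: a claim can select on device attributes (VRAM, sharing policy,
# topology...), in the spirit of DRA's CEL-based selectors.
def allocate_by_claim(min_vram_gb, allow_shared):
    return [
        d for d in inventory
        if d["vram_gb"] >= min_vram_gb and (allow_shared or not d["shared"])
    ]

print([d["id"] for d in allocate_by_claim(80, allow_shared=True)])
```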

This shift means Kubernetes is no longer just a “container orchestration platform,” but is becoming the infrastructure layer for AI-specific resource scheduling.

Against this backdrop, both the community and industry are exploring finer-grained GPU resource abstraction and scheduling mechanisms. For example, the open-source project HAMi is building a GPU resource management layer for AI workloads on top of Kubernetes, supporting GPU sharing, VRAM-level allocation, and heterogeneous device scheduling.

Figure 4: HAMi demo at KubeCon EU 2026 Keynote

These efforts are not about replacing Kubernetes, but about filling the resource model gaps for the AI era. In the long run, this layer may evolve into a “GPU Abstraction Layer” similar to CNI/CSI, becoming a key part of AI-native infrastructure.

The Production “Gap”: Many AI PoCs, Few in Production

A common post-event summary was: Many PoCs, but “everyday production deployments” are still rare. Pulumi summarized it as:

lots of working demos, very few production setups people trust

This shows that while many AI workload solutions succeed in technical demos, the transition from experimentation to production remains difficult. Whether the issue is GPU resource sharing or inference request scheduling, it remains an open question whether Kubernetes as the foundation can support this transformation.

The Rise of Inference Systems: Kubernetes’ Scheduling Boundaries Are Challenged

Another major event at this KubeCon was llm-d being contributed to the CNCF as a Sandbox project.

If GPU DRA represents the upstreaming of device resource models, then llm-d represents another critical evolution: Distributed LLM inference capabilities are moving from proprietary engineering implementations to standardized, community-driven collaboration in cloud native.

This is significant not just because it’s another open-source project, but because it shows that Kubernetes’ challenges in the AI era are no longer just about “how to schedule GPUs,” but also “how to host inference systems themselves.” As prefill/decode separation, request routing, KV cache management, and throughput optimization move into the infrastructure layer, Kubernetes’ boundaries are being redefined.

Traditionally, the Kubernetes scheduler focused on Pod scheduling. But in AI inference scenarios, scheduling is not just about picking a node—it’s about selecting the most suitable inference instance based on request characteristics. Factors like model state, request queue depth, and cache hit rate all need to be considered. This process is increasingly managed by inference runtimes, forming new “request-level scheduling” systems.
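As a hedged sketch of such request-level scheduling (the weights, fields, and replica data below are illustrative assumptions, not llm-d's actual algorithm), an inference router might score replicas by prefix-cache affinity against queue depth:

```python
# Illustrative request-level scheduling for LLM inference: pick the replica
# with the best trade-off between cache affinity and queue depth.
# All fields and weights here are invented for the sketch.

replicas = [
    {"name": "r1", "queue_depth": 8, "cache_hit": 0.9},
    {"name": "r2", "queue_depth": 2, "cache_hit": 0.1},
    {"name": "r3", "queue_depth": 3, "cache_hit": 0.7},
]

def score(r, w_cache=1.0, w_queue=0.1):
    # Higher cache-hit rate is better; a deeper queue is worse.
    return w_cache * r["cache_hit"] - w_queue * r["queue_depth"]

best = max(replicas, key=score)
print(best["name"])
```

Note that this decision happens per request, below the granularity of Pod scheduling, which is why it tends to live in the inference runtime rather than in kube-scheduler.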

This leads to an overlap between the Kubernetes scheduler and inference systems, forcing Kubernetes to rethink its role: should it keep expanding, or collaborate with inference systems?

AI-Native Infrastructure: The Key Challenge for Production

At the AI Native Summit, the real needs for AI-native infrastructure were especially clear. The focus was no longer “can it run on Kubernetes,” but how to make AI workloads routine, stable, and production-ready on Kubernetes.

Figure 5: At the AI Native Summit after KubeCon, Linux Foundation Chairman Jonathan said cloud native is entering the AI-native era.

The core challenge is delivery. Unlike traditional apps, AI model weights are often huge—tens of GB or even TB—making model delivery and data management extremely complex. Traditional container delivery systems (like image layers) struggle with such massive data and complex versioning.

A key direction for Kubernetes is to standardize model weight and data delivery, using ImageVolume and OCI artifacts to solve AI model delivery and version management on Kubernetes. This not only reduces “cold start” times but also provides infrastructure support for multi-tenancy and compliance.
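To make the ImageVolume pattern concrete, here is a heavily simplified Pod spec rendered as a Python dict (the field shapes loosely follow the ImageVolume feature, but the registry paths and names are made-up examples, not a tested manifest):

```python
# Sketch: model weights delivered as an OCI artifact mounted via an
# image-type volume, instead of being baked into the serving image.
# Registry references and names below are hypothetical.

pod_spec = {
    "containers": [{
        "name": "inference",
        "image": "registry.example.com/serving:latest",  # serving runtime only
        "volumeMounts": [{"name": "model-weights", "mountPath": "/models"}],
    }],
    "volumes": [{
        "name": "model-weights",
        # Weights are versioned and pulled like an image layer,
        # decoupling model delivery from the runtime image.
        "image": {"reference": "registry.example.com/models/llm:v1"},
    }],
}

print(pod_spec["volumes"][0]["name"])
```

The design point is the decoupling: the serving image stays small and stable, while multi-GB weights get their own versioned delivery and caching path.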

Summary

Kubernetes won’t be replaced by AI, but it’s being reshaped as the core of infrastructure. This anxiety is the force driving its evolution—it’s moving from a “general-purpose infrastructure platform” to an “AI-powered multifunctional base”. Some even call it the AI operating system.

In the future, Kubernetes’ core competitiveness will no longer be just container management, but how effectively it can schedule and manage AI workloads, and how it can make AI a routine part of operations. This was my biggest takeaway from the AI Native Summit and KubeCon, and it’s what I look forward to in the Kubernetes ecosystem over the next few years.

Day One in Amsterdam: Kubernetes Is Rethinking AI

By Jimmy Song
March 23, 2026, 04:41

Today marks my first day at KubeCon Europe 2026. The most striking feeling is: the world is vast, but this community is truly small.

Figure 11: Jimmy on the first day of KubeCon EU 2026

One strong impression stands out:

The world is big, but this circle is really small.

Old Friends, New Cycle

At the Maintainer Summit, I met many familiar faces—

Colleagues from Ant Group, friends from Tetrate, and some people I’ve known for nearly a decade. Together, we’ve journeyed from the early days of Kubernetes, Service Mesh, and cloud native infrastructure to today.

In a sense, this generation has fully experienced:

  • The rise of Kubernetes
  • The standardization of Cloud Native
  • The microservices and service mesh boom
  • And now, the era of AI Infrastructure

This isn’t about “new people entering the field,” but rather—

The same group stepping into a new technology cycle.

What Is the Maintainer Summit Discussing?

If you ask:

What is the Kubernetes community most concerned about right now?

Today’s answer is very clear:

👉 How to run AI workloads better on Kubernetes

Figure 12: The Maintainer Summit’s main topic is AI Infra

Many topics at the Maintainer Summit revolved around:

  • Scheduling models for LLM / AI workloads
  • GPU / accelerator resource management
  • Integrating inference systems with Kubernetes
  • Redefining the roles of data plane vs. control plane
  • How observability tools like OTel monitor AI workloads

In other words:

Kubernetes hasn’t been replaced by AI; it’s actively “absorbing” AI.

Key Signal: GPUs Are Becoming the “Infrastructure Layer”

Today, I had in-depth discussions with members of the CNCF TOC, Red Hat, and the vLLM community.

The core question was:

How should GPUs be “platformized”?

Some consensus is already clear:

  • GPUs are no longer just devices
  • They are now a schedulable, partitionable, and shareable resource layer

Figure 13: TOC meeting discussing GPU resource management and LLM Serving integration

At the Maintainer Summit in Amsterdam, we had deep discussions with CNCF TOC, Red Hat, and the vLLM community about GPU resource management and LLM Serving integration in Kubernetes scenarios, and explored potential collaboration between vLLM and HAMi.

Behind this is a major paradigm shift:

  • GPU = node resource → GPU = infrastructure layer
  • Exclusive use → multi-tenant sharing
  • Static binding → dynamic scheduling
  • Managed within frameworks → unified management at the platform layer

This is exactly what we’ve been working on in HAMi.

HAMi: From “Project” to “Reference Pattern”

Another interesting change today:

HAMi is no longer just a “community project”—it’s becoming:

A reference implementation (reference pattern) for AI Infra

Figure 14: Li Mengxuan, CTO of Dynamia, sharing HAMi’s design and practice at KubeCon EU 2026 Maintainer Summit

This is reflected in several ways:

  • Invited to present at the Maintainer Summit
  • Participating in CNCF TOC discussions
  • Presenting demos as part of Incubating-level review discussions
  • Exploring joint content with the vLLM community (even discussing a joint blog 👀)

Especially in conversations with Red Hat and vLLM, a clear trend emerged:

GPU resource management and LLM serving are becoming coupled

That is:

  • Upper layer: vLLM / inference frameworks
  • Lower layer: GPU scheduling / sharing

A new “interface layer” is gradually forming.

This is a direction worth betting on.

Figure 15: At the TAG Workshop, HAMi was discussed as an Incubating demo

A Caution: The AI Infra Startup Boom Hasn’t Really Begun

At the same time, I have a somewhat “counterintuitive” observation:

We haven’t yet seen a large wave of AI Infra (K8s-focused) startups.

Most companies I saw today fall into a few groups:

  • Pivots from CI/CD, Service Mesh, or Gateway
  • Traditional cloud vendors extending into AI
  • Teams working on models, agents, or even lower-level tech

But those truly focused on:

“Making AI workloads run better on Kubernetes”

There are actually not many startups at this layer.

This could mean two things:

1) This Layer Isn’t Fully Formed Yet

Currently, most activity is at:

  • The model layer (LLM / foundation models)
  • The application layer (Agent / Copilot)

But not at:

  • The scheduling layer
  • The resource layer
  • The runtime layer

2) Or, the Barrier to Entry Is Very High

Because at its core, this is:

The intersection of Cloud Native × GPU × AI workload

It’s not just “wrapping AI,” but a fundamental re-architecture at the infrastructure level.

My Take

If we break down the AI technology stack:

  • Agent / Application
  • LLM Serving (vLLM, etc.)
  • AI Runtime / Scheduling
  • GPU Resource Layer
  • Hardware

Most innovation today is concentrated in:

  • The top two layers (Agent / LLM)

But the real long-term moat lies in:

  • The middle two layers (Runtime + Resource Layer)

And Kubernetes is very likely to remain:

The default platform for this middle layer

Summary

Today’s takeaway:

Kubernetes is not obsolete; it’s being redefined.

And our generation is shifting from:

“Cloud Native Builders”

to:

“AI Infrastructure Builders”

More to come tomorrow.

HAMi Website Refactor: Why HAMi Docs and Website Underwent a Complete Redesign

By Jimmy Song
March 17, 2026, 08:55

This redesign is more than a style update—it’s a step toward clearer technical communication and better user experience. Try the new HAMi website at https://project-hami.io and submit issues here.

Over the past two months, I conducted a thorough refactor of the documentation website (see GitHub). Externally, it looks like a “visual redesign”, but from the perspective of community maintainers and content builders, it’s a comprehensive upgrade of information architecture, content system, and frontend experience.

This article aims to systematically explain three things: why we did this refactor, what exactly changed, and what these changes mean for the HAMi community.

Why Refactor the Website and Documentation

HAMi is a CNCF-hosted open source project initiated and contributed by Dynamia, with growing influence in GPU virtualization, heterogeneous compute scheduling, and AI infrastructure. The community content is expanding, and user types are becoming more diverse: from first-time visitors to engineers and enterprise users seeking deployment docs, architecture diagrams, case studies, and ecosystem information.

The original site was functional, but as content grew, several issues became apparent:

  • The homepage lacked information density, making it hard to quickly grasp the project’s overall value.
  • Connections between docs, blogs, and community info were not smooth; content entry points were scattered.
  • Search experience was unstable; external solutions were not ideal in practice.
  • Mobile experience had many details needing improvement, especially navigation, card layouts, and footer areas.
  • Visual style was inconsistent, making it hard to convey community influence and engineering maturity.

For a fast-evolving open source community, the website is not just a “place for docs”, but the public interface of the community. It needs to serve as project introduction, knowledge gateway, adoption proof, community connector, and brand expression.

So the goal of this refactor was clear: not just superficial beautification, but to truly upgrade the website into HAMi’s systematic community entry point.

What Was Done in This Refactor

This update was not a single-point change, but a series of systematic improvements.

Homepage Redesign and Complete Information Architecture Overhaul

The most obvious change is the homepage.

We redesigned the homepage structure, moving away from simply stacking content blocks, and instead organizing the page around the main narrative: “Project Positioning → Core Capabilities → Ecosystem Entry → Content Accumulation → Community Trust”.

Specifically, the homepage received several key upgrades:

  • Rebuilt the Hero section to strengthen first-screen information delivery and action entry.
  • Optimized CTA design so users can quickly access docs, blogs, and resources.
  • Added and enhanced multiple homepage sections to showcase project value and community reach in a more structured way.
  • Adjusted visual hierarchy, background atmosphere, and scroll rhythm, transforming the homepage from a “content list” into a “narrative page”.

These changes include Hero animations and atmosphere layers, research/story sections, new resource entry sections, refreshed CTAs, unified background design, and ongoing reduction of visual noise. Together, they solve a core problem: enabling visitors to understand what HAMi is and why it’s worth exploring further within seconds.

Architecture Diagrams

Key diagrams were redrawn for clearer technical communication. This helps users grasp HAMi’s role in AI infrastructure.

Figure 1: HAMi website homepage architecture diagram

For HAMi, this change is critical. The community faces not just a single feature, but a set of system-level challenges involving Kubernetes, schedulers, GPU Operators, heterogeneous devices, and enterprise platforms. Improved diagrams make the website a better technical entry point.

Added Case Studies, Community, and Ecosystem Sections to Make Impact Visible

Another important direction was strengthening the “community proof” layer.

Many open source project sites fall into the trap of having complete docs, but users can’t tell if the project is truly adopted, if the community is active, or if the ecosystem is expanding. The HAMi website redesign consciously addresses this.

Figure 2: HAMi ecosystem and device support
Figure 3: HAMi adopters
Figure 4: HAMi contributor organizations

Blog & Reading Experience

Blog cards, lists, and metadata were unified for easier reading and sharing. Blogs are now a core communication layer.

Figure 5: HAMi website blog list page

Mobile Optimization

Navigation, card layouts, footer, and search were improved for smoother mobile browsing.

Figure 6: HAMi website mobile view

Footer & Search

Footer layout was enhanced for better navigation and credibility. Built-in search replaced unreliable external solutions, improving content accessibility.

Figure 7: HAMi website footer
Figure 8: HAMi website built-in search

What This Redesign Means for the HAMi Community

From screenshots, it looks like “the website looks better”. But from a community-building perspective, its significance is deeper.

First, HAMi’s external expression is more systematic.

The website is no longer just a collection of scattered pages, but is forming a complete narrative chain: users can understand project value from the homepage, capability details from docs, practical paths from blogs, and community impact from ecosystem modules.

Second, community content assets are reorganized.

Previously, valuable articles, diagrams, and explanations existed but were hard to find. Now, through homepage sections, navigation, and search refactor, these contents are more effectively connected.

Third, HAMi’s community image is more mature.

A mature open source project needs not just an active code repository, but clear, stable, and sustainable website expression. Structure, style, and usability are part of the community’s engineering capability.

Fourth, this lays the foundation for expanding case studies, adopters, contributors, and ecosystem content.

With the framework sorted, adding more case studies, collaboration entry points, or showcasing more adopters and partners will be more natural and easier for users to understand.

As a Community Contributor, My Top Three Takeaways from This Redesign

In summary, I believe this refactor got three things right:

  • Upgraded the website from a “content dump” to a “community gateway”.
  • Combined visual optimization with information architecture adjustment, not just a skin change.
  • Improved basic experiences like search, mobile, navigation, and footer.

These may not be as flashy as launching a new feature, but they directly impact content dissemination, user comprehension, and the project’s long-term image.

For infrastructure projects like HAMi, technical capability is fundamental, but clearly communicating, organizing, and continuously presenting that capability is also a form of infrastructure.

Summary

This HAMi documentation and website refactor is essentially an upgrade to the community’s “expression layer” infrastructure.

It improves visual and reading experience, reorganizes content, homepage narrative, search paths, mobile access, and community signal display. Homepage redesign, architecture diagram redraw, unified blog style, mobile optimization, enhanced footer, and switching from external to built-in search together constitute a true “refactor”.

Externally, it helps more people quickly understand HAMi; internally, it provides a stable platform for the community to accumulate case studies, expand the ecosystem, and serve adopters and contributors.

The website is not an accessory to the open source community, but part of its long-term influence. HAMi’s redesign is about taking this seriously.

If you’re interested in Kubernetes GPU virtualization, add me on WeChat jimmysong or scan the QR code below.

GTC 2026 Eve: AI is Becoming the New Infrastructure

By Jimmy Song
March 15, 2026, 11:34

AI is quietly reshaping the infrastructure landscape, and GTC 2026 may become a key node in this transformation.

Next week, one of the most important technology conferences in the AI industry, NVIDIA GTC 2026, will be held in San Jose, USA.

For many people, GTC is just a GPU technology conference. But if you follow the development of the AI industry over the past few years, you’ll find an interesting phenomenon:

Many important narratives about AI infrastructure are gradually taking shape at GTC.

From CUDA, DGX, to AI Factory, and most recently Jensen Huang’s proposed AI Five-Layer Cake, NVIDIA is constantly attempting to redefine the computing infrastructure of the AI era.

This is why many people call GTC:

AI’s “Woodstock.”

Figure 1: NVIDIA GTC Conference

This year’s GTC (March 16-19) is expected to cover various levels of the AI stack, including:

  • AI Chips
  • AI Data Centers
  • AI Agents
  • Robotics
  • Inference Computing

According to NVIDIA’s official blog, this year’s keynote will focus on the complete AI stack from chips to applications.

If we put these signals together, we can actually see a larger trend:

AI is transforming from an “applied technology” into “infrastructure.”

The Perspective of Industrial Revolutions

From a longer time scale, the technological revolutions in human history are essentially infrastructure revolutions.

We usually divide industrial revolutions into four times.

In the table below, you can see the infrastructure corresponding to each industrial revolution:

  • Steam Revolution → Steam Engine
  • Electrical Revolution → Power Grid
  • Digital Revolution → Computer
  • Internet Era → Network

Table 1: Industrial Revolutions and Corresponding Infrastructure

First Industrial Revolution: Steam

The steam engine allowed humans to utilize mechanical power on a large scale for the first time. Production no longer relied on human or animal power, but on machines.

Second Industrial Revolution: Electricity

Electricity changed not only the source of power, but also the organization of production. Assembly lines, large-scale manufacturing, and modern industrial systems are all built on the foundation of the power grid.

Third Industrial Revolution: Computers

Computers allowed information to be processed digitally. Software became a production tool.

Fourth Industrial Revolution: Internet and Intelligence

The internet connects all computers together. Cloud computing transforms computing resources into infrastructure. And AI gives machines a certain degree of “cognitive ability.”

The True Significance of AI

If we observe these industrial revolutions, we discover a pattern:

Each industrial revolution produces a new General Purpose Infrastructure.

And AI is likely to become the next-generation infrastructure.

NVIDIA even directly stated in a recent article:

AI is essential infrastructure, like electricity and the internet.

In other words:

AI is no longer just an applied technology, but a new factor of production.

NVIDIA’s Five-Layer Cake

Recently, Jensen Huang proposed a very interesting concept: AI Five-Layer Cake.

Figure 2: AI Five Layer Cake (Image source: NVIDIA)

AI is broken down into five layers:

  1. Energy
  2. Chips
  3. AI Infrastructure
  4. Models
  5. Applications

This model actually illustrates one thing:

AI is a complete industrial system.

Jensen Huang even described AI at Davos as:

“One of the largest-scale infrastructure constructions in human history.”

Signals GTC 2026 May Release

This year’s GTC is expected to release several important directions.

Inference Computing

The focus of AI in the past was training. But the main load of AI in the future is likely to be Inference.

Analysts expect that by 2030, 75% of computing demand in the AI data center market will come from inference.

Agentic AI

The past AI model was:

User → Model → Answer

The Agent model is more complex:

User → Agent → Tools → Model → Action

The flowchart below shows the main interaction paths in the Agent model:

Figure 3: Agentic AI Interaction Flow

AI is no longer just answering questions, but executing tasks.
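A minimal sketch of that User → Agent → Tools → Model → Action loop (every function, tool name, and string here is a made-up stand-in, not a real framework):

```python
# Illustrative agent loop: the agent consults a model, which may request a
# tool call before producing a final action. All names are invented.

def fake_model(prompt, tool_result=None):
    # Stand-in for an LLM call: first ask for a tool, then act on its result.
    if tool_result is None:
        return {"type": "tool_call", "tool": "search", "query": prompt}
    return {"type": "action", "text": f"answer based on {tool_result}"}

TOOLS = {"search": lambda query: f"results for '{query}'"}

def run_agent(user_request):
    step = fake_model(user_request)
    while step["type"] == "tool_call":
        result = TOOLS[step["tool"]](step["query"])
        step = fake_model(user_request, tool_result=result)
    return step["text"]

print(run_agent("what changed in Kubernetes DRA?"))
```

Even in this toy form, the key property is visible: one user request fans out into multiple model and tool invocations, which is exactly what changes the workload profile discussed below.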

Agent Platform

Recent media reports suggest that NVIDIA may launch a new Agent platform: NemoClaw, aimed at helping enterprises deploy AI Agents.

If this project is truly released, it means NVIDIA’s stack will become the following structure:

Figure 4: NVIDIA Agent Platform Architecture

This is actually a complete AI stack.

Agents Change Computing Workloads

The emergence of Agents brings new computing workload issues.

Past AI workloads were mainly:

  • Training
  • Inference

But Agents bring a third type of workload:

Agent Workloads

The figure below shows the diverse workload types related to Agents:

Figure 5: Agent Workloads Structure

This workload is highly fragmented: GPUs are no longer occupied for long stretches, but instead face many small, short-lived requests. This poses new challenges for infrastructure.

AI-Native Infrastructure

For the past few years, I’ve been thinking about a question:

What is AI-native infrastructure?

It is clearly not just “Kubernetes with GPUs.” I’m more inclined to believe it needs to possess several characteristics.

GPU as a First-Class Resource

In the cloud computing era, CPU is the core resource. In the AI era, GPU is the core resource.

Heterogeneous Computing

Real-world AI chips are not limited to NVIDIA:

  • NVIDIA
  • Ascend
  • Cambricon
  • Metax
  • Moore Threads

Future AI infrastructure must be able to manage heterogeneous computing.

GPU Sharing

GPU is a very expensive resource. If it cannot be shared, utilization will be very low. This is why GPU virtualization and slicing are becoming increasingly important.

AI Scheduling

AI scheduling includes not only traditional CPU and memory, but also:

  • GPU
  • VRAM
  • Topology
  • Bandwidth
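A minimal sketch of what such multi-dimensional placement might look like (the device data, thresholds, and field names are invented for illustration, not any real scheduler's model):

```python
# Illustrative multi-dimensional fit check: beyond CPU and memory, an AI
# placement must satisfy VRAM, bandwidth, and topology constraints at once.
# All values and field names below are hypothetical.

devices = [
    {"node": "n1", "vram_free_gb": 24, "nvlink": True,  "bw_gbps": 600},
    {"node": "n2", "vram_free_gb": 80, "nvlink": False, "bw_gbps": 64},
]

def fits(request, device):
    # A device qualifies only if every dimension is satisfied.
    return (device["vram_free_gb"] >= request["vram_gb"]
            and device["bw_gbps"] >= request["min_bw_gbps"]
            and (not request["need_nvlink"] or device["nvlink"]))

request = {"vram_gb": 16, "min_bw_gbps": 200, "need_nvlink": True}
print([d["node"] for d in devices if fits(request, d)])
```

The point of the sketch: the node with the most free VRAM (n2) still loses, because topology and bandwidth constraints dominate for this request.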

A Possible AI Tech Stack

Combining the above trends, the future AI stack may present the following structure:

Figure 6: AI Tech Stack Evolution

This structure is very close to NVIDIA’s Five-Layer Cake.

My Judgment

Combining signals from GTC, AI Factory, Agents, and AI Five-Layer Cake, we can see a very obvious trend:

AI is rewriting computing infrastructure.

Future competition may not just be “who has the best model,” but:

Who has the best AI Infrastructure.

Just like the past few decades:

  • Electricity determines industrial capability
  • Internet determines information capability
  • Cloud computing determines software capability

The future may be:

AI Infrastructure determines intelligence capability.

Summary

If we stretch the time scale a bit longer, we may be in a new historical stage.

AI is no longer just a technological tool. It is becoming new infrastructure.

Just like:

  • Electricity
  • Internet
  • Cloud computing

And AI-native infrastructure is likely to become one of the most important technology directions for the next decade.

When GPUs Move Toward Open Scheduling: Structural Shifts in AI Native Infrastructure

By Jimmy Song
February 13, 2026, 22:32

The future of GPU scheduling isn’t about whose implementation is more “black-box”—it’s about who can standardize device resource contracts into something governable.

Figure 1: GPU Open Scheduling

Introduction

Have you ever wondered: why are GPUs so expensive, yet overall utilization often hovers around 10–20%?

Figure 2: GPU Utilization Problem: Expensive GPUs with only 10-20% utilization

This isn’t a problem you solve with “better scheduling algorithms.” It’s a structural problem: GPU scheduling is undergoing a shift from “proprietary implementation” to “open scheduling,” similar to how networking converged on CNI and storage converged on CSI.

In the HAMi 2025 Annual Review, we noted: “HAMi 2025 is no longer just about GPU sharing tools—it’s a more structural signal: GPUs are moving toward open scheduling.”

By 2025, the signals of this shift became visible: Kubernetes Dynamic Resource Allocation (DRA) graduated to GA and became enabled by default, NVIDIA GPU Operator started defaulting to CDI (Container Device Interface), and HAMi’s production-grade case studies under CNCF are moving “GPU sharing” from experimental capability to operational excellence.

This post analyzes this structural shift from an AI Native Infrastructure perspective, and what it means for Dynamia and the industry.

Why “Open Scheduling” Matters

In multi-cloud and hybrid cloud environments, GPU model diversity significantly amplifies operational costs. One large internet company’s platform spans H200/H100/A100/V100/4090 GPUs across five clusters. If you can only allocate “whole GPUs,” resource misalignment becomes inevitable.

“Open scheduling” isn’t a slogan—it’s a set of engineering contracts being solidified into the mainstream stack.

Standardized Resource Expression

Before: GPUs were extended resources. The scheduler didn’t understand if they represented memory, compute, or device types.

Figure 3: Open Scheduling Standardization Evolution

Now: Kubernetes DRA provides objects like DeviceClass, ResourceClaim, and ResourceSlice. This lets drivers and cluster administrators define device categories and selection logic (including CEL-based selectors), while Kubernetes handles the full loop: match devices → bind claims → place Pods onto nodes with access to allocated devices.

Even more importantly, with Kubernetes 1.34 the core APIs in the resource.k8s.io group graduated to GA: DRA became stable and enabled by default, and the community committed to avoiding breaking changes going forward. This means the ecosystem can invest with confidence in a stable, standard API.
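To make the DRA allocation loop concrete, here is a minimal Python sketch of the matching idea: a DeviceClass narrows candidate devices by attribute, a ResourceClaim requests a device of that class, and allocation binds the claim to a concrete device advertised by a node. The data structures and attribute names are illustrative only, not the real resource.k8s.io API.

```python
# Illustrative sketch of DRA-style matching (NOT the real Kubernetes
# resource.k8s.io API): a DeviceClass selects devices by attribute,
# a ResourceClaim asks for a device of that class, and allocation
# binds the claim to a free device advertised per node.

def matches(device, selector):
    """Evaluate a CEL-like selector as simple attribute equality checks."""
    return all(device["attributes"].get(k) == v for k, v in selector.items())

def allocate(claim, device_classes, resource_slices):
    """Bind the claim to the first free device satisfying its DeviceClass."""
    selector = device_classes[claim["deviceClassName"]]["selector"]
    for node, devices in resource_slices.items():
        for device in devices:
            if not device["allocated"] and matches(device, selector):
                device["allocated"] = True
                return {"node": node, "device": device["name"]}
    return None  # claim stays pending: no suitable device

device_classes = {"gpu-80gb": {"selector": {"memory": "80Gi"}}}
resource_slices = {
    "node-a": [{"name": "gpu-0", "attributes": {"memory": "80Gi"}, "allocated": False}],
    "node-b": [{"name": "gpu-1", "attributes": {"memory": "24Gi"}, "allocated": False}],
}
claim = {"deviceClassName": "gpu-80gb"}

result = allocate(claim, device_classes, resource_slices)
print(result)  # → {'node': 'node-a', 'device': 'gpu-0'}
```

The point of the sketch is the contract, not the mechanics: once device categories and claims are expressed in a standard API, any driver can advertise devices and any scheduler can consume them.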

Standardized Device Injection

Before: Device injection relied on vendor-specific hooks and runtime class patterns.

Now: The Container Device Interface (CDI) abstracts device injection into an open specification. NVIDIA’s Container Toolkit explicitly describes CDI as an open specification for container runtimes, and NVIDIA GPU Operator 25.10.0 defaults to enabling CDI on install/upgrade—directly leveraging runtime-native CDI support (containerd, CRI-O, etc.) for GPU injection.

This means “devices into containers” is also moving toward replaceable, standardized interfaces.
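As a rough illustration of what that standardized interface looks like, the sketch below models the shape of a CDI spec as a Python dict. The kind, device name, and device-node path are assumptions for illustration, not copied from a real /etc/cdi file; the actual specification defines additional containerEdits such as env and mounts.

```python
# Illustrative shape of a CDI spec (field values are assumptions, not a
# real /etc/cdi file). A CDI-aware runtime reads such specs and applies
# containerEdits when a container requests "vendor/class=name".
cdi_spec = {
    "cdiVersion": "0.6.0",
    "kind": "nvidia.com/gpu",
    "devices": [
        {
            "name": "gpu0",
            "containerEdits": {
                "deviceNodes": [{"path": "/dev/nvidia0"}],
            },
        }
    ],
}

def qualified_names(spec):
    """List the fully qualified device names this spec would expose."""
    return [f'{spec["kind"]}={d["name"]}' for d in spec["devices"]]

names = qualified_names(cdi_spec)
print(names)  # → ['nvidia.com/gpu=gpu0']
```

Because injection is described declaratively in the spec rather than hard-coded in vendor hooks, the runtime side becomes replaceable: containerd, CRI-O, or any other CDI-aware runtime can apply the same edits.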

HAMi: From “Sharing Tool” to “Governable Data Plane”

On this standardization path, HAMi’s role needs redefinition: it’s not about replacing Kubernetes—it’s about turning GPU virtualization and slicing into a declarative, schedulable, governable data plane.

Data Plane Perspective

HAMi’s core contribution expands the allocatable unit from “whole GPU integers” to finer-grained shares (memory and compute), forming a complete allocation chain:

  1. Device discovery: Identify available GPU devices and models
  2. Scheduling placement: Use Scheduler Extender to make native schedulers “understand” vGPU resource models (Filter/Score/Bind phases)
  3. In-container enforcement: Inject share constraints into container runtime
  4. Metric export: Provide observable metrics for utilization, isolation, and more

This transforms “sharing” from ad-hoc “it runs” experimentation into engineering capability that can be declared in YAML, scheduled by policy, and validated by metrics.
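The accounting behind that chain can be sketched in a few lines: each physical GPU tracks remaining memory and compute shares, and a pod's request is admitted only if both dimensions fit. This is a toy model in the spirit of HAMi's vGPU slicing, not HAMi's actual code.

```python
# Toy fractional-GPU accounting in the spirit of HAMi's vGPU slicing
# (data structures are illustrative, not HAMi's implementation): a card
# is sliced along two dimensions, memory (MiB) and compute share (%),
# and a request is admitted only if both dimensions still fit.

class SharedGPU:
    def __init__(self, name, mem_mib, compute_pct=100):
        self.name = name
        self.free_mem = mem_mib
        self.free_compute = compute_pct

    def try_allocate(self, mem_mib, compute_pct):
        """Admit the request if both memory and compute shares fit."""
        if mem_mib <= self.free_mem and compute_pct <= self.free_compute:
            self.free_mem -= mem_mib
            self.free_compute -= compute_pct
            return True
        return False

gpu = SharedGPU("gpu-0", mem_mib=81920)  # one 80 GiB card
print(gpu.try_allocate(20480, 25))  # → True  (first inference pod fits)
print(gpu.try_allocate(20480, 25))  # → True  (second pod shares the card)
print(gpu.try_allocate(61440, 25))  # → False (memory would be oversubscribed)
```

Once shares are tracked this way, the same numbers can be declared in YAML, enforced in the container, and exported as metrics, which is exactly what turns sharing into a governable contract.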

Scheduling Mechanism: Enhancement, Not Replacement

HAMi’s scheduling doesn’t replace Kubernetes—it uses a Scheduler Extender pattern to let the native scheduler understand vGPU resource models:

  • Filter: Filter nodes based on memory, compute, device type, topology, and other constraints
  • Score: Apply configurable policies like binpack, spread, topology-aware scoring
  • Bind: Complete final device-to-Pod binding

This architecture positions HAMi naturally as an execution layer under higher-level “AI control planes” (queuing, quotas, priorities)—working alongside Volcano, Kueue, Koordinator, and others.

Figure 4: HAMi Scheduling Architecture (Filter → Score → Bind)
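The Filter and Score phases above can be sketched as two small functions, with binpack versus spread as configurable policies. Node shapes and numbers here are illustrative, not HAMi's actual implementation.

```python
# Sketch of the extender pattern's Filter → Score phases (illustrative,
# not HAMi's code): Filter drops infeasible nodes, Score ranks the rest
# under a configurable policy, and the best-scoring node wins.

def filter_nodes(nodes, req_mem):
    """Filter: keep nodes with enough free GPU memory for the request."""
    return [n for n in nodes if n["free_mem"] >= req_mem]

def score(node, policy):
    """Score: binpack prefers fuller nodes, spread prefers emptier ones."""
    used_fraction = 1 - node["free_mem"] / node["total_mem"]
    return used_fraction if policy == "binpack" else 1 - used_fraction

def place(nodes, req_mem, policy="binpack"):
    feasible = filter_nodes(nodes, req_mem)
    if not feasible:
        return None
    return max(feasible, key=lambda n: score(n, policy))["name"]

nodes = [
    {"name": "node-a", "total_mem": 81920, "free_mem": 20480},  # 75% used
    {"name": "node-b", "total_mem": 81920, "free_mem": 61440},  # 25% used
]
print(place(nodes, req_mem=16384, policy="binpack"))  # → node-a
print(place(nodes, req_mem=16384, policy="spread"))   # → node-b
```

Because the extender only filters and scores, higher-level control planes (queuing, quotas, priorities) can sit above it untouched, which is why this pattern composes with Volcano, Kueue, and Koordinator rather than competing with them.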

Production Evidence: From “Can We Share?” to “Can We Operate?”

CNCF public case studies provide concrete answers: in a hybrid, multi-cloud platform built on Kubernetes and HAMi, 10,000+ Pods run concurrently, and GPU utilization improves from 13% to 37% (nearly 3×).

Figure 5: CNCF Production Case Studies: Ke Holdings 13%→37%, DaoCloud 80%+ utilization, SF Technology 57% savings

Here are highlights from several cases:

Case Study 1: Ke Holdings (February 5, 2026)

  • Environment: 5 clusters spanning public and private clouds
  • GPU models: H200/H100/A100/V100/4090 and more
  • Architecture: Separate “GPU clusters” for large training tasks (dedicated allocation) vs “vGPU clusters” with HAMi fine-grained memory slicing for high-density inference
  • Concurrent scale: 10,000+ Pods
  • Outcome: Overall GPU utilization improved from 13% to 37% (nearly 3×)

Case Study 2: DaoCloud (December 2, 2025)

  • Hard constraints: Must remain cloud-native, vendor-agnostic, and compatible with CNCF toolchain
  • Adoption outcomes:
    • Average GPU utilization: 80%+
    • GPU-related operating cost reduction: 20–30%
    • Coverage: 10+ data centers, 10,000+ GPUs
  • Explicit benefit: Unified abstraction layer across NVIDIA and domestic GPUs, reducing vendor dependency

Case Study 3: Prep EDU (August 20, 2025)

  • Negative experience: Isolation failures in other GPU-sharing approaches caused memory conflicts and instability
  • Positive outcome: HAMi’s vGPU scheduling, GPU type/UUID targeting, and compatibility with NVIDIA GPU Operator and RKE2 became decisive factors for production adoption
  • Environment: Heterogeneous RTX 4070/4090 cluster

Case Study 4: SF Technology (September 18, 2025)

  • Project: EffectiveGPU (built on HAMi)
  • Use cases: Large model inference, test services, speech recognition, domestic AI hardware (Huawei Ascend, Baidu Kunlun, etc.)
  • Outcomes:
    • GPU savings: Large model inference runs 65 services on 28 GPUs (37 saved); test cluster runs 19 services on 6 GPUs (13 saved)
    • Overall savings: Up to 57% GPU savings for production and test clusters
    • Utilization improvement: Up to 100% GPU utilization improvement with GPU virtualization
  • Highlights: Cross-node collaborative scheduling, priority-based preemption, memory over-subscription

These cases demonstrate a consistent pattern: GPU virtualization becomes economically meaningful only when it participates in a governable contract—where utilization, isolation, and policy can be expressed, measured, and improved over time.

Strategic Implications for Dynamia

From Dynamia’s perspective (and as VP of Open Source Ecosystem), the strategic value of HAMi becomes clear:

Two-Layer Architecture: Open Source vs Commercial

  • HAMi (CNCF open source project): Responsible for “adoption and trust,” focused on GPU virtualization and compute efficiency
  • Dynamia enterprise products and services: Responsible for “production and scale,” providing commercial distributions and enterprise services built on HAMi

Figure 6: Dynamia Dual Mechanism: Open Source vs Commercial

This boundary is the foundation for long-term trust—project and company offerings remain separate, with commercial distributions and services built on the open source project.

Global Narrative Strategy

The internal alignment memo recommends a bilingual approach:

First layer: Lead globally with “GPU virtualization / sharing / utilization” (Chinese can directly use “GPU virtualization and heterogeneous scheduling,” but English first layer should avoid “heterogeneous” as a headline)

Second layer: When users discuss mixed GPUs or workload diversity, introduce “heterogeneous” to confirm capability boundaries—never as the opening hook

Core anchor: Maintain “HAMi (project and community) ≠ company products” as the non-negotiable baseline for long-term positioning

The Right Commercialization Landing

DaoCloud’s case study already set vendor-agnostic and CNCF toolchain compatibility as hard constraints, framing vendor dependency reduction as a business and operational benefit—not just a technical detail. Project-HAMi’s official documentation lists “avoid vendor lock” as a core value proposition.

In this context, the right commercialization landing isn’t “closed-source scheduling”—it’s productizing capabilities around real enterprise complexity:

  • Systematic compatibility matrix
  • SLO and tail-latency governance
  • Metering for billing
  • RBAC, quotas, multi-cluster governance
  • Upgrade and rollback safety
  • Faster path-to-production for DRA/CDI and other standardization efforts

Forward View: The Next 2–3 Years

My strong judgment: over the next 2–3 years, GPU scheduling competition will shift from “whose implementation is more black-box” to “whose contract is more open.”

The reasons are practical:

Hardware Form Factors and Supply Chains Are Diversifying

  • OpenAI’s February 12, 2026 “GPT‑5.3‑Codex‑Spark” release emphasizes ultra-low latency serving, including persistent WebSockets and a dedicated serving tier on Cerebras hardware
  • Large-scale GPU-backed financing announcements (for pan-European deployments) illustrate the infrastructure scale and financial engineering surrounding accelerator fleets

These signals suggest that heterogeneity will grow: mixed accelerators, mixed clouds, mixed workload types.

Low-Latency Inference Tiers Will Force Systematic Scheduling

Low-latency inference tiers (beyond just GPUs) will force resource scheduling toward “multi-accelerator, multi-layer cache, multi-class node” architectural design—scheduling must inherently be heterogeneous.

Open Scheduling Is Risk Management, Not Idealism

In this world, “open scheduling” isn’t idealism—it’s risk management. Building schedulable governable “control plane + data plane” combinations around DRA/CDI and other solidifying open interfaces, ones that are pluggable, multi-tenant governable, and co-evolvable with the ecosystem—this looks like the truly sustainable path for AI Native Infrastructure.

The next battleground isn’t “whose scheduling is smarter”—it’s “who can standardize device resource contracts into something governable.”

Conclusion

When you place HAMi 2025 back in the broader AI Native Infrastructure context, it’s no longer just the year of “GPU sharing tools”—it’s a more structural signal: GPUs are moving toward open scheduling.

Figure 7: Open Scheduling Future Vision

The driving forces come from both ends:

  • Upstream: Standards like DRA/CDI continue to solidify
  • Downstream: Scale and diversity (multi-cloud, multi-model, even accelerators beyond GPUs)

For Dynamia, HAMi’s significance has transcended “GPU sharing tool”: it turns GPU virtualization and slicing into declarative, schedulable, measurable data planes—letting queues, quotas, priorities, and multi-tenancy actually close the governance loop.

Core Model Overview

Author: Jimmy Song
February 10, 2026, 21:56

The Yin-Yang - Five Elements - Yun - Qi Model views AI Infrastructure as an organic whole, revealing its operational mechanisms from four dimensions. Each layer focuses on different fundamental questions:

Four-Layer Model

  • Yin-Yang (State Layer): The system’s internal tension of unified opposites, revealing how dual elements such as performance vs. constraints and innovation vs. governance coexist
  • Five Elements (Role Layer): The five basic role elements in the system and their collaborative relationships, breaking complex infrastructure down into data, models, compute, platforms, and hardware
  • Yun (Time Layer): The development stage the system is in and its cyclical patterns, describing the evolution cycle from exploration to platformization, then scaling and rebalancing
  • Qi (Flow Layer): The effective “field” of flow within the system, characterizing the conduction and feedback of signals and resources, reflecting the overall smoothness of operation

Table 1: Four-Layer Model

Model Interactions

The four-layer model is not isolated but an interconnected organic whole:

  • The tension of Yin-Yang permeates the dynamic balance of Five Elements
  • The development of Five Elements roles is constrained by their Yun stage
  • The flow of Qi connects the above elements into a self-adaptive cyclic system

The overview diagram below illustrates each layer of the model and their interactions:

Figure 1: AI Infrastructure ‘Yin-Yang - Five Elements - Yun - Qi’ Model Overview. The Yin-Yang layer embodies the system’s internal tension and unity of opposites, the Five Elements layer defines core role elements, the Yun layer describes system stage cycles, and Qi as a flow element permeates and drives the entire system.

Model Application Value

This four-layer model provides a unique perspective for the design, operations, and governance of AI Infrastructure:

  1. Holistic Cognitive Framework: Transcend the limitations of single technical metrics to grasp system state as a whole
  2. Dynamic Balance Thinking: Understand unity of opposites relationships and avoid extremes
  3. Evolutionary Stage Awareness: Grasp the system’s development stage and act in accordance with the situation
  4. Flow Insights: Focus on energy flow within the system to anticipate problems

Next, we will delve into the connotation, engineering mapping, and mechanism of each layer.

The Yin-Yang Layer: Dynamic Balance of System States

Author: Jimmy Song
February 10, 2026, 21:56

Yin-Yang is originally a fundamental concept in Chinese philosophy, representing two opposing yet interdependent forces present in all things in the universe. Everything in the world can be classified as either Yin or Yang, and their continuous movement and change generate the various transformations we observe. In the context of systems, Yin-Yang represents the unity of opposites through tension—a pair of attributes or tendencies that pull against yet depend on each other.

Three Typical Pairs of Yin-Yang Tensions

In AI infrastructure, we identify three typical pairs of Yin-Yang tensions:

Expansion ↔ Constraint

Expansion ↔ Constraint: The tension between growth trends and limiting forces.

  • Yang (Expansion): System expansion speed, such as continuously adding tasks and scaling resources
  • Yin (Constraint): Limiting forces, such as cost controls, regulatory constraints, and hardware limits

System expansion speed and constraint intensity always coexist. For example, continuously adding tasks and scaling resources in GPU clusters (the Yang of expansion) is constrained by costs, regulations, or hardware limits (the Yin of constraint).

Imbalance manifestations:

  • Pursuing expansion without regard for constraints → Resource contention and crashes
  • Excessive constraint → Stifling system vitality

Innovation ↔ Governance

Innovation ↔ Governance: The tension between creative capability and control requirements.

  • Yang (Innovation): Technical innovation, introduction of new features
  • Yin (Governance): Security reviews, rule-making

The faster technical innovation progresses, the more easily governance gaps are exposed. For example, introducing new Agent features (innovation, Yang) may outpace security reviews and rule-making (governance, Yin), leading to potential risks.

Imbalance manifestations:

  • Innovation outpaces governance → Potential security risks
  • Excessively strict governance → Slowing innovation momentum

Speed ↔ Stability

Speed ↔ Stability: The tension between performance advancement and reliable operation.

  • Yang (Speed): Performance improvements, increased throughput
  • Yin (Stability): Reliable operation, system stability

When we pursue speed improvements single-mindedly, the cost to stability will eventually manifest. For example, pushing GPU utilization to the limit during model training (speed, Yang) easily leads to more frequent failures or delays (decline in stability, Yin).

Imbalance manifestations:

  • Extreme pursuit of speed → Decline in stability
  • Excessive conservatism → Performance waste

The Art of Yin–Yang Balance

The Yin–Yang poles described above are not simple trade-offs where you choose one and sacrifice the other, but rather inherent relationships of unity of opposites in systems. Both Yin and Yang sides are opposed yet complementary, neither can be dispensed with:

Expansion without constraints is difficult to sustain; constraints without expansion lose meaning

As the ancient saying goes, “One Yin and one Yang constitute the Way” (一阴一阳之谓道). Balancing Yin and Yang is the “Way” of healthy system operation. For architects, the key lies in:

  • Insight into dominant tensions: Determine which pair of tensions is currently dominant
  • Introducing the opposite: Introduce the complementary side at the right time to restore balance
  • Dynamic adjustment: Dynamically transform based on changes in system environment and stage

Practical Cases

Case: GPU Cluster Expansion

When the cluster is in a state of rapid expansion (Yang exuberant, Yin deficient):

  • ✓ Add scheduling policies and resource quotas (supplement Yin)
  • ✓ Establish cost control mechanisms (supplement Yin)
  • ✗ Do not pursue expansion speed single-mindedly

Case: Agent Feature Innovation

When introducing new Agent features:

  • ✓ Simultaneously establish monitoring and sandboxing mechanisms (supplement Yin)
  • ✓ Improve security review processes (supplement Yin)
  • ✗ Do not let innovation outpace governance

Case: Model Training Performance Optimization

When optimizing model training performance:

  • ✓ Simultaneously strengthen fault tolerance mechanisms and testing (supplement the Yin of stability)
  • ✓ Set performance baselines and rollback mechanisms (supplement Yin)
  • ✗ Do not infinitely compress fault tolerance time

Dynamic Transformation of Yin–Yang States

It’s important to note that Yin–Yang states are not static and unchanging, but dynamically transform with system environment and stage.

The same capability may transform from an advantage to a risk at different stages

For example, a “rapid development” strategy that drives rapid iteration during the startup stage, if applied without restraint during the scaling stage, can instead become a major threat to stability.

The analysis of the Yin–Yang layer reminds us to constantly pay attention to the ebb and flow of these opposing forces, and to keep the system in a state of elastic tension through adjustments, rather than snapping or becoming slack and ineffective.

Five Elements Layer: Classification and Collaboration of System Roles

Author: Jimmy Song
February 10, 2026, 21:56

Five Elements (Wǔxíng, Five Elements or Five Phases) theory divides everything in the world into five basic elements: Wood, Fire, Earth, Metal, Water. Each element represents a fundamental attribute or functional role, with the five elements generating and overcoming each other in an endless cycle.

In AI infrastructure, we use “Five Elements” to characterize the system’s five core elements and their responsibilities:

Engineering Mapping of Five Elements

  • Water 🌊: Flow and containment; maps to data flow and quality (data pipelines, data assets, quality control)
  • Wood 🌲: Growth and creation; maps to model growth and capability expansion (model architecture iteration, parameter scale expansion)
  • Fire 🔥: Energy and execution; maps to compute conversion and work efficiency (GPU/TPU computing, job scheduling efficiency)
  • Earth 🏔️: Support and stability; maps to platform support and orchestration governance (distributed coordination, middleware, scheduling systems)
  • Metal ⚙️: Strength and standardization; maps to hardware constraints and physical boundaries (GPU/CPU performance, storage capacity, network bandwidth)

Table 1: Engineering Mapping of Five Elements

Water – Data Flow and Quality

Corresponds to data pipelines, data assets, and quality control in the system.

Water symbolizes flow and containment, analogous to the circulation and nourishing role of data in the system, including:

  • Training data acquisition
  • Real-time data input
  • Feedback signal transmission
  • Data cleaning and quality assurance

Wood – Model Growth and Capability Expansion

Corresponds to the evolution and growth of machine learning models and algorithms.

Wood represents growth and creation, mapped to:

  • Model architecture iteration
  • Parameter scale expansion
  • Cultivation of new capabilities
  • Algorithm optimization and improvement

Fire – Compute Conversion and Work Efficiency

Corresponds to computing processes and the utilization of compute resources.

Fire symbolizes energy and execution, reflected as:

  • Using GPU/TPU and other compute resources for calculation
  • Converting electrical energy into model training and inference work
  • Parallel computing capability
  • Job scheduling efficiency

Earth – Platform Support and Orchestration Governance

Corresponds to the support and governance capabilities of the platform layer.

Earth represents support and stability, analogous to:

  • Infrastructure platform support for upper-layer applications
  • Distributed system coordination and orchestration
  • Middleware services
  • Scheduling systems and policy management
  • Permission systems, service quality assurance

Metal – Hardware Constraints and Physical Boundaries

Corresponds to underlying hardware and system hard limits.

Metal represents strength and standardization, mapped to:

  • GPU/CPU hardware performance
  • Storage capacity
  • Network bandwidth
  • Physical conditions and hard rules (power consumption, safety specifications, etc.)

Five Elements Generation Relationships

The Five Elements form a positive cycle through “generation” relationships:

Data (Water) feeds model growth (Wood), model requirements stimulate compute investment (Fire), compute development drives platform build-out (Earth), platform capabilities push hardware (Metal) to its boundaries, and hardware progress in turn supports greater data acquisition (Water)

Figure 1: Five Elements generation relationship diagram. Water generates Wood, Wood generates Fire, Fire generates Earth, Earth generates Metal, Metal generates Water, representing the mutually reinforcing cycle between data, models, compute, platforms, and hardware.

Five Elements Overcoming Relationships

At the same time, overcoming relationships also exist among the Five Elements, meaning when one element is too strong or imbalanced, it will suppress or weaken another element:

  • Wood overcomes Earth: Excessive model expansion increases the burden on the platform (Earth), potentially even crushing the existing architecture
  • Earth overcomes Water: Overly heavy platforms and rules will hinder the free flow of data (Water)
  • Water overcomes Fire: Data bottlenecks will limit the performance of compute
  • Fire overcomes Metal: Excessive compute demand may break through hardware (Metal) limits
  • Metal overcomes Wood: Strict hardware and rule limitations will curb the expansion of models (Wood)

Figure 2: Five Elements generation and overcoming relationship diagram. Dashed arrows indicate overcoming relationships, reflecting the system’s internal checks and balances mechanism: any element becoming excessively strong will constrain another element.

Five Elements Balance Diagnosis

Through the Five Elements model, engineering teams can systematically check the role completeness and balance of infrastructure.

Common Imbalance Patterns

  • Strong Wood, Weak Water: Focus on model and algorithm iteration while neglecting data quality; model performance hits bottlenecks. Solution: strengthen data pipelines and quality control
  • Strong Metal, Weak Earth: Hardware is stacked up but platform governance capability is insufficient; resource utilization is poor and the system lacks vitality. Solution: improve platform governance and scheduling
  • Vigorous Fire, Broken Wood: Large compute investment that models cannot keep up with; resources are wasted. Solution: optimize model architecture and improve compute utilization efficiency

Table 2: Common Imbalance Patterns

Balance Principles

Successful large-scale systems require coordinated cooperation of all five elements

  • Let each of the five elements fulfill its duties in their respective roles
  • Maintain generation as primary, overcoming as secondary
  • Prevent any side from excessive expansion or shrinkage
  • Regularly check the balance state of Five Elements

Only by letting the five elements fulfill their respective roles and mutually promote each other, while preventing any side from excessive expansion or shrinkage, can the entire system maintain robustness and evolutionary capability.

The Yun Layer: Stages and Cycles of System Evolution

Author: Jimmy Song
February 10, 2026, 21:56

Yun (运) here refers to the developmental stages and temporal rhythms experienced by a system, which can be understood as the lifecycle cycles or “fortune” of infrastructure.

Large-scale infrastructure is not static but evolves cyclically through the Exploration Period, Platform Period, Scale Period, and Rebalancing Period, with each stage having its primary contradictions and tasks.

Below are the four evolutionary stages.

Exploration Period (Initial Stage)

Characteristics: High variance, low structure, rapid trial and error

At this stage, new technologies and requirements emerge constantly, system architecture is loose, and diverse experiments coexist.

Primary Tasks:

  • Explore effective paths
  • Rapidly validate model and functional directions
  • Collect data and preliminary stability signals

Five Elements Characteristics: Wood and Fire in Command

  • Model innovation (Wood) and computing experimentation (Fire) are core drivers
  • Expansion (Yang) outweighs constraints (Yin)

Architecture Strategy:

  • ✓ Tolerate some chaos
  • ✓ Encourage innovation and iteration
  • ✓ Focus on collecting data and preliminary stability signals
  • ✗ Don’t prematurely introduce heavy processes and restrictions

Platform Period (Growth Stage)

Characteristics: Standardization emerges, interfaces and processes converge

After exploration, the system enters a stage of integration and regulation, beginning to establish unified platforms, standard interfaces, and governance processes, consolidating scattered results into platform capabilities.

Primary Tasks:

  • Establish unified platforms
  • Define standard interfaces
  • Consolidate governance processes

Five Elements Characteristics: Fire Generates Earth

  • Successful practices in computing and functionality (Fire) give rise to platform support requirements (Earth)
  • Governance and standards gradually strengthen

Architecture Strategy:

  • ✓ Extract common requirements
  • ✓ Build support platforms (Yin increases)
  • ✓ Lay the foundation for next-stage scaling
  • ✗ Don’t remain in disordered exploration

Scale Period (Mature Stage)

Characteristics: Efficiency, throughput, and cost become the main battlefield

The system is deployed at scale, and focus shifts to optimizing efficiency and costs, improving throughput and reliability.

Primary Tasks:

  • Optimize efficiency
  • Improve throughput
  • Reduce costs
  • Ensure reliability

Five Elements Characteristics: Heavy Earth Breaks Wood

  • Platforms (Earth) and hard constraints begin to dominate
  • Overly idealistic model expansion (Wood) will encounter setbacks from realistic conditions

Architecture Strategy:

  • ✓ Strengthen monitoring and automated operations
  • ✓ Control overly strong “Yang” through governance means
  • ✓ Ensure robust system operation
  • ✗ Don’t continue with startup-era casual practices

Rebalancing/Substitution Period (Renewal Stage)

Characteristics: Old structures are corrected or replaced by new structures

When the previous stage’s patterns reach their limits, the system either enters self-correction by introducing new elements to rebalance, or gets disrupted and replaced by a new paradigm.

Primary Tasks:

  • Introduce new elements to rebalance
  • Or accept substitution by a new paradigm

Five Elements Characteristics: Metal and Water Rise Again

  • Suppressed hardware/rule innovations (Metal) and new data potentials (Water) rise again
  • Driving system transformation

Architecture Strategy:

  • ✓ Be forward-looking, dare to break through
  • ✓ Transition smoothly, avoid severe volatility
  • ✗ Don’t cling to the status quo

Evolutionary Cycle

The above stages form a cyclical pattern, where the endpoint of each stage is also the starting point of the next.

Figure 1: The “Yun” cycle of AI infrastructure evolution. Systems start from the exploration period, undergo platform period standardization, enter the scale period for efficiency optimization, and ultimately move toward a new cycle of rebalancing or substitution.

The Art of Following the Momentum

A mature infrastructure organization should be able to determine its current stage based on internal and external signals and adjust its strategy accordingly.

If stage transitions are ignored or excessively rushed, the system will experience disturbances or even crises

Error Examples

  • Pulling Up Seedlings to Help Them Grow: Managing systems still in the exploration period as if they were scaled systems, prematurely suppressing change; the consequence is stifled innovation
  • Going Against the Momentum: Remaining in disordered exploration when it is time to enter the platform period; the consequence is missing the window for structured growth and creating hidden risks
  • Clinging to the Status Quo: Being unwilling to change when a rebalancing period is needed; the consequence is system rigidity and aging

Table 1: Error Examples

Stage Assessment Checklist

Through the “Yun” layer perspective, teams can examine the current macro stage:

  • Are we validating new concepts or expanding our achievements?
  • What is the system’s primary contradiction?
  • When might the next stage arrive?
  • Does our strategy align with the current stage?

Example Questions:

  • Are we in the exploration period?
    • If yes → Focus on rapid trial and error and validation
    • If no → Consider whether to enter the platform period
  • Does our system need standardization?
    • If yes → Enter platform period, establish platforms and standards
    • If no → Continue exploration

Qi Layer: Effective System Flow and Pressure Fields

Author: Jimmy Song
February 10, 2026, 21:56

Qi (气) in Chinese culture refers to the energy and flow field that permeates all things. In AI infrastructure, we borrow the concept of “Qi” to describe the effective flow and pressure distribution within systems.

This includes the circulation of data, tasks, and signals throughout the system, as well as how various explicit or implicit system pressures accumulate, propagate, and release.

The Essence of Qi: Overall State of Affairs

Unlike traditional single-point metric monitoring, the concept of “Qi” reminds us to focus on the overall state of affairs:

Signals are not isolated events, but rather gather and flow like a field

For example:

  • A sudden spike in GPU utilization may not be abnormal
  • But if multiple metrics (job queue length, response latency, memory usage, etc.) show a simultaneous trend of increase and persistence → this indicates a change in the “Qi field”
  • This signals the system entering a high-pressure state

This signal field manifests as the gathering and stretching of Qi, indicating the accumulation of some form of system tension.
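The "simultaneous trend" idea above can be sketched as a quorum check over recent metric windows: one spiking metric is ignored, but several metrics rising together and persistently flag a shift in the "Qi field". A minimal Python sketch; the metric names, window size, and thresholds are illustrative assumptions, not values from the text:

```python
def rising(samples, min_ratio=0.8):
    """A metric is 'rising' if most consecutive steps increase."""
    ups = sum(1 for a, b in zip(samples, samples[1:]) if b > a)
    return ups >= min_ratio * (len(samples) - 1)

def qi_field_pressure(metrics, quorum=3):
    """Flag a 'Qi field' shift when several metrics rise together.

    metrics: dict of name -> recent samples (oldest first).
    A single spike is ignored; only a simultaneous, persistent upward
    trend across `quorum` or more metrics raises the flag.
    """
    rising_metrics = [name for name, s in metrics.items() if rising(s)]
    return len(rising_metrics) >= quorum, rising_metrics

# Illustrative recent windows for four correlated pressure signals.
window = {
    "gpu_util":       [0.62, 0.70, 0.78, 0.85, 0.91],
    "queue_length":   [12, 18, 25, 33, 47],
    "p99_latency_ms": [180, 210, 260, 320, 400],
    "mem_used_gb":    [41, 44, 48, 53, 57],
}
alert, culprits = qi_field_pressure(window)
```

Here all four windows trend upward together, so the quorum is met and the system is flagged as entering a high-pressure state.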

Two States of Qi

Qi Flow: System Active

When all elements coordinate well, data and instructions flow smoothly, producing value efficiently:

  • Processing rates across all stages are basically matched
  • No long-term backlogs or idle resources
  • Timely system responses
  • Balanced resource utilization

Qi Stagnation: System Pathological

If a bottleneck or imbalance occurs somewhere, Qi’s flow is obstructed, causing local pressure to surge:

  • Jobs queue for long periods
  • CPU/GPU long-term idle or 100% utilization
  • Serious message queue backlog
  • Frequent anomaly alerts

Ultimately, this may trigger failures or performance collapse at weak points.

Qi’s Flow Path

To intuitively understand Qi’s flow path, we can view the system as a closely connected network:

Figure 1: Diagram of system ‘Qi’ flow path. Data (Water) Qi enters Model (Wood), triggering Computing Power (Fire) operation, coordinated via Platform (Earth), executed on Hardware (Metal), producing results that feed back to the data layer, forming a closed loop.

Qi’s Cycle:

  • Data (Water) Qi enters Model (Wood)
  • Drives Computing Power (Fire) to operate
  • Coordinated via Platform (Earth)
  • Executes computation on Hardware (Metal)
  • Outputs results, producing new data or signals
  • Feeds back into the data pool (Water)
  • Cycle repeats

Two Forms of Qi

Healthy Flow

Qi circulates ceaselessly among the five elements, maintaining system functionality:

  • If every step flows smoothly → system operates smoothly
  • If any step is obstructed → Qi flow slows or even reverses, damaging system performance and stability

Pressure Propagation

Qi refers not only to healthy flow, but also to pressure propagation:

Example: Data Inflow Surge

  • Data inflow surges but model processing capacity cannot keep up
  • Unprocessed data continuously accumulates
  • Manifests as excessive pressure in the data layer (Water)
  • Leading to suppression of computing power performance (Fire weakens)

Example: Hardware Resource Exhaustion

  • Hardware (Metal) resources exhausted
  • Computing requests cannot be satisfied
  • Obstructed Qi transforms into queuing pressure
  • Feeds back to platform (Earth) scheduling layer and user experience

Application of Qi Layer in Operations

Through the lens of “Qi”, operations and architecture teams can more sensitively detect sub-optimal system states:

Not Just Whether There’s a Problem, But How It’s Trending

| Qi State | Manifestation | Warning Significance |
| --- | --- | --- |
| Stagnation Emerging | Latency jitter gradually worsening | System entering a sub-stable state; needs unblocking |
| Flow Obstruction | Request failure rate rising, retries increasing | A stage is blocked; needs investigation |
| Qi Scattering | Metrics fluctuating severely, irregular | System severely imbalanced; needs overall adjustment |
| Qi Deficiency | Resource utilization low long-term | Configuration unreasonable; needs optimization |
Table 1: Qi State and Warning Significance

Qi Disorder Precedes Major Incidents

  • Latency jitter gradually worsening → signals system entering sub-stable state
  • If no measures are taken to resolve (scaling resources, optimizing algorithms, or rate limiting) → may evolve to complete failure
  • Agent task interaction rhythm (Qi) slows or stops → may indicate poor communication between agents or deadlock

Strategies for Guiding Qi Flow

Maintaining smooth Qi flow requires building resilience:

Architecture Level

  • Peak shaving and valley filling mechanisms: Absorb traffic bursts
  • Message queue backpressure protection: Prevent pressure backflow
  • Elastic buffer design: Reserve margin to handle impacts

Strategy Level

  • Slack capacity: Maintain certain redundancy
  • Elastic scaling strategies: Dynamically adjust resources
  • Rate limiting and degradation mechanisms: Protect core functionality
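Rate limiting as a "constraining Yin" can be illustrated with a classic token bucket: bursts are absorbed up to a fixed budget, and sustained overload is shed before pressure backflows. A minimal sketch with illustrative rate and capacity values; a production system would use a hardened limiter rather than this toy:

```python
class TokenBucket:
    """Minimal token-bucket limiter: admits requests while tokens remain,
    refilling at a fixed rate, so bursts are smoothed and core capacity
    is protected."""

    def __init__(self, rate, capacity):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # burst ceiling
        self.tokens = capacity
        self.last = 0.0

    def allow(self, now):
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller should degrade or shed this request

limiter = TokenBucket(rate=5, capacity=10)
# A burst of 15 requests at t=0: only the 10-token burst budget passes.
admitted = sum(limiter.allow(0.0) for _ in range(15))
```

The rejected requests are exactly where degradation mechanisms take over, protecting core functionality instead of letting queuing pressure propagate upstream.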

Agent System Special Attention

  • Monitor task queues and communication latency
  • Ensure information flow (Qi) between agents is unobstructed
  • Introduce coordinator agents or reduce concurrency when necessary to smooth Qi flow

Qi Layer Monitoring Practices

Establish system-wide observability:

| Monitoring Dimension | Focus | Tool Examples |
| --- | --- | --- |
| Traffic Distribution | Request flow across stages | Distributed tracing |
| Queue Backlog | Queue length trends | Message queue monitoring |
| Resource Utilization | CPU/GPU/memory/storage | Prometheus + Grafana |
| Latency Distribution | P50/P95/P99 latency | APM tools |
| Anomaly Trends | Error rate, retry rate changes | Log aggregation analysis |
Table 2: Qi Layer Monitoring Dimensions

The Qi layer provides an effective measure of system flow, helping us pulse-check whether the system’s “blood and Qi” are abundant and circulating smoothly

Summary

Qi’s operation can be understood as whether the system’s “meridians” are unobstructed:

  • Qi flow means system active: Data and instructions flow smoothly, producing value efficiently
  • Qi stagnation means system pathological: Flow obstructed, local pressure surges, ultimately triggering failures

Just as in Traditional Chinese Medicine’s four examination methods, by observing “Qi’s” operation, we can predict the trajectory of system problems and apply targeted remedies.

System Diagnosis Principles: Criteria for Health Status

Author: Jimmy Song
February 10, 2026, 21:56

To maintain the long-term healthy evolution of AI infrastructure, post-mortem summaries are far from sufficient. We need a set of system diagnosis principles to detect hidden risks early and correct deviations.

Based on the Yin-Yang Five Elements Yun model, diagnosis can be conducted from the following five dimensions:

Five-Dimensional Diagnosis Framework

Figure 1: Five-Dimensional Diagnosis Framework Diagram

Five Elements Balance Check

Assess the current status of five aspects: Data (Water), Models (Wood), Compute (Fire), Platform (Earth), and Hardware (Metal).

Diagnosis Method

Checklist:

  • Can data pipelines keep up with demands? (Water)
  • Are model capabilities fully utilized? (Wood)
  • Are compute resources effectively used? (Fire)
  • Can the platform support current load? (Earth)
  • Is hardware becoming a bottleneck? (Metal)

Identify Problems

| Problem Type | Manifestation | Solution |
| --- | --- | --- |
| Short Board | One element significantly weaker than the others | Prioritize strengthening that element |
| Overload | One element consumes excessive resources or frequently becomes a bottleneck | Introduce limits or expand other elements to share the pressure |
Table 1: Problem Types and Solutions

Typical Symptoms

  • Water Level Too Low: Data pipelines always lag behind training needs → Replenish data processing capacity
  • Metal Overload: Hardware often runs at full capacity or even triggers limit alarms → Expand capacity or impose constraints on upper layers

Most failures do not stem from missing components, but from long-term role imbalance
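The short-board/overload distinction above can be sketched as a simple scoring pass over the five elements. The per-element 0-1 health and load scores, and both thresholds, are hypothetical values invented purely for illustration:

```python
def diagnose_elements(scores, gap=0.25, overload=0.9):
    """Sketch of a Five Elements balance check.

    scores: dict of element -> {'health': 0-1, 'load': 0-1} (hypothetical).
    Reports ('short_board', element) when an element trails the average
    health by more than `gap`, and ('overload', element) when its load
    exceeds `overload`.
    """
    findings = []
    avg = sum(s["health"] for s in scores.values()) / len(scores)
    for name, s in scores.items():
        if avg - s["health"] > gap:
            findings.append(("short_board", name))
        if s["load"] > overload:
            findings.append(("overload", name))
    return findings

cluster = {
    "water (data)":     {"health": 0.45, "load": 0.60},  # pipeline lagging
    "wood (model)":     {"health": 0.85, "load": 0.55},
    "fire (compute)":   {"health": 0.80, "load": 0.70},
    "earth (platform)": {"health": 0.82, "load": 0.65},
    "metal (hardware)": {"health": 0.78, "load": 0.95},  # near full capacity
}
issues = diagnose_elements(cluster)
```

On this example the check surfaces exactly the two typical symptoms named above: Water level too low (short board) and Metal overload.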

Qi Flow Smoothness Check

Analyze whether Qi flows smoothly through the system via full-link monitoring.

Diagnosis Method

Key Metrics:

  • Latency distribution of key processes
  • Queue backlogs
  • Resource utilization curves

Qi Smooth vs. Qi Not Smooth

| State | Characteristics |
| --- | --- |
| Qi Smooth | Processing rates across stages basically match, without long-term backlogs or idle resources |
| Qi Not Smooth | One stage remains a bottleneck for long periods, or large amounts of resources sit idle |
Table 2: Qi Flow: Smooth vs Obstructed

Diagnosis Points

Distinguish temporary fluctuations from persistent trends: brief peaks don’t necessarily indicate Qi blockage, but persistent deviations must be addressed

Tool Support:

  • Dashboards and automated alerts
  • Timely capture of “stagnant Qi” locations
  • Further investigation of causes (which Five Elements imbalance corresponds)
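The "brief peak vs. persistent trend" rule can be made concrete as a consecutive-run check: a one-off spike resets the counter, while a sustained deviation trips the alert. A minimal sketch; the 20% tolerance and 5-sample run length are illustrative assumptions:

```python
def sustained_deviation(samples, baseline, tol=0.2, min_run=5):
    """Distinguish a brief peak from a persistent trend: flag only when
    the metric stays more than `tol` above `baseline` for `min_run`
    consecutive samples."""
    run = 0
    for v in samples:
        run = run + 1 if v > baseline * (1 + tol) else 0
        if run >= min_run:
            return True
    return False

baseline_ms = 100
spike = [100, 102, 180, 101, 99, 103, 100, 98]    # one-off peak: ignore
drift = [100, 105, 125, 130, 135, 140, 150, 160]  # persistent deviation: flag
```

Wired into a dashboard alert, this is one way to capture a "stagnant Qi" location without paging on every transient fluctuation.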

Yin-Yang Dynamics Check

Assess whether current strategy and state are Yang Excess Yin Deficiency or Yin Excess Yang Deficiency.

Diagnosis Method

Qualitative Analysis:

  • Look at whether recent architecture decisions overly favor one extreme
  • Have you been continuously expanding and adding new features while ignoring stability?
  • Or conversely, multiple layers of approval and strict constraints but lack innovation momentum?

Quantitative Metrics:

| Metric | Yang Excess | Yin Excess |
| --- | --- | --- |
| Change Frequency | Extremely high | Extremely low |
| Incident Rate | Frequent | Extremely low, but no change |
| Release Rhythm | Continuous | Long-term stagnation |
Table 3: Yin-Yang Status

Balance Strategy

| State | Symptoms | Solution |
| --- | --- | --- |
| Yang Excess Yin Deficiency | Frequent changes with frequent incidents | Pause releases, focus on addressing hazards (replenish Yin) |
| Yin Excess Yang Deficiency | Long-term no change and stagnation | Introduce challenges and innovation (add Yang) |
Table 4: Balance Strategies
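The quantitative metrics above can be folded into a toy classifier over change and incident rates. The thresholds (20 changes/week, 3 incidents/week, and the stagnation cutoffs) are invented for illustration, not calibrated values:

```python
def yin_yang_state(changes_per_week, incidents_per_week):
    """Classify Yin-Yang dynamics from change and incident rates
    (hypothetical thresholds, for illustration only)."""
    if changes_per_week > 20 and incidents_per_week > 3:
        # Frequent change, frequent incidents -> pause, replenish Yin.
        return "yang_excess"
    if changes_per_week < 1 and incidents_per_week < 1:
        # Long-term stagnation -> add Yang, inject innovation.
        return "yin_excess"
    return "balanced"

state = yin_yang_state(changes_per_week=30, incidents_per_week=5)
```

A team shipping thirty changes a week with five incidents lands in "yang_excess", matching the "pause releases, replenish Yin" prescription in the table.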

Yun Alignment Check

Determine whether the organization’s actions match the system’s current stage, preventing counter-Yun operation.

Diagnosis Method

Combine Business Development and Technical Maturity:

| Error Pattern | Manifestation | Consequences |
| --- | --- | --- |
| Premature Standardization | Spending significant effort on process management and cost optimization for emerging projects | These are typically scale-stage concerns, but the project is still in the exploration stage |
| Counter-Yun Exploration | Frequently changing the underlying architecture of widely used platforms without rigorous testing | Inconsistent with the scaling stage |
Table 5: Error Patterns

Stage-Strategy Reference Table

| Stage | Should Focus On | Should Not Do |
| --- | --- | --- |
| Exploration Stage | Diversity, flexibility, rapid trial and error | Premature pursuit of efficiency |
| Platform Stage | Standardization, process norms | Frequent arbitrary changes |
| Scale Stage | Optimization, stability, efficiency | Still growing wildly |
| Rebalancing Stage | Transformation, breakthrough, innovation | Clinging to the past |
Table 6: Stage-Strategy Mapping

Checklist:

  • Which stage are we currently in?
  • Do our actions match the stage?
  • Do we need to adjust strategy?

When discovering actions don’t match the stage, immediately adjust strategy to avoid working at cross-purposes

Yang Runaway Warning

Pay special attention to whether there are signs of Yang state runaway in the system.

What is Yang Runaway?

Exponential explosion or collapse risk caused by unconstrained positive feedback.

Typical Scenarios

| Scenario | Mechanism | Risk |
| --- | --- | --- |
| Service Call Volume Surge | Bug or abuse → Resource strain → Queuing and retry storms → Further increase in calls | Resource exhaustion |
| Training Task Self-Replication | Tasks self-replicate without limit to accelerate → Cluster resource exhaustion | System collapse |
Table 7: Typical Scenarios

Diagnosis Signals

  • A metric shows exponential explosive growth
  • Lack of slowing mechanisms
  • Formation of vicious cycles
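The "exponential explosive growth without slowing mechanisms" signal can be sketched as a check for consistently compounding ratios between samples: jitter resets the counter, but a metric multiplying step after step trips the runaway alarm. The 1.3x growth factor and run length are illustrative assumptions:

```python
def yang_runaway(samples, growth=1.3, min_run=4):
    """Flag exponential ('Yang runaway') growth: successive ratios stay
    above `growth` for `min_run` consecutive steps, i.e. the metric is
    compounding rather than merely fluctuating."""
    run = 0
    for prev, cur in zip(samples, samples[1:]):
        run = run + 1 if prev > 0 and cur / prev >= growth else 0
        if run >= min_run:
            return True
    return False

retry_storm = [100, 150, 240, 400, 700, 1250]  # each step roughly 1.5-1.8x
noisy_load  = [100, 130, 90, 140, 120, 160]    # bounded fluctuation
```

When the detector fires, the response strategies below apply: hit the hard limit first, then introduce negative feedback to break the positive-feedback chain.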

Response Strategy

| Strategy | Means | Effect |
| --- | --- | --- |
| Establish Hard Limits | Metal’s constraints | Immediate shutdown |
| Introduce Negative Feedback | Earth’s governance (rate limiting, quotas) | Braking and deceleration |
| Break the Positive Feedback Chain | Activate emergency plan | Pull back to steady state |
Table 8: Response Strategies

When discovering a metric showing exponential explosive growth without slowing mechanisms, intervene immediately

Diagnosis Implementation Process

Regular Diagnosis Mechanism

Recommend establishing a periodic diagnosis process:

Figure 2: Regular Diagnosis Mechanism Flowchart

Diagnosis Meeting Agenda

Fixed Session of Weekly Operations Review Meeting:

  • Check Five Elements scores for each module
  • Browse global Qi flow diagram
  • Analyze Yin-Yang dynamics
  • Discuss current Yun

This systematic examination leaves hidden risks nowhere to hide, achieving prevention before problems occur

Diagnosis Action Matrix

| Diagnosis Result | Action Recommendation |
| --- | --- |
| Five Elements: one element too weak | Concentrate resources to strengthen the weakness |
| Five Elements: one element overloaded | Expand capacity or introduce constraints |
| Qi stagnation at one stage | Clear bottlenecks, optimize processes |
| Yang Excess Yin Deficiency | Strengthen governance and stability mechanisms |
| Yin Excess Yang Deficiency | Activate innovation and boost vitality |
| Counter-Yun operation | Adjust strategy and go with the flow |
| Yang runaway warning | Immediate intervention, break positive feedback |
Table 9: Diagnosis Action Matrix

Summary

Through the above diagnosis principles, architects and operations teams can periodically take the pulse of infrastructure like TCM pulse diagnosis.

When diagnosis indicates imbalance in some aspect, immediately prescribe remedy based on the theory: replenish what needs replenishing, purge what needs purging.

Long-term adherence will keep the system on a healthy evolutionary trajectory.

Conclusion and Outlook

Author: Jimmy Song
February 10, 2026, 21:56

This paper systematically presents the four-layer model of “Yin-Yang - Five Elements - Yun - Qi” for AI infrastructure, providing a comprehensive cognitive map from theory to practice.

Review of Theoretical Model

Through four dimensions, we have constructed a global framework for understanding AI infrastructure:

| Layer | Core Value | Key Insights |
| --- | --- | --- |
| Yin-Yang | Understanding the tension and balance within systems | Expansion and constraint, innovation and governance, speed and stability: each pair is opposed yet unified, and all are indispensable |
| Five Elements | Organizing the fundamental role elements of systems | Data, models, computing power, platforms, hardware: these five generate and restrain each other in endless cycles |
| Yun | Grasping the periodic patterns of system evolution | Exploration phase, platform phase, scale phase, rebalancing phase: act in accordance with the trends |
| Qi | Insight into the flow state of system operation | When Qi flows, the system is active; when Qi stagnates, the system becomes pathological |
Table 1: Layer, Core Value, and Key Insights

More importantly, we have demonstrated how this theory combining Eastern wisdom with engineering practice can provide insights and guidance for real-world problems such as GPU scheduling, Agent Runtime, and platform governance.

Core Value of the Model

Holistic View

Traditional fragmented perspectives often see trees but not the forest, making it difficult to provide timely warnings of systemic risks

The Yin-Yang Five Elements Qi-Yun model, with its holistic view, helps architects:

  • Break free from the constraints of pure technical metrics
  • Grasp the principal contradictions and driving forces of system evolution
  • Extract meaningful patterns from complex signals

Dynamic View

The value of a system lies not in pursuing the extreme of a single performance indicator without limit, but in balancing all elements to achieve long-term coordinated development

The model’s dynamic view reminds us:

  • Yin-Yang dynamics transform dynamically with environment and stage
  • The same capability may shift from advantage to risk at different stages
  • Strategies need timely adjustment as Yun changes

Balance View

The core philosophy of the model is balance rather than extreme:

  • Not pursuing the limit of a single metric
  • But pursuing system coordination and sustainability
  • Finding dynamic balance points within unity of opposites

Practical Application Value

During Architecture Design

  • Consider the completeness and balance of the Five Elements
  • Reserve Yin-Yang constraint mechanisms
  • Design evolution paths that align with Yun trends
  • Plan channels for Qi flow

During Operations and Governance

  • Regularly check Five Elements balance
  • Monitor Qi circulation status
  • Assess Yin-Yang dynamic changes
  • Determine Yun phase transitions
  • Provide early warning of Yang loss-of-control risks

During Decision Review

  • Analyze root causes from the four-layer model perspective
  • Check whether basic principles of any layer were violated
  • Develop systematic solutions
  • Establish long-term improvement mechanisms

Insights for Architects

In an era of flourishing large models and autonomous agents, infrastructure has become unprecedentedly complex and active.

Cognitive Upgrade

From “managing machines and applications” to “managing intelligence and knowledge”:

  • Not only focus on application logic itself
  • But more on how knowledge and intelligence integrate into systems
  • View models as dynamically evolving components

Mindset Shift

From single-metric optimization to system balance:

  • Not pursuing the extreme of a single element
  • But pursuing overall coordination and sustainability
  • Finding dynamic balance within unity of opposites

Capability Development

From technical expert to systems philosopher:

  • While mastering technical tools
  • Cultivate systems thinking and philosophical reflection
  • Apply holistic frameworks like Yin-Yang and Five Elements

Limitations of the Model

It must be noted that this theory is not a panacea:

Not a Rigid Formula

Its value lies not in providing a rigid formula, but in guiding us to return to reality and think about problems from a more comprehensive perspective

  • The model provides a thinking framework, not standard answers
  • Specific applications need to consider actual scenarios
  • Architects ultimately must make judgments based on specific context

Requires Continuous Validation

  • Theory needs continuous validation and refinement in practice
  • Different scenarios may require adjustment and extension
  • Feedback and improvement in practice are encouraged

Supplement, Not Replace

  • The model is a tool to assist decision-making
  • Cannot replace professional judgment and experience
  • Should be used in combination with other methodologies

Future Outlook

Theory Development

This model has significant room for development:

  • Quantitative Metrics: Develop more precise quantitative indicators to make the theory more actionable
  • Tool Support: Develop analysis tools and automated diagnostic systems based on the model
  • Case Accumulation: Collect more practical cases to validate and enrich the theory
  • Cross-Domain Application: Explore applications of the model in other complex system domains

Practice Promotion

We hope this framework can help:

  • CTOs, infrastructure architects, and platform R&D teams
  • When facing increasingly complex AI infrastructure
  • Make wiser decisions

Ultimate Vision

Standing with sword in the midst of waves of change, embracing both the Yang of innovation and the Yin of governance, riding the system’s Qi above the currents

Conclusion

AI infrastructure stands at the starting point of a new era. We need not only technological innovation but also conceptual innovation.

The Yin-Yang Five Elements Qi-Yun model offers a unique perspective—combining Eastern philosophical wisdom with modern engineering practice—helping us find simplicity in complexity, stability in change, and unity in opposition.

We hope this model becomes a powerful tool for your thinking about AI infrastructure, helping you find your own “Way” in the balance and evolution of systems.

Dynamic Relationship Modeling: Five Elements Flow Under Yin-Yang Balance

Author: Jimmy Song
February 10, 2026, 21:55

Yin-Yang × Five Elements: Intrinsic Tension of Elements

Each Five Elements component contains both Yin and Yang aspects, manifesting with different polarities in different contexts:

Figure 1: Yin-Yang states of Five Elements. Each element includes Yin (potential, static, introverted) and Yang (explicit, dynamic, extroverted) aspects, with transformation possible between them depending on context.

Yin-Yang Attributes of the Five Elements:

| Five Elements | Yin State | Yang State |
| --- | --- | --- |
| Water (Data) | Potential data reserves, implicit patterns (static storage of historical data) | Instant data flow, real-time feedback |
| Wood (Model) | Dormant capabilities (unactivated parameters, backup algorithms) | Explicit expansion (model architecture updates, parameter surge) |
| Fire (Compute) | Stored energy (idle compute, waiting for scheduling) | High-load operation |
| Earth (Platform) | Static support (stable operation, non-intervention) | Proactive scheduling and expanded governance |
| Metal (Hardware) | Implicit constraints (unused capacity) | Explicit limits (resource hard caps maxed out) |
Table 1: Dynamic Model Overview

Signs of Yin-Yang Imbalance:

  • Fire Excessively Yin: GPU compute idle for long periods while tasks backlog → Poor scheduling
  • Fire Excessively Yang: GPUs at 24-hour full load with no elasticity → Hidden crash risk
  • Earth Excessively Yang: Too many platform rules → Stifling innovation
  • Earth Excessively Yin: Lack of platform control → Leading to chaos

Five Elements × Qi: Dynamic Network of Flow

The Five Elements framework provides tools to decompose systems, but system components are not static puzzles—rather, they connect into a dynamic network through the flow of Qi.

  • Generating Relationships: Qi flows smoothly, forming positive feedback loops
  • Controlling Relationships: Qi stagnates at certain links or reverse effects strengthen

Dynamic Relationship Principles:

Generating primarily, Controlling secondarily—main energy flows transmit successfully through each link, while balancing forces intervene moderately only to prevent extreme situations.

Yun × Yin-Yang Five Elements: Boundary Conditions for Stage Evolution

The stage-based nature of Yun provides a perspective of boundary conditions evolving over time for the aforementioned Yin-Yang Five Elements dynamics.

Each stage strengthens or weakens certain elements and tensions:

| Stage | Main Characteristics | Five Elements Characteristics | Yin-Yang Characteristics |
| --- | --- | --- | --- |
| Exploration Stage | High variance, low structure, rapid trial and error | Wood and Fire dominant | Expansion (Yang) outweighs constraints (Yin) |
| Platform Stage | Standardization emerges, interfaces and processes converge | Fire generates Earth | Governance (Yin) gradually strengthens |
| Scale Stage | Efficiency, throughput, cost become the main battlegrounds | Earth dominates Wood | Stability (Yin) takes precedence |
| Rebalancing Stage | Old structures corrected or replaced by new structures | Metal and Water resurge | Transformation (Yang) rises again |
Table 2: Typical Interaction Scenarios

Dynamic Stage Transitions:

The Yun layer tells us when to shift focus:

  • As stages change, the system needs to “allocate interests”
  • Previously dominant elements may become excessive and need convergence
  • Previously minor elements need strengthening to address shortcomings

Examples:

  • In Platform Stage/Scale Stage → Must strengthen governance (Earth’s Yang) and hardware optimization (Metal’s Yang)
  • To curb the unchecked growth tendencies left over from early stages (excessive Wood-Fire Qi)
  • In Rebalancing Stage → May need to reactivate suppressed innovation potential (Water-Wood Qi)

Comprehensive Analysis Case: GPU Scheduling Scenario

Let’s see how to apply the four-layer model to analyze a real GPU scheduling problem.

Problem Scenario: Cluster experiences task queues under high load

| Layer | Diagnosis | Findings |
| --- | --- | --- |
| Qi Layer | Observe Qi flow state | Compute Fire Qi is obstructed |
| Five Elements Layer | Locate elements | Data input too intense (Water Yang excessive) while scheduling (Platform Earth) strategy cannot keep up |
| Yin-Yang Layer | Analyze tensions | Scheduling strategy blindly pursues maximizing utilization (excessively Yang) while lacking elastic buffers (Yin) |
| Yun Layer | Assess stage | An emerging business that just passed the exploration stage and has not perfected scheduling: Platform Stage |
Table 3: Four-Layer Diagnostic Analysis

Solutions

Based on four-layer collaborative diagnosis, develop comprehensive solutions:

  • Qi Layer: Unblock Qi flow

    • Expand resources or optimize algorithms
  • Five Elements Layer: Balance elements

    • Strengthen platform scheduling capabilities (Earth)
  • Yin-Yang Layer: Restore balance

    • Introduce elastic buffer mechanisms (supplement Yin)
    • Avoid blindly pursuing high utilization
  • Yun Layer: Follow the trend

    • Accelerate introduction of standardized scheduling and resource governance (Earth’s Yun is approaching)
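The "supplement Yin" remedy above, reserving an elastic buffer instead of chasing maximal utilization, can be sketched as an admission-control rule that keeps headroom and sheds load before the queue saturates. All thresholds here are illustrative assumptions, not tuned values:

```python
def admit_job(gpus_busy, total_gpus, queue_len, headroom=0.1, max_queue=50):
    """Sketch of elastic-buffer admission control for a GPU cluster:
    keep `headroom` spare capacity rather than targeting 100% utilization,
    and shed load before the queue saturates."""
    util = gpus_busy / total_gpus
    if queue_len >= max_queue:
        return "reject"  # break the pressure feedback loop early
    if util >= 1 - headroom:
        return "queue"   # hold in the elastic buffer, preserve headroom
    return "run"
```

For example, `admit_job(80, 100, 5)` runs immediately, `admit_job(95, 100, 5)` is buffered to preserve headroom, and `admit_job(95, 100, 50)` is rejected so queuing pressure cannot feed back into the scheduler.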

Value of Dynamic Modeling

Through the multi-level dynamic modeling above, we can:

  • Explain complex scenarios more comprehensively: No longer limited to single perspectives
  • Locate root causes of problems: Find fundamental causes rather than surface phenomena
  • Point improvement directions: Obtain systematic solutions
  • Predict system evolution: Prepare in advance for stage transitions

Practical Recommendations

In daily architecture design and operations, you can establish these thinking habits:

  • When encountering problems: Analyze layer by layer from a four-layer perspective
  • When making decisions: Consider impacts on all four layers
  • When conducting post-mortems: Check whether warning signals from the four-layer model were ignored

The value of a system lies not in pursuing the extreme of a single performance indicator without limit, but in balancing all elements to achieve long-term coordinated development

Engineering Practice Guide: Architecture Decisions Guided by Theory

Author: Jimmy Song
February 10, 2026, 21:55

The theoretical models above do not remain at the conceptual level; they directly guide the engineering practice of AI infrastructure. In specific scenarios such as GPU scheduling, Agent runtime, and platform governance, we can follow the principles below to apply the Yin-Yang Five Elements Qi Movement model.

Balance Yin and Yang, Avoid Extremes

Consider both propelling forces and restraining forces when making architecture decisions.

GPU Cluster Scaling:

  • ✓ Satisfy business growth (expanding Yang)
  • ✓ Set quota and priority policies (constraining Yin)
  • ✓ Prevent resource abuse

Agent Runtime Design:

  • ✓ Give agents more autonomy (innovation, Yang)
  • ✓ Introduce monitoring and sandboxing mechanisms (governance, Yin)
  • ✓ Prevent loss of control

Practice Checklist:

After every major adjustment, ask yourself: Have I introduced corresponding counter-forces to stabilize the system?

Complete the Five Elements, Identify and Fill Weaknesses

Regularly review whether the five types of elements in the system are balanced.

GPU Infrastructure Check:

  • Do data pipelines keep up with computing power improvements? (Water and Fire matching)
  • Does model optimization fully utilize hardware? (Wood and Metal matching)
  • Can the scheduling platform handle peak loads? (Earth supporting Fire)
  • Have hardware resources become a bottleneck? (Metal not holding back)

Agent Platform Check:

  • Is there high-quality knowledge base or real-time data support? (Water)
  • Is there strong model capability? (Wood)
  • Are there sufficient computing resources? (Fire)
  • Is there a good orchestration framework? (Earth)
  • Is there a reliable environment and interfaces? (Metal)

Practice Strategy:

Once a bottleneck or overload is discovered in a certain link, decisively invest resources to fill the weakness or reduce the burden on the overloaded part

| Problem Discovered | Solution |
| --- | --- |
| Insufficient data quality (“Water” weak) | Prioritize data governance |
| Long-term low hardware utilization (Metal strong, Fire weak) | Optimize algorithms or scheduling to better utilize hardware |
Table 1: Problem Discovery and Solutions

Follow the Trend, Align with the Movement

Develop reasonable strategies based on the stage of the system.

Strategies for Different Stages:

| Stage | Should Do | Should Not Do |
| --- | --- | --- |
| Exploration Phase | Rapid trial and error, validate value | Prematurely introduce heavy processes and constraints |
| Platform Phase | Standardized management, MLOps tools | Remain in disordered exploration |
| Scale Phase | Strengthen governance and efficiency optimization | Still use the casual practices of the startup period |
| Rebalancing Phase | Architecture innovation, introduce new technologies | Refuse to move forward |
Table 2: Strategies for Different Stages

Regular Assessment: At each quarter or important milestone, assess:

  • Which stage are we currently in?
  • What is the main contradiction in this stage?
  • When might the next stage arrive?
  • Prepare in advance for the transition

Practice Cases:

  • An AI training cluster after validating the concept → Should consider entering standardized management (transitioning from exploration phase to platform phase)
  • When system scale expansion encounters bottlenecks → Consider whether to enter the rebalancing phase and break through through architecture innovation

Observe Qi Field, Optimize Flow

Establish global observability of the system, focusing on trends and correlations rather than single-point metrics.

Monitoring Methods:

  • Distributed tracing
  • Metric correlation analysis
  • Full-link monitoring

Signals of Qi Disorder:

| Signal | Possible Cause |
| --- | --- |
| Frequent occurrence of various abnormal logs | Global investigation needed |
| A metric’s periodic fluctuations becoming increasingly intense | The system may be approaching an internal limit |
Table 3: Signals of Qi Disorder

Strategies to Keep Qi Flowing Smoothly:

Architecture Level:

  • Peak clipping and valley filling mechanisms
  • Message queue backpressure protection

Strategy Level:

  • Slack capacity
  • Elastic scaling strategies

Agent System Special Attention:

  • Monitor task queues and communication latency
  • Ensure smooth information flow (Qi) between agents
  • Introduce coordinator agents or reduce concurrency when necessary
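Reducing concurrency to keep inter-agent Qi flowing can be sketched with a semaphore-bounded gate, a lightweight stand-in for a full coordinator agent. The cap of 4 concurrent tasks is an arbitrary illustrative choice:

```python
import threading

class AgentGate:
    """Bounded concurrency for agent tasks: a semaphore caps how many
    agents run at once, smoothing task flow and preventing the
    deadlock-prone unbounded fan-out of agent-to-agent calls."""

    def __init__(self, max_concurrent):
        self.sem = threading.BoundedSemaphore(max_concurrent)

    def run(self, task, *args):
        with self.sem:  # blocks when the concurrency cap is reached
            return task(*args)

gate = AgentGate(max_concurrent=4)
results = []
threads = [
    threading.Thread(
        target=lambda i=i: results.append(gate.run(lambda x: x * 2, i)))
    for i in range(10)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

All ten tasks complete, but never more than four at once; widening or narrowing the gate is the knob for smoothing Qi flow when communication latency climbs.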

Dynamic Adjustment, Continuous Rebalancing

Integrate the Yin-Yang Five Elements Qi Movement model into the team’s continuous improvement process.

Core Questions in Architecture Reviews or Incident Retrospectives:

  • Is the current main contradiction more inclined toward expansion or constraint, speed or stability?
  • Is any Five Elements element overloaded (Yang excess) or missing (Yin deficiency)?
  • Is System Qi congested somewhere?
  • Do our strategies align with the current stage?

Continuous Improvement Process:

Problem Discovery → Four-Layer Model Diagnosis → Strategy Formulation → Implementation Adjustment → Effect Evaluation → Continuous Optimization

Practice Case: Large-Scale GPU Training Cluster Optimization

Background: A team encountered stability issues while operating a large-scale GPU training cluster.

Four-Layer Model Diagnosis:

| Layer | Diagnosis | Findings |
| --- | --- | --- |
| Yin-Yang Layer | Speed vs. Stability | Fault-tolerance and testing time were continuously squeezed in pursuit of efficiency (speed, Yang), leading to frequent production failures (stability, Yin, damaged) |
| Five Elements Layer | Five Elements Check | Data pipeline latency gradually increasing (Water weaker than Fire) |
| Movement Layer | Stage Judgment | The system has moved from its wild-growth period into maturity |
| Qi Layer | Qi Flow State | Pronounced Qi stagnation |

Table 4: Four-Layer Model Diagnosis

Comprehensive Solution:

  • Yin-Yang Balance:

    • Suspend performance optimization
    • Invest time to strengthen fault tolerance mechanisms and testing (supplement stability Yin)
  • Five Elements Completion:

    • Add data preprocessing nodes and caching (strengthen Water)
  • Movement Adjustment:

    • Change mindset, shift focus from feature expansion to optimization and governance
  • Qi Flow Regulation:

    • Build full-link tracing system
    • Monitor the time of each link from training job submission to completion
    • Identify Qi stagnation points and clear them
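The "identify stagnation points" step can be approximated with per-stage timing. The sketch below uses sleeps as stand-ins for real pipeline work and flags the slowest link from job submission to completion:

```python
import time
from contextlib import contextmanager

timings = {}

@contextmanager
def stage(name):
    """Record the wall-clock time of one pipeline stage."""
    start = time.monotonic()
    try:
        yield
    finally:
        timings[name] = time.monotonic() - start

# Hypothetical training-job pipeline; sleeps stand in for real work.
with stage("queue_wait"):
    time.sleep(0.03)
with stage("data_loading"):
    time.sleep(0.08)   # the deliberately slow link
with stage("training_step"):
    time.sleep(0.01)

bottleneck = max(timings, key=timings.get)
print(f"stagnation point: {bottleneck} ({timings[bottleneck] * 1000:.0f} ms)")
```

A real system would emit these spans to a tracing backend instead of a dict, but the diagnostic logic is the same: compare stage durations, then clear the slowest link first.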

Result: While maintaining high utilization, the cluster’s stability was greatly improved, and no serious downtime occurred again.

Scenario Application Quick Reference Table

| Scenario | Yin-Yang Focus | Five Elements Check | Movement Judgment | Qi Flow Monitoring |
| --- | --- | --- | --- | --- |
| GPU Scheduling | Utilization vs. Elasticity | Fire - Earth - Metal balance | Scale phase: efficiency optimization | Task queues, resource utilization curves |
| Agent Runtime | Autonomy vs. Governance | Water - Wood - Fire coordination | Exploration phase: rapid iteration | Communication latency, task interaction rhythm |
| Platform Governance | Innovation risk control vs. Process efficiency | Earth - Metal constraints | Platform phase: standardization | Rule execution rate, change frequency |
| Cost Optimization | Performance vs. Cost | Fire - Metal matching | Scale phase: refinement | Resource waste, idle time |

Table 5: Scenario Application Quick Reference

Summary

Through the Yin-Yang Five Elements Qi Movement model, we can in practice:

  • Avoid Extremes: Not blindly pursuing single metrics
  • Systematic Thinking: Analyzing problems from multiple dimensions
  • Follow the Trend: Adjust strategies based on stages
  • Predict Problems: Early warning of risks through Qi field changes
  • Continuous Improvement: Establish systematic optimization processes

The value of this system lies in combining Eastern wisdom with engineering practice to provide a distinctive and effective thinking framework for complex AI infrastructure.

AI Learning Resources: 44 Curated Collections from Our Cleanup

作者 Jimmy Song
2026年2月8日 20:20

“The best way to learn AI is to start building. These resources will guide your journey.”

Figure 1: AI Learning Resources Collection

In my ongoing effort to keep the AI Resources list focused on production-ready tools and frameworks, I’ve removed 44 collection-type projects—courses, tutorials, awesome lists, and cookbooks.

These resources aren’t gone—they’ve been moved here. This post is a curated collection of those educational materials, organized by type and topic. Whether you’re a complete beginner or an experienced practitioner, you’ll find something valuable here.

Why Remove Collections from AI Resources?

My AI Resources list now focuses on concrete tools and frameworks—projects you can directly use in production. Collections, while valuable, serve a different purpose: education and discovery.

By separating them, I:

  • Keep the resources list actionable and focused
  • Create a dedicated space for learning materials
  • Make it easier to find what you need

📚 Awesome Lists (14 Collections)

Awesome lists are community-curated collections of the best resources. They’re perfect for discovering new tools and staying updated.

Must-Explore Awesome Lists

Awesome Generative AI

  • Models, tools, tutorials, and research papers
  • Great for: Comprehensive overview of generative AI landscape

Awesome LLM

  • LLM resources: papers, tools, datasets, applications
  • Great for: Deep dive into large language models

Awesome AI Apps

  • Practical LLM applications, RAG examples, agent implementations
  • Great for: Real-world implementation examples

Awesome Claude Code

  • Claude Code commands, files, and workflows
  • Great for: Maximizing Claude Code productivity

Awesome MCP Servers

  • MCP servers for modular AI backend systems
  • Great for: Building with Model Context Protocol

Specialized Awesome Lists


🎓 Courses & Tutorials (9 Curricula)

Structured learning paths from universities and tech companies.

Microsoft’s AI Curriculum

AI for Beginners

  • 12 weeks, 24 lessons covering neural networks, deep learning, CV, NLP
  • Great for: Complete AI foundation
  • Format: Lessons, quizzes, projects

Machine Learning for Beginners

  • 12-week, 26-lesson curriculum on classic ML
  • Great for: ML fundamentals without deep math
  • Format: Project-based exercises

Generative AI for Beginners

  • 18 lessons on building GenAI applications
  • Great for: Practical GenAI development
  • Format: Hands-on projects

AI Agents for Beginners

  • 11 lessons on agent systems
  • Great for: Understanding autonomous agents
  • Format: Project-driven learning

EdgeAI for Beginners

  • Optimization, deployment, and real-world Edge AI
  • Great for: On-device AI applications
  • Format: Practical tutorials

MCP for Beginners

  • Model Context Protocol curriculum
  • Great for: Building with MCP
  • Format: Cross-language examples and labs

Official Platform Courses

Hugging Face Learn Center

  • Free courses on LLMs, deep RL, CV, audio
  • Great for: Hands-on Hugging Face ecosystem
  • Format: Interactive notebooks

OpenAI Cookbook

  • Runnable examples using OpenAI API
  • Great for: OpenAI API best practices
  • Format: Code examples and guides

PyTorch Tutorials

  • Basics to advanced deep learning
  • Great for: PyTorch mastery
  • Format: Comprehensive tutorials

🍳 Cookbooks & Example Collections (5 Collections)

Practical code examples and recipes.

Claude Cookbooks

  • Notebooks and examples for building with Claude
  • Great for: Anthropic Claude integration
  • Format: Jupyter notebooks

Hugging Face Cookbook

  • Practical AI cookbook with Jupyter notebooks
  • Great for: Open models and tools
  • Format: Hands-on examples

Tinker Cookbook

  • Training and fine-tuning examples
  • Great for: Fine-tuning workflows
  • Format: Platform-specific recipes

E2B Cookbook

  • Examples for building LLM apps
  • Great for: LLM application development
  • Format: Recipes and tutorials

arXiv Paper Curator

  • 6-week course on RAG systems
  • Great for: Production-ready RAG
  • Format: Project-based learning

📖 Guides & Handbooks (5 Resources)

In-depth guides on specific topics.

Prompt Engineering Guide

  • Comprehensive prompt engineering resources
  • Great for: Mastering prompt design
  • Format: Guides, papers, lectures, notebooks

Evaluation Guidebook

  • LLM evaluation best practices from Hugging Face
  • Great for: Assessing LLM performance
  • Format: Practical guide

Context Engineering

  • Design and optimize context beyond prompt engineering
  • Great for: Advanced context management
  • Format: Practical handbook

Context Engineering Intro

  • Template and guide for context engineering
  • Great for: Providing project context to AI assistants
  • Format: Template + guide

Vibe-Coding Workflow

  • 5-step prompt template for building MVPs with LLMs
  • Great for: Rapid prototyping with AI
  • Format: Workflow template

🗂️ Template & Workflow Collections

Reusable templates and workflows.

Claude Code Templates

  • Code templates for various programming scenarios
  • Great for: Claude AI development
  • Format: Template collection

n8n Workflows

  • 2,000+ professionally organized n8n workflows
  • Great for: Workflow automation
  • Format: Searchable catalog

N8N Workflows Catalog

  • Community-driven reusable workflow templates
  • Great for: Workflow import and versioning
  • Format: Template catalog

📊 Research & Evaluation

Academic and evaluation resources.

LLMSys PaperList

  • Curated list of LLM systems papers
  • Great for: Research on training, inference, serving
  • Format: Paper collection

Free LLM API Resources

  • LLM providers with free/trial API access
  • Great for: Experimentation without cost
  • Format: Provider list

🎨 Other Notable Resources

System Prompts and Models of AI Tools

  • Community-curated collection of system prompts and AI tool examples
  • Great for: Prompt and agent engineering
  • Format: Resource collection

ML Course CS-433

  • EPFL Machine Learning Course
  • Great for: Academic ML foundation
  • Format: Lectures, labs, projects

Machine Learning Engineering

  • ML engineering open-book: compute, storage, networking
  • Great for: Production ML systems
  • Format: Comprehensive guide

Realtime Phone Agents Course

  • Build low-latency voice agents
  • Great for: Voice AI applications
  • Format: Hands-on course

LLMs from Scratch

  • Build a working LLM from first principles
  • Great for: Understanding LLM internals
  • Format: Repository + book materials

💡 How to Use This Collection

For Complete Beginners

  1. Start with: Microsoft’s AI for Beginners
  2. Practice with: PyTorch Tutorials
  3. Explore: Awesome AI Apps for inspiration

For Developers

  1. Build skills: OpenAI Cookbook + Claude Cookbooks
  2. Find tools: Awesome Generative AI + Awesome LLM
  3. Learn workflows: n8n Workflows Catalog

For Researchers

  1. Stay updated: Awesome Generative AI + LLMSys PaperList
  2. Deep dive: Awesome LLM
  3. Implement: Hugging Face Cookbook

For Product Builders

  1. Find examples: Awesome AI Apps
  2. Learn workflows: n8n Workflows Catalog
  3. Study patterns: Awesome LLM Apps

🔄 What Was NOT Removed

Agent frameworks and production tools remain in the AI Resources list, including:

  • AutoGen - Microsoft’s multi-agent framework
  • CrewAI - High-performance multi-agent orchestration
  • LangGraph - Stateful multi-agent applications
  • Flowise - Visual agent platform
  • Langflow - Visual workflow builder
  • And 80+ more agent frameworks

These are functional tools you can use to build applications, not educational collections. They belong in the AI Resources list.


📝 Summary

I removed 44 collection-type projects from the AI Resources list to keep it focused on production tools:

  • 14 Awesome Lists - Discover new tools and stay updated
  • 9 Courses & Tutorials - Structured learning paths
  • 5 Cookbooks - Practical code examples
  • 5 Guides & Handbooks - In-depth resources
  • 4 Template Collections - Reusable workflows
  • 7 Other Resources - Research and evaluation

These resources remain incredibly valuable for learning and discovery. They just serve a different purpose than the production-focused tools in my AI Resources list.


Next Steps:

  1. Bookmark this post for future reference
  2. Explore the AI Resources list for production tools (agent frameworks, databases, etc.)
  3. Check out my blog for more AI engineering insights

Acknowledgments: This collection was compiled during my AI Resources cleanup initiative. Special thanks to all the maintainers of these awesome lists, courses, and collections for their invaluable contributions to the AI community.

Standing on Giants' Shoulders: The Traditional Infrastructure Powering Modern AI

作者 Jimmy Song
2026年2月8日 16:00

“If I have seen further, it is by standing on the shoulders of giants.” — Isaac Newton

Figure 1: Standing on Giants’ Shoulders: The Traditional Infrastructure Powering Modern AI

In the excitement surrounding LLMs, vector databases, and AI agents, it’s easy to forget that modern AI didn’t emerge from a vacuum. Today’s AI revolution stands upon decades of infrastructure work—distributed systems, data pipelines, search engines, and orchestration platforms that were built long before “AI Native” became a buzzword.

This post is a tribute to those traditional open source projects that became the invisible foundation of AI infrastructure. They’re not “AI projects” per se, but without them, the AI revolution as we know it wouldn’t exist.

The Evolution: From Big Data to AI

| Era | Focus | Core Technologies | AI Connection |
| --- | --- | --- | --- |
| 2000s | Web Search & Indexing | Lucene, Elasticsearch | Semantic search foundations |
| 2010s | Big Data & Distributed Computing | Hadoop, Spark, Kafka | Data processing at scale |
| 2010s | Cloud Native | Docker, Kubernetes | Model deployment platforms |
| 2010s | Stream Processing | Flink, Storm, Pulsar | Real-time ML inference |
| 2020s | AI Native | Transformers, Vector DBs | Built on everything above |

Table 1: Evolution of Data Infrastructure

Big Data Frameworks: The Data Engines

Before we could train models on petabytes of data, we needed ways to store, process, and move that data.

Apache Hadoop (2006)

GitHub: https://github.com/apache/hadoop

Hadoop democratized big data by making distributed computing accessible. Its HDFS filesystem and MapReduce paradigm proved that commodity hardware could process web-scale datasets.

Why it matters for AI:

  • Modern ML training datasets live in HDFS-compatible storage
  • Data lakes built on Hadoop became training data reservoirs
  • Proved that distributed computing could scale horizontally

Apache Kafka (2011)

GitHub: https://github.com/apache/kafka

Kafka redefined data streaming with its log-based architecture. It became the nervous system for real-time data flows in enterprises worldwide.

Why it matters for AI:

  • Real-time feature pipelines for ML models
  • Event-driven architectures for AI agent systems
  • Streaming inference pipelines
  • Model telemetry and monitoring backbones

Apache Spark (2014)

GitHub: https://github.com/apache/spark

Spark brought in-memory computing to big data, making iterative algorithms (like ML training) practical at scale.

Why it matters for AI:

  • MLlib made ML accessible to data engineers
  • Distributed data processing for model training
  • Spark ML became the de facto standard for big data ML
  • Proved that in-memory computing could accelerate ML workloads

Search Engines: The Retrieval Foundation

Before RAG (Retrieval-Augmented Generation) became a buzzword, search engines were solving retrieval at scale.

Elasticsearch (2010)

GitHub: https://github.com/elastic/elasticsearch

Elasticsearch made full-text search accessible and scalable. Its distributed architecture and RESTful API became the standard for search.

Why it matters for AI:

  • Pioneered distributed inverted index structures
  • Proved that horizontal scaling was possible for search workloads
  • Many “AI search” systems actually use Elasticsearch under the hood
  • Query DSL influenced modern vector database query languages

OpenSearch (2021)

GitHub: https://github.com/opensearch-project/opensearch

When AWS forked Elasticsearch, it ensured search infrastructure remained truly open. OpenSearch continues the mission of accessible, scalable search.

Why it matters for AI:

  • Maintains open source innovation in search
  • Vector search capabilities added in 2023
  • Demonstrates community fork resilience

Databases: From SQL to Vectors

The evolution from relational databases to vector databases represents a paradigm shift—but both have AI relevance.

Traditional Databases That Paved the Way

  • Dgraph (2015) - Graph database proving that specialized data structures enable new use cases
  • TDengine (2019) - Time-series database for IoT ML workloads
  • OceanBase (2021) - Distributed database showing ACID transactions could scale

Why they matter for AI:

  • Proved that specialized database engines could outperform general-purpose ones
  • Database internals (indexing, sharding, replication) are now applied to vector databases
  • Multi-model databases (graph + vector + relational) are becoming the norm for AI apps

Cloud Native: The Runtime Foundation

When Docker and Kubernetes emerged, they weren’t built for AI—but AI couldn’t scale without them.

Docker (2013) & Kubernetes (2014)

GitHub: https://github.com/kubernetes/kubernetes

Kubernetes became the operating system for cloud-native applications. Its declarative API and controller pattern made it perfect for AI workloads.

Why it matters for AI:

  • Model deployment platforms (KServe, Seldon Core) run on K8s
  • GPU orchestration (NVIDIA GPU Operator, Volcano, HAMi) extends K8s
  • Kubeflow made K8s the standard for ML pipelines
  • Microservice patterns enable modular AI agent architectures

Service Mesh & Serverless

Istio (2016), Knative (2018) - Service mesh and serverless platforms that proved:

  • Network-level observability applies to AI model calls
  • Scale-to-zero is essential for cost-effective inference
  • Traffic splitting enables A/B testing of ML models

Why they matter for AI:

  • AI Gateway patterns evolved from API gateways + service mesh
  • Serverless inference platforms use Knative-style autoscaling
  • Observability patterns (tracing, metrics) are now standard for ML systems

API Gateways: From REST to LLM

API gateways weren’t designed for AI, but they became the foundation of AI Gateway patterns.

Kong, APISIX, KGateway

These API gateways solved rate limiting, auth, and routing at scale. When LLMs emerged, the same patterns applied:

AI Gateway Evolution:

| Traditional API Gateway (2010s) | AI Gateway (2024) |
| --- | --- |
| Rate limiting | Token-bucket rate limiting on LLM tokens |
| Auth | API key + organization management |
| Routing | Model routing (GPT-4 → Claude → local models) |
| Observability | LLM-specific telemetry (token usage, cost) |
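The rate-limiting pattern at the heart of this evolution can be sketched as a token bucket. This is an illustrative Python toy, not how Kong or APISIX actually implement their plugin runtimes:

```python
import time

class TokenBucket:
    """Token-bucket limiter of the kind AI gateways apply per API key,
    metering tokens (or requests) instead of raw connections."""

    def __init__(self, rate, capacity):
        self.rate = rate            # tokens refilled per second
        self.capacity = capacity    # maximum burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self, cost=1):
        # Refill proportionally to elapsed time, capped at capacity.
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

bucket = TokenBucket(rate=10, capacity=5)
burst = [bucket.allow() for _ in range(7)]
print(burst)  # typically the first 5 pass, then the burst is exhausted
```

For LLM traffic, `cost` would be the prompt-plus-completion token count rather than 1, which is exactly the twist that turns a classic gateway limiter into an AI gateway limiter.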

Why they matter for AI:

  • Proved that centralized API management scales
  • Plugin architectures enable LLM-specific features
  • Traffic management patterns apply to prompt routing
  • Security patterns (mTLS, JWT) now protect AI endpoints

Workflow Orchestration: The Pipeline Backbone

Data engineering needs pipelines. ML engineering needs pipelines. AI agents need workflows.

Apache Airflow (2015)

GitHub: https://github.com/apache/airflow

Airflow made pipeline orchestration accessible with its DAG-based approach. It became the standard for ETL and data engineering.

Why it matters for AI:

  • ML pipeline orchestration (feature engineering, training, evaluation)
  • Proved that DAG-based workflow definition works at scale
  • Prompt engineering pipelines use Airflow-style orchestration
  • Scheduler patterns are now applied to AI agent workflows

n8n, Prefect, Flyte

Modern workflow platforms that evolved from Airflow’s foundations:

  • n8n (2019) - Visual workflow automation with AI capabilities
  • Prefect (2018) - Python-native workflow orchestration for ML
  • Flyte (2019) - Kubernetes-native workflow orchestration for ML/data

Why they matter for AI:

  • Multi-modal agents need workflow orchestration
  • RAG pipelines are essentially ETL pipelines for embeddings
  • Prompt chaining is DAG-based orchestration

Data Formats: The Lakehouse Foundation

Before we could train on massive datasets, we needed formats that supported ACID transactions and schema evolution.

Delta Lake, Apache Iceberg, Apache Hudi

These table formats brought reliability to data lakes.

Why they matter for AI:

  • Training datasets need versioning and reproducibility
  • Feature stores use Delta/Iceberg as storage formats
  • Proved that “big data” could have transactional semantics
  • Schema evolution handles ML feature drift

The Invisible Thread: Why These Projects Matter

What do all these projects have in common?

  1. They solved scaling first - AI training/inference needs horizontal scaling
  2. They proved distributed systems work - Modern AI is fundamentally distributed
  3. They created ecosystem patterns - Plugin systems, extension points, APIs
  4. They established best practices - Observability, security, CI/CD
  5. They built developer habits - YAML configs, declarative APIs, CLI tools

The AI Native Continuum

Modern “AI Native” infrastructure didn’t replace these projects—it builds on them:

| Traditional Project | AI Native Evolution | Example |
| --- | --- | --- |
| Hadoop HDFS | Distributed model storage | HDFS for datasets, S3 for checkpoints |
| Kafka | Real-time feature pipelines | Kafka → Feature Store → Model Serving |
| Spark ML | Distributed ML training | MLlib → PyTorch Distributed |
| Elasticsearch | Vector search | ES → Weaviate/Qdrant/Milvus |
| Kubernetes | ML orchestration | K8s → Kubeflow/KServe |
| Istio | AI Gateway service mesh | Istio → LLM Gateway with mTLS |
| Airflow | ML pipeline orchestration | Airflow → Prefect/Flyte for ML |

Table 2: From Traditional to AI Native

Why We’re Removing Them from AI Resources List

This post honors these projects, but we’re also removing them from our AI Resources list. Here’s why:

They’re not “AI Projects”—they’re foundational infrastructure.

  • Hadoop, Kafka, Spark are data engineering tools, not ML frameworks
  • Elasticsearch is search, not semantic search
  • Kubernetes is general-purpose orchestration
  • API gateways serve REST/GraphQL, not just LLMs

But their absence doesn’t diminish their importance.

By removing them, we acknowledge that:

  1. AI has its own ecosystem - Transformers, vector DBs, LLM ops
  2. Traditional infra has its own domain - Data engineering, cloud native
  3. The intersection is where innovation happens - AI-native data platforms, LLM ops on K8s

The Giants We Stand On

The next time you:

  • Deploy a model on Kubernetes
  • Stream features through Kafka
  • Search embeddings with a vector database
  • Orchestrate a RAG pipeline with Prefect

Remember: You’re standing on the shoulders of Hadoop, Kafka, Elasticsearch, Kubernetes, and countless others. They built the roads we now drive on.

The Future: Building New Giants

Just as Hadoop and Kafka enabled modern AI, today’s AI infrastructure will become tomorrow’s foundation:

  • Vector databases may become the new standard for all search
  • LLM observability may evolve into general distributed tracing
  • AI agent orchestration may reinvent workflow automation
  • GPU scheduling may influence general-purpose resource management

The cycle continues. The giants of today will be the foundations of tomorrow.

Conclusion: Gratitude and Continuity

As we clean up our AI Resources list to focus on AI-native projects, we don’t forget where we came from. Traditional big data and cloud native infrastructure made the AI revolution possible.

To the Hadoop committers, Kafka maintainers, Kubernetes contributors, and all who built the foundation: Thank you.

Your work enabled ChatGPT, enabled Transformers, enabled everything we now call “AI.”

Standing on your shoulders, we see further.


Acknowledgments: This post was inspired by the need to refactor our AI Resources list. The 27 projects mentioned here are being removed—not because they’re unimportant, but because they deserve their own category: The Foundation.

Helm v4: Paradigm Convergence and Plugin System Rebuild

作者 Jimmy Song
2025年11月14日 19:18

The release of Helm 4 is not just a technical upgrade, but a deep convergence of cloud-native delivery paradigms. The rebuilt plugin system and supply chain governance capabilities make Helm once again a driving force in the Kubernetes ecosystem.

Since its first release in 2016, Helm has been one of the most important application distribution tools in the Kubernetes ecosystem. Helm v4 is not a “minor enhancement,” but a comprehensive update around delivery methods, extension mechanisms, and supply chain approaches.

This article reconstructs Helm’s historical context and focuses on why Helm 4 represents a paradigm-converging release.

Helm: From Tiller to Declarative Delivery

Below is a textual timeline showing key milestones from Helm v2 to v4, helping you understand its technical evolution:

  • 2016: Helm v2 released, using the Tiller architecture.
  • 2017: Chart Hub expands, major projects begin providing official Charts.
  • 2018: Security model controversies intensify, Tiller’s permission issues become apparent.
  • 2019: Helm v3 released, Tiller removed, OCI support introduced.
  • 2021: GitOps becomes widespread, Server-Side Apply (SSA) becomes the mainstream delivery semantic.
  • 2023: kstatus widely adopted for controller status assessment and health calculation.
  • 2025: Helm v4 released, bringing SSA, WASM plugins, reproducible builds, and content hash caching.

Each major Helm release closely follows Kubernetes paradigms, driving progress in declarative delivery and ecosystem tooling.

Fundamental Changes in Helm v4

This section analyzes the core technical upgrades and paradigm shifts in Helm v4.

Delivery Paradigm Update: Server-Side Apply (SSA) by Default

In Helm v3 and earlier, Helm used a “three-way merge” model for resource delivery. Helm v4 switches fully to Server-Side Apply (SSA), meaning the API Server determines field ownership.

This shift brings several direct results:

  • Full semantic alignment with kubectl apply and GitOps controllers (such as Argo, Flux)
  • When multiple controllers manage the same object, silent overrides are avoided and conflicts are explainable
  • Helm’s behavior now follows Kubernetes’ officially recommended declarative delivery paradigm

The following flowchart compares the delivery semantics of Helm v3 and v4.

Figure 1: Helm v3/v4 Delivery Semantics Comparison

Helm is now aligned with the delivery semantics of modern Kubernetes versions, improving predictability and safety in resource management.
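To make field ownership concrete, here is a toy Python model of SSA semantics. This is a deliberate simplification of the real API Server behavior: each field remembers which manager last applied it, and touching another manager's field is a conflict unless forced:

```python
def apply(obj, owners, manager, fields, force=False):
    """Toy SSA model: `owners` maps each field of `obj` to the manager
    that last applied it. Applying a field owned by someone else with a
    different value is a conflict unless force=True."""
    conflicts = [f for f, v in fields.items()
                 if owners.get(f, manager) != manager and obj.get(f) != v]
    if conflicts and not force:
        raise RuntimeError(f"conflict with other managers on: {conflicts}")
    obj.update(fields)
    owners.update({f: manager for f in fields})
    return obj

deployment, owners = {}, {}
apply(deployment, owners, "helm", {"replicas": 3, "image": "nginx:1.27"})
apply(deployment, owners, "hpa-controller", {"replicas": 5}, force=True)  # HPA takes over replicas
print(owners)  # → {'replicas': 'hpa-controller', 'image': 'helm'}
```

The point of the real mechanism is the same as this toy's: overrides are never silent, because ownership transfer is explicit and conflicts are reported rather than merged away.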

kstatus-Driven Wait Behavior and Readiness Annotations

In Helm 3, --wait could only make fuzzy status judgments on limited resources, lacking extensibility and explainability.

Helm 4 introduces kstatus (Kubernetes Status) as the basis for health status parsing, and supports two key annotations:

  • helm.sh/readiness-success
  • helm.sh/readiness-failure

Chart authors can precisely define conditions for installation success or failure. Helm’s waiting model now offers “explainability + extensibility,” upgrading from a “templating tool” to a true “deployment orchestrator.”

Extension System Rebuild: WASM Plugin System

Helm 4 thoroughly reconstructs the plugin model, mainly including:

Typed and Structured Plugins

  • Arbitrary scripts are no longer allowed; plugins must follow typed and structured standards

WebAssembly Plugin Runtime (Extism)

  • More secure (sandbox isolation)
  • Cross-language support
  • Easy unified management in CI/CD and enterprise platforms
  • Predictable and testable

Post-renderer Integrated into Plugin System

  • Moves beyond the “external executable black box” era
  • Helm becomes a programmable platform, not just a template renderer

Engineering Capabilities Upgrade: Reproducible Builds, Content Hash Caching, chart API v3

Helm v4 brings the following engineering improvements:

  • Chart packaging is reproducible (supports signing, SBOM, SLSA, etc. for supply chain governance)
  • Local cache uses content hashes, avoiding version-based conflicts
  • chart API v3 (experimental) is stricter and more flexible
  • SDK logging system upgraded to Go slog (modern logging)

These capabilities enable Helm charts to enter serious software supply chain governance.
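Content hash caching can be sketched in a few lines: the cache key is derived from the artifact bytes themselves, so two packages with the same version string but different contents can never collide. This is an illustrative sketch, not Helm's actual implementation:

```python
import hashlib

def cache_key(chart_bytes: bytes) -> str:
    """Content-addressed cache key: identical bytes always map to the
    same key, while any change (even with an unchanged version string)
    yields a new one, so stale name/version collisions cannot occur."""
    return "sha256:" + hashlib.sha256(chart_bytes).hexdigest()

a = cache_key(b"apiVersion: v2\nname: demo\nversion: 1.0.0\n")
b = cache_key(b"apiVersion: v2\nname: demo\nversion: 1.0.0\n# edited\n")
print(a == b)  # → False: changed content, changed key, no silent cache reuse
```

Combined with reproducible packaging, the same digest can then anchor signatures and SBOM attestations, which is what makes supply-chain verification possible at all.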

Feature Comparison (Helm v3 → v4)

The table below compares core features between Helm v3 and v4 for a quick understanding of the upgrade value.

| Area | Helm 3 | Helm 4 |
| --- | --- | --- |
| Apply model | Three-way merge | Default SSA |
| Wait behavior | Fuzzy, not extensible | kstatus + annotations |
| Plugin system | Scripts, uncontrollable | WASM, typed plugins |
| Post-renderer | External executable | Plugin subsystem |
| Build | Not reproducible | Reproducible builds |
| Cache | name/version | Content hash |
| Chart API | v2 | v2 + v3 (experimental) |
| SDK logs | stdlib log | slog |

Table 1: Helm v3 vs v4 Feature Comparison

In short, this is a release that repays technical debt in bulk while aligning Helm with contemporary Kubernetes semantics.

Why Is Helm v4 a Paradigm Convergence Event?

The release of Helm v4 is not just a feature upgrade, but a deep convergence of delivery paradigms, mainly in three aspects:

Kubernetes Delivery Semantics Unified to SSA

Previously: kubectl, GitOps, and Helm each had their own logic. Now: All unified to SSA, consistent delivery behavior, smoother ecosystem collaboration.

Plugin System Enters the Platform Era

WASM (WebAssembly) brings a secure, universal, and controllable plugin runtime. Infrastructure projects widely adopt WASM: Envoy → WASM Filters, Kubernetes → WASM CRI/OCI, and now Helm joins the platform camp.

Charts Enter Supply Chain Governance

Reproducible builds and digest verification allow Helm charts to be managed as seriously as container images, greatly enhancing supply chain security.

The entire ecosystem moves to a unified capability baseline, driving cloud-native delivery standardization.

My Helm History and Observations

As an early user from the Helm v2 era, I have experienced the following stages:

  • Tiller security controversies
  • v3 migration (state stored in secrets)
  • Large-scale chart consolidation in the community
  • OCI adoption
  • Today’s SSA / WASM / reproducible build

Each major Helm version upgrade is not about chasing trends, but proactively aligning with Kubernetes paradigms:

  • v3 aligns with K8s “no cluster-side runtime” principle
  • v4 aligns with SSA, kstatus, WASM, OCI, and other advances from the past five years

Helm exemplifies the evolution rhythm of infrastructure projects: not by piling on features, but by evolving in semantic alignment with the platform.

Summary

The release of Helm v4 marks a new paradigm for Kubernetes application delivery. SSA, WASM plugins, kstatus, and reproducible builds make Helm not just a templating tool, but a core for supply chain governance and platform extensibility. For cloud-native developers and platform teams, Helm v4 is a paradigm upgrade worth attention.


Kimi K2 Thinking: The True Awakening of China's Thinking Model

作者 Jimmy Song
2025年11月14日 16:25

China’s large language models have finally moved from “writing like humans” to “thinking like humans.” The open-sourcing of Kimi K2 is a watershed moment for China’s AI trajectory.

The narrative around China’s large language models is shifting from “chat-style models” to “thinking models.”

Moonshot AI’s open-sourcing of Kimi K2 Thinking marks the first real landing of this transition. K2 is not just another iteration like ChatGLM or Qwen; it’s the first time a Chinese team has unified “deep reasoning + long context + tool invocation continuity” in training. This is the core of the thinking model approach and the reason why models like Claude and Gemini have led the field.

The Significance of K2’s Open Source: China Enters the Era of Thinking Models

Why is K2’s open source a turning point? Because it enables Chinese models to achieve the following capabilities for the first time:

  • Stable execution of 200–300 tool invocations (toolchain reasoning stability)
  • Deep, multi-stage reasoning chain execution (Chain-of-Thought consistency)
  • 256k context as working memory
  • Native INT4 acceleration + MoE activation sparsity scheduling

This is a completely different path from “stacking parameters → stacking benchmarks,” emphasizing reasoning ability over parameter scale.

In short:

K2 is the first time a Chinese model has entered the ranks of thinking models.

Dissecting K2’s Technical Approach

K2’s technical approach can be broken down into five key points, each directly impacting the model’s reasoning ability and ecosystem adaptability.

MoE Expert Division: Cognitive Division Rather Than Parameter Expansion

K2’s MoE (Mixture of Experts) design philosophy is distinct from previous models. The core idea is not activating fewer parameters or running larger models more cheaply, but assigning different cognitive sub-skills to different experts. For example:

  • Mathematical reasoning expert
  • Planning expert
  • Tool invocation expert
  • Browser task expert
  • Code generation expert
  • Long-chain retention expert

This division aligns directly with Claude 3.5’s cognitive layering approach. K2’s MoE is about “dividing thinking among experts,” not just “making computation cheaper.”
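Mechanically, this kind of expert division rests on a learned gating network that routes each token to a few experts. Below is a minimal top-k gating sketch in NumPy; it is illustrative only — K2’s actual router, expert count, and any mapping from experts to skills like “math” or “planning” are internal to the model and emerge from training rather than being hand-assigned:

```python
import numpy as np

def top_k_gate(token_repr: np.ndarray, expert_weights: np.ndarray, k: int = 2):
    """Route one token to its top-k experts via a softmax gate.

    token_repr: (d,) hidden state; expert_weights: (n_experts, d) gating matrix.
    Returns the chosen expert indices and their normalized mixing weights.
    """
    logits = expert_weights @ token_repr             # (n_experts,) gating scores
    top = np.argsort(logits)[-k:][::-1]              # indices of the k largest scores
    probs = np.exp(logits[top] - logits[top].max())  # softmax over selected experts only
    probs /= probs.sum()
    return top, probs

rng = np.random.default_rng(0)
experts, weights = top_k_gate(rng.standard_normal(8), rng.standard_normal((6, 8)), k=2)
print(experts, weights)  # two expert ids; mixing weights sum to 1
```

Only the selected experts run a forward pass for that token, which is where the “activation sparsity” in sparse MoE models comes from.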

256K Context: Building the Model’s Working Memory

K2’s ultra-long context is not just a spec-sheet number; it is designed as the model’s “thinking buffer.” It lets the whole process retain reasoning chains, tool-invocation state, and multi-stage reflection, so that long tasks (such as research or code refactoring) run uninterrupted and multi-stage agent workflows execute stably. Long-term thinking requires long-term memory, and K2’s long context is the “memory” that sustains its reasoning chains.
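To make the “working memory” framing concrete, here is a toy buffer that keeps reasoning steps and tool results under a fixed token budget, evicting the oldest entries first. This is purely an analogy: a real long-context model keeps everything in its attention KV cache rather than an explicit buffer like this.

```python
from collections import deque

class WorkingMemory:
    """Toy 'working memory': keep reasoning steps and tool results inside
    a fixed token budget, evicting the oldest entries first. Illustrative
    only; not how long-context attention actually stores state."""

    def __init__(self, budget_tokens: int = 256_000):
        self.budget = budget_tokens
        self.entries: deque = deque()  # (text, token_count) pairs
        self.used = 0

    def add(self, text: str) -> None:
        cost = len(text.split())  # crude token estimate
        self.entries.append((text, cost))
        self.used += cost
        while self.used > self.budget:  # evict oldest context first
            _, old_cost = self.entries.popleft()
            self.used -= old_cost

    def render(self) -> str:
        return "\n".join(text for text, _ in self.entries)

mem = WorkingMemory(budget_tokens=10)
mem.add("step1: plan the refactor")
mem.add("tool: ls returned 3 files")
mem.add("step2: edit file A")
print(mem.used <= 10)  # True after evicting the oldest entry
```

The point of a 256K window is precisely that this kind of eviction is rarely needed: intermediate reasoning and tool output stay addressable for the whole task.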

Intertwined Training of Tool Invocation and Reasoning Chains

K2 excels in the intertwined training of tool invocation and reasoning chains. Traditional open-source models typically follow this process:

  1. Generate reasoning
  2. Output JSON function call
  3. Tool returns result
  4. Continue reasoning

In this approach, the reasoning chain and invocation chain are separated. K2’s training allows the reasoning chain to invoke tools at any time and feed tool results back into the reasoning chain for the next stage of thinking. It supports 200–300 consecutive tool invocations without interruption, fully aligning with Claude 3.5’s Interleaved CoT + Tool Use.
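The contrast between the two styles can be sketched as a reason-act loop in which tool results flow back into the same transcript the model reasons over. The `llm` callable and its return convention below are hypothetical scaffolding for illustration, not K2’s actual API:

```python
def interleaved_agent(task: str, llm, tools: dict, max_calls: int = 300):
    """Sketch of an interleaved reason-act loop: the model's reasoning
    stream can emit a tool call at any step, and the tool's result is
    appended to the same stream before reasoning resumes.

    `llm(transcript)` is a hypothetical callable returning either
    ("call", tool_name, args) or ("answer", text)."""
    transcript = [f"task: {task}"]
    for _ in range(max_calls):
        step = llm("\n".join(transcript))
        if step[0] == "answer":
            return step[1]
        _, name, args = step
        result = tools[name](*args)                           # execute the tool
        transcript.append(f"call {name}{args} -> {result}")   # feed result back in
    raise RuntimeError("tool budget exhausted")

# tiny fake model: call add(2, 3) once, then answer with the result
def fake_llm(transcript):
    if "-> 5" in transcript:
        return ("answer", "2 + 3 = 5")
    return ("call", "add", (2, 3))

print(interleaved_agent("compute 2+3", fake_llm, {"add": lambda a, b: a + b}))
```

In the separated style, the loop above would exist outside the model as orchestration code; the thinking-model claim is that the model itself learns to sustain this loop for hundreds of calls.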

Native INT4 Quantization: Ensuring Reasoning Chain Stability

K2’s INT4 (4-bit integer quantization) approach is not ordinary post-training quantization. Its purpose is not only to reduce memory usage and increase throughput, but more importantly to keep deep reasoning chains from breaking due to insufficient compute. The biggest killers of deep thinking chains are timeouts, freezes, and unstable workers. INT4 enables Chinese GPUs (non-H100) to run complete reasoning chains, which matters greatly for China’s ecosystem.
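For reference, the storage idea behind INT4 can be shown with a toy symmetric per-tensor quantizer. This sketch only illustrates the 4-bit value range and scale factor; K2 ships quantization-aware weights and fused kernels, which this does not attempt to reproduce:

```python
import numpy as np

def int4_quantize(w: np.ndarray):
    """Symmetric per-tensor INT4 quantization sketch: map float weights to
    integers in [-8, 7] with a single scale factor (stored in int8 here,
    since NumPy has no 4-bit dtype)."""
    scale = float(np.abs(w).max()) / 7.0
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def int4_dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.array([0.9, -0.05, 0.31, -0.7], dtype=np.float32)
q, s = int4_quantize(w)
err = float(np.abs(int4_dequantize(q, s) - w).max())
print(q, err)  # 4-bit codes and a reconstruction error below half the scale
```

Quantization-aware training goes further than this: the rounding error is present during training, so the model learns weights that remain accurate at 4-bit precision instead of being rounded after the fact.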

MoE + Long Context + Toolchain: Unified Training Rather Than Module Stitching

K2’s most important feature is its holistic training approach: expert division, long context-driven consistency, tool invocation trained through real execution, browser tasks and long-step task reinforcement, and INT4 entering the training loop. It’s not a “ChatLLM + Memory + RAG + Tools” patchwork, but an integrated reasoning system.

Alignment and Differences Between K2 and International Mainstream Approaches

K2 is highly aligned with international mainstream models (such as Claude, Gemini, OpenAI) in cognitive reasoning, ultra-long context, and tool invocation mechanisms, but also has unique advantages for Chinese models:

  • Native INT4 + adaptation to Chinese computing power is rare globally
  • Toolchain continuity is more stable than most open-source models
  • Higher degree of open source, stronger ecosystem reusability

Collaborative Value of China’s AI Infra: K2 × RLinf × Mem-alpha

A series of important open-source infrastructure projects has emerged around K2. The table below summarizes each project’s type and its value to K2:

| Project | Type | Value to K2 |
| --- | --- | --- |
| RLinf | Reinforcement learning | Train stronger planning/browser-task capabilities |
| Mem-alpha | Memory enhancement | Combine with K2 to form long-term-memory agents |
| AgentDebug | Agent error debugging | Analyze K2’s toolchain errors |
| UI-Genie | GUI agent training | Testbed for extending K2’s agent capabilities |

Table 1: Collaborative Value of China’s AI Infra Ecosystem

This combination is already forming a China AI Agent Infra Stack.

Personal View: The Significance of K2’s Approach

I believe the significance of K2 lies not in the model itself, but in its technical approach:

K2 marks the first time Chinese models have shifted from “language generation competition” to “thinking ability competition.”

For the past three years, the main line of China’s open-source models has been evaluation scores, parameter scale, instruction following, and alignment data. But K2 is the first to clearly take the path of deep reasoning, tool intertwining, cognitive division, long-term task chains, and native performance optimization. This means China’s model trajectory is now synchronized with the US, rather than chasing old paths.

Key Directions to Watch in K2’s Ecosystem Over the Next Year

K2’s future ecosystem influence will depend on several key points:

  • Whether it opens the tool registry
  • Whether it supports dynamic memory (Mem-alpha integration)
  • Whether it opens the MoE expert structure
  • Whether it can form a Chinese reasoning chain optimization path with vLLM / llm-d / KServe
  • Whether it supports fault tolerance for multi-node continuous reasoning chains

These capabilities will determine K2’s ecosystem influence and technical extensibility.

K2 Thinking Model Architecture Diagram

The following flowchart illustrates the core architecture of the K2 thinking model and its collaboration with external agents/applications:

Figure 1: K2 Thinking Model Architecture

Summary

K2 is the first time China’s model trajectory is heading in the right direction:

From “writing like humans” to “thinking like humans.”

The era of thinking models is coming, and Chinese models are finally standing on the same roadmap as the international forefront.


TRAE SOLO vs VS Code: Rethinking Coding Tools from the Perspective of AI Engineering Entities

By Jimmy Song
November 14, 2025, 15:14

Coding tools are evolving from “AI assistants” into true engineering entities. How can we reinterpret the roles of TRAE SOLO and VS Code from a pipeline perspective?

Recently, TRAE International Edition SOLO mode has been fully opened to overseas users. It claims to be a “responsive coding agent” and is now available for official trial, with token-based rate limiting.

I’ve used early versions of TRAE (without SOLO access), and also tried Qoder and Kiro. The AI coding field is flourishing, each tool with its own strengths.

Now, with GitHub’s Agent HQ concept from Universe, and the “AI Engineering Entity (AIEE)” framework I wrote about in AI-Native Application Architecture, it’s time to re-examine today’s coding tool landscape.

This article compares TRAE SOLO and VS Code (with Copilot, Plan/Agent mode, and Agent HQ) from the perspective of AI engineering entities, combining personal experience to outline their differences in engineering automation, collaboration, and governance.

Three Engineering Role Abstractions: End-to-End Executor, Contextual Collaborator, and Expert Orchestrator

From an engineering perspective, current mainstream AI coding tools can be abstracted into three roles:

End-to-End Executor:

  • Focuses on “requirement to deployment” workflows, capable of autonomous planning, task breakdown, coding, testing, previewing, and even deployment. Officially called “AI-Powered Context Engineer.”
  • User experience: like a “full-chain executor”—give it a requirement, and it handles the project, even if it’s slow or imperfect.

Contextual Collaborator:

  • VS Code is a powerful editor. Copilot has evolved from line-level completion to Chat, Plan agent, Agent mode, supporting multi-step tasks and codebase analysis.
  • It doesn’t take over the whole project, but efficiently handles local tasks under your guidance, acting as an automated unit for specific segments.

Expert Orchestrator / Specialist Engine:

  • GitHub’s Agent HQ is a “central platform for AI coding agents,” a unified control plane that can connect to OpenAI, Anthropic, Google, xAI, etc., run agents in parallel, and compare results.
  • Functions as an “expert orchestrator” for key steps—planning, review, refactoring, or decision-making—providing high-quality output without taking over the entire project.

These three roles correspond to the structure in “AI Engineering Entity (AIEE)”:

  • Single end-to-end executor (TRAE SOLO)
  • Contextual collaborator residing in the IDE (VS Code + Copilot)
  • Specialist orchestrator platform for multi-entity scheduling (Agent HQ)

Quick Product Status Check

To avoid memory bias, let’s clarify some key facts.

TRAE / TRAE SOLO

  • TRAE claims to be a “10x AI Engineer,” able to independently understand requirements and execute development tasks.
  • SOLO mode is GA for international users, emphasizing full-chain automation, available directly but with token limits.
  • Underlying open-source Trae Agent CLI can execute multi-step engineering tasks in real codebases.
  • TraeIDE’s official page shows built-in Claude 3.5/3.7, DeepSeek, etc., but the community notes slow integration of new models like Claude Sonnet 4.5.

So the claim that “TRAE does not support Claude” is now inaccurate: at least officially, Claude models are included. Which model SOLO mode uses, and whether that choice is exposed to users, remains unclear, and the experience still needs improvement.

VS Code + Copilot + Agent HQ

  • Copilot in VS Code now features Chat, Plan agent, Todo/multi-step execution:
    • Plan mode analyzes codebases, generates execution plans, splits into Todos, then implementation agents execute step by step.
    • Agent mode provides a more automated “multi-step companion programmer” experience.
  • GitHub launched Agent HQ at Universe 2025, integrating Copilot and third-party agents (Anthropic, OpenAI, Google, xAI, Cognition, etc.) into a unified control plane, supporting parallel runs and result comparison.

In short:

  • TRAE is like “embedding an engineering entity into the IDE.”
  • VS Code + Copilot is “adding a set of engineering entities to a mature IDE.”
  • Agent HQ is positioned as “headquarters for multiple engineering entities.”

Reconstructing Comparison Dimensions with the AI Engineering Entity Framework

In “AI Engineering Entity (AIEE),” the definition is:

AI has evolved from editor auto-completion to a formal node in the software supply chain, able to receive tasks, produce reviewable artifacts (PR/diff/report), pass tests/gates, and be replaced if it fails. It’s no longer just an “enhanced human developer,” but a “functional engineering unit” in the pipeline.

Based on this, we can reconstruct key comparison dimensions for TRAE and VS Code:

  1. Existence as an Independent Functional Unit
    Can it autonomously plan, implement, and produce PRs/reports from natural language requirements, without continuous human intervention?

  2. Context Modeling Capability
    Can it model across files, directories, terminal output, and browser content to form a stable engineering context?

  3. Position in the Pipeline
    Is it an enhancement layer within the IDE, or a formal node in CI/CD and code review flows?

  4. Reviewability and Replaceability
    Are its outputs standardized (PR, diff, report) and suitable for regular pipeline review and rollback?

  5. Multi-Agent Collaboration Capability
    Does it natively support multi-agent collaboration, or is it focused on single-agent enhancement?
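Dimension 4 above (reviewability and replaceability) can be sketched as a minimal data structure: an entity’s output is merged only when it has produced a reviewable artifact and passed every required gate. Field and gate names here are illustrative, not from any real product:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class EngineeringEntityTask:
    """Sketch of an AIEE pipeline node: it receives a requirement, emits a
    reviewable artifact (PR/diff/report), and is accepted only if every
    required gate passed. Purely illustrative, not a real product API."""
    requirement: str
    artifact: Optional[str] = None                     # the PR / diff / report it produced
    gates_passed: List[str] = field(default_factory=list)

    def accept(self, required_gates: set) -> bool:
        # merge only when an artifact exists and all required gates passed;
        # a failing entity is simply not merged, i.e. it is replaceable
        return self.artifact is not None and required_gates <= set(self.gates_passed)

task = EngineeringEntityTask("add retry logic to the HTTP client", artifact="diff")
task.gates_passed = ["unit-tests", "review"]
print(task.accept({"unit-tests", "review"}))  # True: artifact present, gates green
```

Framed this way, the comparison below asks how completely each tool fills this node: TRAE SOLO tries to own the whole node, while VS Code + Copilot distributes it across several cooperating agents.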

TRAE SOLO vs VS Code: Engineering Entity Comparison Table

The following table summarizes the main differences from the engineering entity perspective. Note: VS Code includes Copilot Chat + Plan/Agent mode by default and can mount the Agent HQ ecosystem.

You can interpret this table as: “If AI is treated as an engineering entity in the pipeline, what roles do TRAE and VS Code play?”

| Dimension | TRAE SOLO | VS Code + Copilot / Agent |
| --- | --- | --- |
| Engineering role | Single strong entity that handles end-to-end tasks from idea to deployment | IDE + multiple entities (plan, implementation, review); the IDE itself is the engineering base |
| Task granularity | Project/feature level: from a PRD-style description to full project scaffold, implementation, testing, preview | Mainly function/file level; Plan mode can scale to feature/subsystem level |
| Context modeling | Emphasizes “context engineering”: reads codebase, terminal output, and browser content as unified input for SOLO | Mainly the codebase; the Plan agent generates plans from code analysis, Agent mode executes by plan |
| Automation | Proactively modifies files, runs commands and tests, starts local services, forming a complete loop | Plan/Agent can run commands, modify files, and run tests, but depends more on your current project/workflow |
| Human intervention | More “post-review”: let it run first, then review and fine-tune | More “in-process collaboration”: frequent intervention at planning, implementation, and review, with control points at each step |
| Output form | Code changes, test results, previews; sometimes PRs/docs | Code completion, refactoring, PR comments, CodeQL reports, Plan/Todo lists |
| Multi-agent | Core is the SOLO agent; other capabilities (like the Trae Agent CLI) are extensions | Copilot itself is an agent; Agent HQ allows multiple agents to compete in parallel |
| Model transparency | Exposes the specific model poorly; users can’t tell which model is in use | GitHub clearly marks Copilot’s model family; Agent HQ shows agent sources directly |
| Performance | Strong automation but slow; complex projects may stall at the “thinking” stage; hard token limits | Stable response in familiar projects; mostly local changes, so overall latency is controllable |
| Privacy & compliance | Official and third-party reviews mention extensive telemetry/data collection; enterprise adoption needs extra evaluation | Copilot for Enterprise has clear data isolation/compliance, suiting most enterprise governance needs |

Table 1: TRAE SOLO vs VS Code Engineering Entity Comparison

From the table:

  • If you want “an AI engineering entity that takes full responsibility from requirement to deployment,” TRAE SOLO fits that role.
  • If you want “a stable engineering base + a set of pluggable entities,” VS Code + Copilot + Agent HQ fits better.

Workflow Comparison: Two Engineering Entity Pipelines

To clarify their engineering flows, the following diagram compares the typical workflows of TRAE SOLO and VS Code, highlighting their respective collaboration models.

Figure 1: TRAE SOLO vs VS Code Engineering Entity Pipeline Comparison

This diagram shows two typical collaboration models:

  • TRAE SOLO attempts to encapsulate “context aggregation → planning → implementation → testing → preview/deployment” within a single engineering entity, with user intervention only at requirement input and output review.
  • VS Code + Copilot + Agent HQ uses the IDE as runtime, with Plan/Implementation/Review agents corresponding to different roles. Agent HQ supports parallel agent competition, allowing developers to select the best solution.

Model Transparency, Speed, and Predictability

Based on personal experience, here are the model transparency and speed issues from the engineering entity perspective:

Model Transparency

  • TRAE currently exposes “which model is called” poorly; switching to MAX mode only hints at “a stronger model or higher quota,” with no clear feedback.
  • Community feedback notes slow integration of new models; some strong models (like Claude series) are available elsewhere but not yet in TRAE.

This means:

  • TRAE is hard to use as a “precisely configurable engineering unit,” more like a black box, making model change management in CI/CD or production pipelines difficult.
  • VS Code + Copilot + Agent HQ is stronger in standardization; GitHub clearly marks Copilot’s model family, Agent HQ uses agent source as the abstraction boundary.

Speed and Predictability

  • TRAE SOLO’s “slowness” comes from executing more steps (reading files, analyzing, planning, testing) and insufficient engineering process visualization. The UI shows “Thinking…” prompts, making it hard to tell if it’s stuck or planning.
  • VS Code’s Plan mode explicitly lists plans and Todos; Agent mode emphasizes “execution by plan,” letting users clearly see the entity’s work status, improving predictability.

Agent HQ Positioning: Single Entity vs Multi-Entity Headquarters

From a platform perspective, GitHub and TRAE differ as follows:

  • Agent HQ’s core idea: future development will rely on multiple specialized agents collaborating in parallel. GitHub is building “agent headquarters,” not a single engineering agent. Developers can schedule agents in a unified control plane, integrating with existing GitHub Flow (Issue, PR, Review, CI/CD).
  • TRAE is more like “proprietary IDE + agent + full-stack context engineering,” delivering an integrated experience.

In terms of engineering entity organization:

  • GitHub is building “infrastructure and governance for multi-entity engineering systems.”
  • TRAE is building “vertically integrated engineering entity + private runtime.”

They’re not mutually exclusive, representing “broad platform + multi-entity scheduling” vs “single strong entity + proprietary toolchain.”

Subjective Experience and Engineering Framework Integration

Translating personal experience into engineering language:

  • I’m more accustomed to VS Code’s “single IDE + multiple views” experience; TRAE splits the IDE and SOLO modes, requiring a mental switch.
  • TRAE’s engineering entity capabilities surpass ordinary completion tools, able to take on tasks, but model transparency and context quality are unstable, and governance needs improvement.
  • VS Code doesn’t take over the whole project, but local work is stable; Plan, Agent, and Review combinations enable multi-entity collaboration.

According to the “AI Engineering Entity (AIEE)” framework:

  • TRAE SOLO is a single AI engineering entity (AIEE) capable of handling complete engineering tasks, but still has clear shortcomings in model transparency, engineering governance, and enterprise-level controllability.
  • VS Code + Copilot + Agent HQ is an infrastructure platform for multiple engineering entities, less aggressive in end-to-end outsourcing in the short term, but clearer in engineering consistency, model replaceability, and organizational governance.

Summary

This article systematically compares TRAE SOLO and VS Code (with Copilot, Agent HQ) from the perspective of AI engineering entities, focusing on automation, collaboration, and model transparency. TRAE SOLO is better suited for individual developers or small teams seeking end-to-end automation, while VS Code + Copilot + Agent HQ provides stronger infrastructure for multi-entity collaboration, enterprise governance, and engineering consistency. In the future, AI engineering entities will become formal nodes in the software development pipeline, and tool selection should be based on engineering needs and governance requirements.

