
Kubernetes’ Anxiety and Rebirth in the AI Wave

By Jimmy Song
April 3, 2026, 13:20

Kubernetes hasn’t been replaced by AI, but it’s being redefined by it. Anxiety is the prelude to rebirth.

After attending KubeCon EU 2026 in Amsterdam, I’ve been pondering a key question: Kubernetes isn’t obsolete, but it’s no longer “enough”; it hasn’t been replaced by AI, but it’s being redefined by AI.

Figure 1: KubeCon EU 2026 slogan: Keep Cloud Native Moving. This event had over 13,000 registrations, making it the largest KubeCon to date.

This was my third time attending KubeCon in Europe. Over the past few years, you can actually see the community’s mindset shift through the event slogans:

  • 2024 Paris: La vie en Cloud Native

    → Cloud Native has become a “way of life,” the default state

  • 2025 London: No slogan, just the 10th anniversary

    → Kubernetes reached a milestone, focusing on retrospection rather than moving forward

  • 2026 Amsterdam: Keep Cloud Native Moving

    → But the question is: where is it moving?

The absence of a slogan in 2025 was a signal in itself:

When an ecosystem starts commemorating the past instead of defining the future, it’s already at an inflection point.

This article doesn’t recap the talks, but instead distills my observations at KubeCon into insights about Kubernetes’ anxiety and rebirth in the AI wave.

The Root of Anxiety: Is Kubernetes Facing a “Crisis”?

The biggest change at KubeCon was that AI has completely replaced traditional cloud native topics. The focus shifted from service optimization and microservices management to how to deploy and manage AI workloads on Kubernetes, especially inference tasks and GPU scheduling.

Figure 2: Before KubeCon officially started, the Maintainer Summit was all about AI.

Kubernetes, as the foundational infrastructure, was once the core of the cloud native world. With the explosive growth of AI models, the question now is whether Kubernetes can still serve as a “universal” platform for everything, which has become a new source of anxiety.

The AI boom brings real challenges: Can Kubernetes’ “universality” adapt to the complexity of AI workloads?

The Focus Brought by the AI Boom

AI’s popularity has shifted the cloud native spotlight entirely to artificial intelligence. AI coding, OpenClaw, large language models, and generative models have all drawn widespread attention. AI has become the core computing demand in the real world.

This surge in demand raises the question: Can Kubernetes continue to serve as the infrastructure platform for complex tasks? Especially with issues like GPU sharing, inference model scheduling, VRAM allocation, and device attribute selection, is the traditional Kubernetes resource model sufficient?

In the past, Kubernetes handled compute, storage, and networking as foundational infrastructure. But with the rapid development of AI, its “universality” is being challenged. Particularly for inference tasks, Kubernetes’ model appears thin.

Comparing with OpenStack: Will Kubernetes Repeat History?

OpenStack once aimed to be a complete open-source cloud platform, but ultimately failed to sustain growth due to complexity and a lack of flexibility in adapting to new technologies.

Will Kubernetes follow the same path? I believe Kubernetes has different strengths: as a container and microservices orchestration platform, it’s widely adopted and has strong community and vendor support. It doesn’t try to replace all cloud provider capabilities but serves as an infrastructure control plane to help users manage resources.

Figure 3: Cloud native contributors remain active. The crowd at the KubeCon EU 2026 Maintainer Summit shows the community’s vitality.

However, as AI workloads become mainstream, Kubernetes must find a new position to avoid being replaced by “AI-optimized platforms.”

Kubernetes’ Challenge: The GPU Resource Management Gap

At KubeCon, NVIDIA announced the donation of the GPU DRA (Dynamic Resource Allocation) driver to the CNCF, marking the upstreaming of GPU resource management. GPU sharing and scheduling have become urgent issues for Kubernetes.

Traditionally, Kubernetes relied on the Device Plugin model to schedule GPUs, only supporting allocation by device count (e.g., nvidia.com/gpu: 1). But for AI inference tasks, more information is needed for resource scheduling, such as VRAM size, GPU topology, and sharing strategies. NVIDIA DRA makes GPU resource management more flexible and intelligent, gradually easing the “GPU resource crunch” in AI workloads.
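As a toy illustration of this difference (plain Python, not real Kubernetes API objects; the device inventory and field names are invented for the sketch), count-based allocation cannot see what a claim-style, attribute-based selector can:

```python
# Hypothetical sketch: count-based vs. attribute-based GPU selection.
# The inventory and field names are illustrative, not real Kubernetes APIs.

inventory = [
    {"id": "gpu-0", "model": "H100", "vram_gb": 80, "shared": False},
    {"id": "gpu-1", "model": "A100", "vram_gb": 40, "shared": True},
    {"id": "gpu-2", "model": "A100", "vram_gb": 80, "shared": True},
]

# Device Plugin era: the scheduler only sees an opaque count,
# e.g. `nvidia.com/gpu: 1` -- any device will do.
def allocate_by_count(count):
    return inventory[:count]

# DRA era: a claim can select on device attributes (VRAM, sharing policy,
# topology...), in the spirit of DRA's CEL-based selectors.
def allocate_by_claim(min_vram_gb, allow_shared):
    return [
        d for d in inventory
        if d["vram_gb"] >= min_vram_gb and (allow_shared or not d["shared"])
    ]

print([d["id"] for d in allocate_by_claim(80, allow_shared=True)])
```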

This shift means Kubernetes is no longer just a “container orchestration platform,” but is becoming the infrastructure layer for AI-specific resource scheduling.

Against this backdrop, both the community and industry are exploring finer-grained GPU resource abstraction and scheduling mechanisms. For example, the open-source project HAMi is building a GPU resource management layer for AI workloads on top of Kubernetes, supporting GPU sharing, VRAM-level allocation, and heterogeneous device scheduling.

Figure 4: HAMi demo at KubeCon EU 2026 Keynote

These efforts are not about replacing Kubernetes, but about filling the resource model gaps for the AI era. In the long run, this layer may evolve into a “GPU Abstraction Layer” similar to CNI/CSI, becoming a key part of AI-native infrastructure.

The Production “Gap”: Many AI PoCs, Few in Production

A common post-event summary was: Many PoCs, but “everyday production deployments” are still rare. Pulumi summarized it as:

lots of working demos, very few production setups people trust

This shows that while many AI workload solutions succeed in technical demos, the transition from experimentation to production remains difficult. Whether the issue is GPU resource sharing or inference request scheduling, it remains an open question whether Kubernetes as the foundation can support this transformation.

The Rise of Inference Systems: Kubernetes’ Scheduling Boundaries Are Challenged

Another major event at this KubeCon was llm-d being contributed to the CNCF as a Sandbox project.

If GPU DRA represents the upstreaming of device resource models, then llm-d represents another critical evolution: Distributed LLM inference capabilities are moving from proprietary engineering implementations to standardized, community-driven collaboration in cloud native.

This is significant not just because it’s another open-source project, but because it shows that Kubernetes’ challenges in the AI era are no longer just about “how to schedule GPUs,” but also “how to host inference systems themselves.” As prefill/decode separation, request routing, KV cache management, and throughput optimization move into the infrastructure layer, Kubernetes’ boundaries are being redefined.

Traditionally, the Kubernetes scheduler focused on Pod scheduling. But in AI inference scenarios, scheduling is not just about picking a node—it’s about selecting the most suitable inference instance based on request characteristics. Factors like model state, request queue depth, and cache hit rate all need to be considered. This process is increasingly managed by inference runtimes, forming new “request-level scheduling” systems.
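As a hedged sketch of such request-level scheduling (the weights, fields, and replica data below are illustrative assumptions, not llm-d's actual algorithm), an inference router might score replicas by prefix-cache affinity against queue depth:

```python
# Illustrative request-level scheduling for LLM inference: pick the replica
# with the best trade-off between cache affinity and queue depth.
# All fields and weights here are invented for the sketch.

replicas = [
    {"name": "r1", "queue_depth": 8, "cache_hit": 0.9},
    {"name": "r2", "queue_depth": 2, "cache_hit": 0.1},
    {"name": "r3", "queue_depth": 3, "cache_hit": 0.7},
]

def score(r, w_cache=1.0, w_queue=0.1):
    # Higher cache-hit rate is better; a deeper queue is worse.
    return w_cache * r["cache_hit"] - w_queue * r["queue_depth"]

best = max(replicas, key=score)
print(best["name"])
```

Note that this decision happens per request, below the granularity of Pod scheduling, which is why it tends to live in the inference runtime rather than in kube-scheduler.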

This leads to an overlap between the Kubernetes scheduler and inference systems, forcing Kubernetes to rethink its role: should it keep expanding, or collaborate with inference systems?

AI-Native Infrastructure: The Key Challenge for Production

At the AI Native Summit, the real needs for AI-native infrastructure were especially clear. The focus was no longer “can it run on Kubernetes,” but how to make AI workloads routine, stable, and production-ready on Kubernetes.

Figure 5: At the AI Native Summit after KubeCon, Linux Foundation Chairman Jonathan said cloud native is entering the AI-native era.

The core challenge is delivery. Unlike traditional apps, AI model weights are often huge—tens of GB or even TB—making model delivery and data management extremely complex. Traditional container delivery systems (like image layers) struggle with such massive data and complex versioning.

A key direction for Kubernetes is to standardize model weight and data delivery, using ImageVolume and OCI artifacts to solve AI model delivery and version management on Kubernetes. This not only reduces “cold start” times but also provides infrastructure support for multi-tenancy and compliance.
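To make the ImageVolume pattern concrete, here is a heavily simplified Pod spec rendered as a Python dict (the field shapes loosely follow the ImageVolume feature, but the registry paths and names are made-up examples, not a tested manifest):

```python
# Sketch: model weights delivered as an OCI artifact mounted via an
# image-type volume, instead of being baked into the serving image.
# Registry references and names below are hypothetical.

pod_spec = {
    "containers": [{
        "name": "inference",
        "image": "registry.example.com/serving:latest",  # serving runtime only
        "volumeMounts": [{"name": "model-weights", "mountPath": "/models"}],
    }],
    "volumes": [{
        "name": "model-weights",
        # Weights are versioned and pulled like an image layer,
        # decoupling model delivery from the runtime image.
        "image": {"reference": "registry.example.com/models/llm:v1"},
    }],
}

print(pod_spec["volumes"][0]["name"])
```

The design point is the decoupling: the serving image stays small and stable, while multi-GB weights get their own versioned delivery and caching path.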

Summary

Kubernetes won’t be replaced by AI, but it’s being reshaped as the core of infrastructure. This anxiety is the force driving its evolution—it’s moving from a “general-purpose infrastructure platform” to an “AI-powered multifunctional base”. Some even call it the AI operating system.

In the future, Kubernetes’ core competitiveness will no longer be just container management, but how effectively it can schedule and manage AI workloads, and how it can make AI a routine part of operations. This was my biggest takeaway from the AI Native Summit and KubeCon, and it’s what I look forward to in the Kubernetes ecosystem over the next few years.

Day One in Amsterdam: Kubernetes Is Rethinking AI

By Jimmy Song
March 23, 2026, 04:41

Today marks my first day at KubeCon Europe 2026. The most striking feeling is: the world is vast, but this community is truly small.

Figure 11: Jimmy on the first day of KubeCon EU 2026

One strong impression stands out:

The world is big, but this circle is really small.

Old Friends, New Cycle

At the Maintainer Summit, I met many familiar faces—

Colleagues from Ant Group, friends from Tetrate, and some people I’ve known for nearly a decade. Together, we’ve journeyed from the early days of Kubernetes, Service Mesh, and cloud native infrastructure to today.

In a sense, this generation has fully experienced:

  • The rise of Kubernetes
  • The standardization of Cloud Native
  • The microservices and service mesh boom
  • And now, the era of AI Infrastructure

This isn’t about “new people entering the field,” but rather—

The same group stepping into a new technology cycle.

What Is the Maintainer Summit Discussing?

If you ask:

What is the Kubernetes community most concerned about right now?

Today’s answer is very clear:

👉 How to run AI workloads better on Kubernetes

Figure 12: The Maintainer Summit’s main topic is AI Infra

Many topics at the Maintainer Summit revolved around:

  • Scheduling models for LLM / AI workloads
  • GPU / accelerator resource management
  • Integrating inference systems with Kubernetes
  • Redefining the roles of data plane vs. control plane
  • How observability tools like OTel monitor AI workloads

In other words:

Kubernetes hasn’t been replaced by AI; it’s actively “absorbing” AI.

Key Signal: GPUs Are Becoming the “Infrastructure Layer”

Today, I had in-depth discussions with members of the CNCF TOC, Red Hat, and the vLLM community.

The core question was:

How should GPUs be “platformized”?

Some consensus is already clear:

  • GPUs are no longer just devices
  • They are now a schedulable, partitionable, and shareable resource layer

Figure 13: TOC meeting discussing GPU resource management and LLM Serving integration

At the Maintainer Summit in Amsterdam, we had deep discussions with CNCF TOC, Red Hat, and the vLLM community about GPU resource management and LLM Serving integration in Kubernetes scenarios, and explored potential collaboration between vLLM and HAMi.

Behind this is a major paradigm shift:

  • GPU = node resource → GPU = infrastructure layer
  • Exclusive use → multi-tenant sharing
  • Static binding → dynamic scheduling
  • Managed within frameworks → unified management at the platform layer

This is exactly what we’ve been working on in HAMi.

HAMi: From “Project” to “Reference Pattern”

Another interesting change today:

HAMi is no longer just a “community project”—it’s becoming:

A reference implementation (reference pattern) for AI Infra

Figure 14: Li Mengxuan, CTO of Dynamia, sharing HAMi’s design and practice at KubeCon EU 2026 Maintainer Summit

This is reflected in several ways:

  • Invited to present at the Maintainer Summit
  • Participating in CNCF TOC discussions
  • Presenting demos as part of Incubating-level review discussions
  • Exploring joint content with the vLLM community (even discussing a joint blog 👀)

Especially in conversations with Red Hat and vLLM, a clear trend emerged:

GPU resource management and LLM serving are becoming coupled

That is:

  • Upper layer: vLLM / inference frameworks
  • Lower layer: GPU scheduling / sharing

A new “interface layer” is gradually forming.

This is a direction worth betting on.

Figure 15: At the TAG Workshop, HAMi was discussed as an Incubating demo

A Caution: The AI Infra Startup Boom Hasn’t Really Begun

At the same time, I have a somewhat “counterintuitive” observation:

We haven’t yet seen a large wave of AI Infra (K8s-focused) startups.

Most companies I saw today fall into a few groups:

  • Pivots from CI/CD, Service Mesh, or Gateway
  • Traditional cloud vendors extending into AI
  • Teams working on models, agents, or even lower-level tech

But those truly focused on:

“Making AI workloads run better on Kubernetes”

There are actually not many startups at this layer.

This could mean two things:

1) This Layer Isn’t Fully Formed Yet

Currently, most activity is at:

  • The model layer (LLM / foundation models)
  • The application layer (Agent / Copilot)

But not at:

  • The scheduling layer
  • The resource layer
  • The runtime layer

2) Or, the Barrier to Entry Is Very High

Because at its core, this is:

The intersection of Cloud Native × GPU × AI workload

It’s not just “wrapping AI,” but a fundamental re-architecture at the infrastructure level.

My Take

If we break down the AI technology stack:

  • Agent / Application
  • LLM Serving (vLLM, etc.)
  • AI Runtime / Scheduling
  • GPU Resource Layer
  • Hardware

Most innovation today is concentrated in:

  • The top two layers (Agent / LLM)

But the real long-term moat lies in:

  • The middle two layers (Runtime + Resource Layer)

And Kubernetes is very likely to remain:

The default platform for this middle layer

Summary

Today’s takeaway:

Kubernetes is not obsolete; it’s being redefined.

And our generation is shifting from:

“Cloud Native Builders”

to:

“AI Infrastructure Builders”

More to come tomorrow.

HAMi Website Refactor: Why HAMi Docs and Website Underwent a Complete Redesign

By Jimmy Song
March 17, 2026, 08:55

This redesign is more than a style update—it’s a step toward clearer technical communication and better user experience. Try the new HAMi website at https://project-hami.io and submit issues here.

Over the past two months, I conducted a thorough refactor of the documentation website (see GitHub). Externally, it looks like a “visual redesign”, but from the perspective of community maintainers and content builders, it’s a comprehensive upgrade of information architecture, content system, and frontend experience.

This article aims to systematically explain three things: why we did this refactor, what exactly changed, and what these changes mean for the HAMi community.

Why Refactor the Website and Documentation

HAMi is a CNCF-hosted open source project initiated and contributed by Dynamia, with growing influence in GPU virtualization, heterogeneous compute scheduling, and AI infrastructure. The community content is expanding, and user types are becoming more diverse: from first-time visitors to engineers and enterprise users seeking deployment docs, architecture diagrams, case studies, and ecosystem information.

The original site was functional, but as content grew, several issues became apparent:

  • The homepage lacked information density, making it hard to quickly grasp the project’s overall value.
  • Connections between docs, blogs, and community info were not smooth; content entry points were scattered.
  • Search experience was unstable; external solutions were not ideal in practice.
  • Mobile experience had many details needing improvement, especially navigation, card layouts, and footer areas.
  • Visual style was inconsistent, making it hard to convey community influence and engineering maturity.

For a fast-evolving open source community, the website is not just a “place for docs”, but the public interface of the community. It needs to serve as project introduction, knowledge gateway, adoption proof, community connector, and brand expression.

So the goal of this refactor was clear: not just superficial beautification, but to truly upgrade the website into HAMi’s systematic community entry point.

What Was Done in This Refactor

This update was not a single-point change, but a series of systematic improvements.

Homepage Redesign and Complete Information Architecture Overhaul

The most obvious change is the homepage.

We redesigned the homepage structure, moving away from simply stacking content blocks, and instead organizing the page around the main narrative: “Project Positioning → Core Capabilities → Ecosystem Entry → Content Accumulation → Community Trust”.

Specifically, the homepage received several key upgrades:

  • Rebuilt the Hero section to strengthen first-screen information delivery and action entry.
  • Optimized CTA design so users can quickly access docs, blogs, and resources.
  • Added and enhanced multiple homepage sections to showcase project value and community reach in a more structured way.
  • Adjusted visual hierarchy, background atmosphere, and scroll rhythm, transforming the homepage from a “content list” into a “narrative page”.

These changes include Hero animations and atmosphere layers, research/story sections, new resource entry sections, refreshed CTAs, unified background design, and ongoing reduction of visual noise. Together, they solve a core problem: enabling visitors to understand what HAMi is and why it’s worth exploring further within seconds.

Architecture Diagrams

Key diagrams were redrawn for clearer technical communication. This helps users grasp HAMi’s role in AI infrastructure.

Figure 1: HAMi website homepage architecture diagram

For HAMi, this change is critical. The community faces not just a single feature, but a set of system-level challenges involving Kubernetes, schedulers, GPU Operators, heterogeneous devices, and enterprise platforms. Improved diagrams make the website a better technical entry point.

Added Case Studies, Community, and Ecosystem Sections to Make Impact Visible

Another important direction was strengthening the “community proof” layer.

Many open source project sites fall into the trap of having complete docs, but users can’t tell if the project is truly adopted, if the community is active, or if the ecosystem is expanding. The HAMi website redesign consciously addresses this.

Figure 2: HAMi ecosystem and device support
Figure 3: HAMi adopters
Figure 4: HAMi contributor organizations

Blog & Reading Experience

Blog cards, lists, and metadata were unified for easier reading and sharing. Blogs are now a core communication layer.

Figure 5: HAMi website blog list page

Mobile Optimization

Navigation, card layouts, footer, and search were improved for smoother mobile browsing.

Figure 6: HAMi website mobile view

Footer & Search

Footer layout was enhanced for better navigation and credibility. Built-in search replaced unreliable external solutions, improving content accessibility.

Figure 7: HAMi website footer
Figure 8: HAMi website built-in search

What This Redesign Means for the HAMi Community

From screenshots, it looks like “the website looks better”. But from a community-building perspective, its significance is deeper.

First, HAMi’s external expression is more systematic.

The website is no longer just a collection of scattered pages, but is forming a complete narrative chain: users can understand project value from the homepage, capability details from docs, practical paths from blogs, and community impact from ecosystem modules.

Second, community content assets are reorganized.

Previously, valuable articles, diagrams, and explanations existed but were hard to find. Now, through homepage sections, navigation, and search refactor, these contents are more effectively connected.

Third, HAMi’s community image is more mature.

A mature open source project needs not just an active code repository, but clear, stable, and sustainable website expression. Structure, style, and usability are part of the community’s engineering capability.

Fourth, this lays the foundation for expanding case studies, adopters, contributors, and ecosystem content.

With the framework sorted, adding more case studies, collaboration entry points, or showcasing more adopters and partners will be more natural and easier for users to understand.

As a Community Contributor, My Top Three Takeaways from This Redesign

In summary, I believe this refactor got three things right:

  • Upgraded the website from a “content dump” to a “community gateway”.
  • Combined visual optimization with information architecture adjustment, not just a skin change.
  • Improved basic experiences like search, mobile, navigation, and footer.

These may not be as flashy as launching a new feature, but they directly impact content dissemination, user comprehension, and the project’s long-term image.

For infrastructure projects like HAMi, technical capability is fundamental, but clearly communicating, organizing, and continuously presenting that capability is also a form of infrastructure.

Summary

This HAMi documentation and website refactor is essentially an upgrade to the community’s “expression layer” infrastructure.

It improves visual and reading experience, reorganizes content, homepage narrative, search paths, mobile access, and community signal display. Homepage redesign, architecture diagram redraw, unified blog style, mobile optimization, enhanced footer, and switching from external to built-in search together constitute a true “refactor”.

Externally, it helps more people quickly understand HAMi; internally, it provides a stable platform for the community to accumulate case studies, expand the ecosystem, and serve adopters and contributors.

The website is not an accessory to the open source community, but part of its long-term influence. HAMi’s redesign is about taking this seriously.

If you’re interested in Kubernetes GPU virtualization, add me on WeChat jimmysong or scan the QR code below.

GTC 2026 Eve: AI is Becoming the New Infrastructure

By Jimmy Song
March 15, 2026, 11:34

AI is quietly reshaping the infrastructure landscape, and GTC 2026 may become a key node in this transformation.

Next week, one of the most important technology conferences in the AI industry, NVIDIA GTC 2026, will be held in San Jose, USA.

For many people, GTC is just a GPU technology conference. But if you follow the development of the AI industry over the past few years, you’ll find an interesting phenomenon:

Many important narratives about AI infrastructure are gradually taking shape at GTC.

From CUDA, DGX, to AI Factory, and most recently Jensen Huang’s proposed AI Five-Layer Cake, NVIDIA is constantly attempting to redefine the computing infrastructure of the AI era.

This is why many people call GTC:

AI’s “Woodstock.”

Figure 1: NVIDIA GTC Conference

This year’s GTC (March 16-19) is expected to cover various levels of the AI stack, including:

  • AI Chips
  • AI Data Centers
  • AI Agents
  • Robotics
  • Inference Computing

According to NVIDIA’s official blog, this year’s keynote will focus on the complete AI stack from chips to applications.

If we put these signals together, we can actually see a larger trend:

AI is transforming from an “applied technology” into “infrastructure.”

The Perspective of Industrial Revolutions

From a longer time scale, the technological revolutions in human history are essentially infrastructure revolutions.

We usually divide industrial revolutions into four times.

In the table below, you can see the infrastructure corresponding to each industrial revolution:

  • Steam Revolution → Steam Engine
  • Electrical Revolution → Power Grid
  • Digital Revolution → Computer
  • Internet Era → Network

Table 1: Industrial Revolutions and Corresponding Infrastructure

First Industrial Revolution: Steam

The steam engine allowed humans to utilize mechanical power on a large scale for the first time. Production no longer relied on human or animal power, but on machines.

Second Industrial Revolution: Electricity

Electricity changed not only the source of power, but also the organization of production. Assembly lines, large-scale manufacturing, and modern industrial systems are all built on the foundation of the power grid.

Third Industrial Revolution: Computers

Computers allowed information to be processed digitally. Software became a production tool.

Fourth Industrial Revolution: Internet and Intelligence

The internet connects all computers together. Cloud computing transforms computing resources into infrastructure. And AI gives machines a certain degree of “cognitive ability.”

The True Significance of AI

If we observe these industrial revolutions, we discover a pattern:

Each industrial revolution produces a new General Purpose Infrastructure.

And AI is likely to become the next-generation infrastructure.

NVIDIA even directly stated in a recent article:

AI is essential infrastructure, like electricity and the internet.

In other words:

AI is no longer just an applied technology, but a new factor of production.

NVIDIA’s Five-Layer Cake

Recently, Jensen Huang proposed a very interesting concept: AI Five-Layer Cake.

Figure 2: AI Five Layer Cake (Image source: NVIDIA)

AI is broken down into five layers:

  1. Energy
  2. Chips
  3. AI Infrastructure
  4. Models
  5. Applications

This model actually illustrates one thing:

AI is a complete industrial system.

Jensen Huang even described AI at Davos as:

“One of the largest-scale infrastructure constructions in human history.”

Signals GTC 2026 May Release

This year’s GTC is expected to release several important directions.

Inference Computing

The focus of AI in the past was training. But the main load of AI in the future is likely to be Inference.

Analysts expect that by 2030, 75% of computing demand in the AI data center market will come from inference.

Agentic AI

The past AI model was:

User → Model → Answer

The Agent model is more complex:

User → Agent → Tools → Model → Action

The flowchart below shows the main interaction paths in the Agent model:

Figure 3: Agentic AI Interaction Flow

AI is no longer just answering questions, but executing tasks.
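A minimal sketch of that User → Agent → Tools → Model → Action loop (every function, tool name, and string here is a made-up stand-in, not a real framework):

```python
# Illustrative agent loop: the agent consults a model, which may request a
# tool call before producing a final action. All names are invented.

def fake_model(prompt, tool_result=None):
    # Stand-in for an LLM call: first ask for a tool, then act on its result.
    if tool_result is None:
        return {"type": "tool_call", "tool": "search", "query": prompt}
    return {"type": "action", "text": f"answer based on {tool_result}"}

TOOLS = {"search": lambda query: f"results for '{query}'"}

def run_agent(user_request):
    step = fake_model(user_request)
    while step["type"] == "tool_call":
        result = TOOLS[step["tool"]](step["query"])
        step = fake_model(user_request, tool_result=result)
    return step["text"]

print(run_agent("what changed in Kubernetes DRA?"))
```

Even in this toy form, the key property is visible: one user request fans out into multiple model and tool invocations, which is exactly what changes the workload profile discussed below.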

Agent Platform

Recent media reports suggest that NVIDIA may launch a new Agent platform: NemoClaw, aimed at helping enterprises deploy AI Agents.

If this project is truly released, it means NVIDIA’s stack will become the following structure:

Figure 4: NVIDIA Agent Platform Architecture

This is actually a complete AI stack.

Agents Change Computing Workloads

The emergence of Agents brings new computing workload issues.

Past AI workloads were mainly:

  • Training
  • Inference

But Agents bring a third type of workload:

Agent Workloads

The figure below shows the diverse workload types related to Agents:

Figure 5: Agent Workloads Structure

This workload is highly fragmented: GPUs are no longer occupied for long stretches, but instead face many small, short-lived requests. This poses new challenges for infrastructure.

AI-Native Infrastructure

For the past few years, I’ve been thinking about a question:

What is AI-native infrastructure?

It is clearly not just “Kubernetes with GPUs.” I’m more inclined to believe it needs to possess several characteristics.

GPU as a First-Class Resource

In the cloud computing era, CPU is the core resource. In the AI era, GPU is the core resource.

Heterogeneous Computing

Real-world AI chips are not limited to NVIDIA:

  • NVIDIA
  • Ascend
  • Cambricon
  • Metax
  • Moore Threads

Future AI infrastructure must be able to manage heterogeneous computing.

GPU Sharing

GPU is a very expensive resource. If it cannot be shared, utilization will be very low. This is why GPU virtualization and slicing are becoming increasingly important.

AI Scheduling

AI scheduling includes not only traditional CPU and memory, but also:

  • GPU
  • VRAM
  • Topology
  • Bandwidth
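A minimal sketch of what such multi-dimensional placement might look like (the device data, thresholds, and field names are invented for illustration, not any real scheduler's model):

```python
# Illustrative multi-dimensional fit check: beyond CPU and memory, an AI
# placement must satisfy VRAM, bandwidth, and topology constraints at once.
# All values and field names below are hypothetical.

devices = [
    {"node": "n1", "vram_free_gb": 24, "nvlink": True,  "bw_gbps": 600},
    {"node": "n2", "vram_free_gb": 80, "nvlink": False, "bw_gbps": 64},
]

def fits(request, device):
    # A device qualifies only if every dimension is satisfied.
    return (device["vram_free_gb"] >= request["vram_gb"]
            and device["bw_gbps"] >= request["min_bw_gbps"]
            and (not request["need_nvlink"] or device["nvlink"]))

request = {"vram_gb": 16, "min_bw_gbps": 200, "need_nvlink": True}
print([d["node"] for d in devices if fits(request, d)])
```

The point of the sketch: the node with the most free VRAM (n2) still loses, because topology and bandwidth constraints dominate for this request.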

A Possible AI Tech Stack

Combining the above trends, the future AI stack may present the following structure:

Figure 6: AI Tech Stack Evolution

This structure is very close to NVIDIA’s Five-Layer Cake.

My Judgment

Combining signals from GTC, AI Factory, Agents, and AI Five-Layer Cake, we can see a very obvious trend:

AI is rewriting computing infrastructure.

Future competition may not just be “who has the best model,” but:

Who has the best AI Infrastructure.

Just like the past few decades:

  • Electricity determines industrial capability
  • Internet determines information capability
  • Cloud computing determines software capability

The future may be:

AI Infrastructure determines intelligence capability.

Summary

If we stretch the time scale a bit longer, we may be in a new historical stage.

AI is no longer just a technological tool. It is becoming new infrastructure.

Just like:

  • Electricity
  • Internet
  • Cloud computing

And AI-native infrastructure is likely to become one of the most important technology directions for the next decade.

When GPUs Move Toward Open Scheduling: Structural Shifts in AI Native Infrastructure

By Jimmy Song
February 13, 2026, 22:32

The future of GPU scheduling isn’t about whose implementation is more “black-box”—it’s about who can standardize device resource contracts into something governable.

Figure 1: GPU Open Scheduling

Introduction

Have you ever wondered: why are GPUs so expensive, yet overall utilization often hovers around 10–20%?

Figure 2: GPU Utilization Problem: Expensive GPUs with only 10-20% utilization

This isn’t a problem you solve with “better scheduling algorithms.” It’s a structural problem: GPU scheduling is undergoing a shift from “proprietary implementation” to “open scheduling,” similar to how networking converged on CNI and storage converged on CSI.

In the HAMi 2025 Annual Review, we noted: “HAMi 2025 is no longer just about GPU sharing tools—it’s a more structural signal: GPUs are moving toward open scheduling.”

By 2025, the signals of this shift became visible: Kubernetes Dynamic Resource Allocation (DRA) graduated to GA and became enabled by default, NVIDIA GPU Operator started defaulting to CDI (Container Device Interface), and HAMi’s production-grade case studies under CNCF are moving “GPU sharing” from experimental capability to operational excellence.

This post analyzes this structural shift from an AI Native Infrastructure perspective, and what it means for Dynamia and the industry.

Why “Open Scheduling” Matters

In multi-cloud and hybrid cloud environments, GPU model diversity significantly amplifies operational costs. One large internet company’s platform spans H200/H100/A100/V100/4090 GPUs across five clusters. If you can only allocate “whole GPUs,” resource misalignment becomes inevitable.

“Open scheduling” isn’t a slogan—it’s a set of engineering contracts being solidified into the mainstream stack.

Standardized Resource Expression

Before: GPUs were extended resources. The scheduler didn’t understand if they represented memory, compute, or device types.

Figure 3: Open Scheduling Standardization Evolution

Now: Kubernetes DRA provides objects like DeviceClass, ResourceClaim, and ResourceSlice. This lets drivers and cluster administrators define device categories and selection logic (including CEL-based selectors), while Kubernetes handles the full loop: match devices → bind claims → place Pods onto nodes with access to allocated devices.

Even more importantly, with Kubernetes 1.34 the core APIs in the resource.k8s.io group graduated to GA: DRA became stable and enabled by default, and the community committed to avoiding breaking changes going forward. This means the ecosystem can invest with confidence in a stable, standard API.
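To make the DRA allocation loop concrete, here is a minimal Python sketch of the matching idea: a DeviceClass narrows candidate devices by attribute, a ResourceClaim requests a device of that class, and allocation binds the claim to a concrete device advertised by a node. The data structures and attribute names are illustrative only, not the real resource.k8s.io API.

```python
# Illustrative sketch of DRA-style matching (NOT the real Kubernetes
# resource.k8s.io API): a DeviceClass selects devices by attribute,
# a ResourceClaim asks for a device of that class, and allocation
# binds the claim to a free device advertised per node.

def matches(device, selector):
    """Evaluate a CEL-like selector as simple attribute equality checks."""
    return all(device["attributes"].get(k) == v for k, v in selector.items())

def allocate(claim, device_classes, resource_slices):
    """Bind the claim to the first free device satisfying its DeviceClass."""
    selector = device_classes[claim["deviceClassName"]]["selector"]
    for node, devices in resource_slices.items():
        for device in devices:
            if not device["allocated"] and matches(device, selector):
                device["allocated"] = True
                return {"node": node, "device": device["name"]}
    return None  # claim stays pending: no suitable device

device_classes = {"gpu-80gb": {"selector": {"memory": "80Gi"}}}
resource_slices = {
    "node-a": [{"name": "gpu-0", "attributes": {"memory": "80Gi"}, "allocated": False}],
    "node-b": [{"name": "gpu-1", "attributes": {"memory": "24Gi"}, "allocated": False}],
}
claim = {"deviceClassName": "gpu-80gb"}

result = allocate(claim, device_classes, resource_slices)
print(result)  # → {'node': 'node-a', 'device': 'gpu-0'}
```

The point of the sketch is the contract, not the mechanics: once device categories and claims are expressed in a standard API, any driver can advertise devices and any scheduler can consume them.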

Standardized Device Injection

Before: Device injection relied on vendor-specific hooks and runtime class patterns.

Now: The Container Device Interface (CDI) abstracts device injection into an open specification. NVIDIA’s Container Toolkit explicitly describes CDI as an open specification for container runtimes, and NVIDIA GPU Operator 25.10.0 defaults to enabling CDI on install/upgrade—directly leveraging runtime-native CDI support (containerd, CRI-O, etc.) for GPU injection.

This means “devices into containers” is also moving toward replaceable, standardized interfaces.
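As a rough illustration of what that standardized interface looks like, the sketch below models the shape of a CDI spec as a Python dict. The kind, device name, and device-node path are assumptions for illustration, not copied from a real /etc/cdi file; the actual specification defines additional containerEdits such as env and mounts.

```python
# Illustrative shape of a CDI spec (field values are assumptions, not a
# real /etc/cdi file). A CDI-aware runtime reads such specs and applies
# containerEdits when a container requests "vendor/class=name".
cdi_spec = {
    "cdiVersion": "0.6.0",
    "kind": "nvidia.com/gpu",
    "devices": [
        {
            "name": "gpu0",
            "containerEdits": {
                "deviceNodes": [{"path": "/dev/nvidia0"}],
            },
        }
    ],
}

def qualified_names(spec):
    """List the fully qualified device names this spec would expose."""
    return [f'{spec["kind"]}={d["name"]}' for d in spec["devices"]]

names = qualified_names(cdi_spec)
print(names)  # → ['nvidia.com/gpu=gpu0']
```

Because injection is described declaratively in the spec rather than hard-coded in vendor hooks, the runtime side becomes replaceable: containerd, CRI-O, or any other CDI-aware runtime can apply the same edits.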

HAMi: From “Sharing Tool” to “Governable Data Plane”

On this standardization path, HAMi’s role needs redefinition: it’s not about replacing Kubernetes—it’s about turning GPU virtualization and slicing into a declarative, schedulable, governable data plane.

Data Plane Perspective

HAMi’s core contribution expands the allocatable unit from “whole GPU integers” to finer-grained shares (memory and compute), forming a complete allocation chain:

  1. Device discovery: Identify available GPU devices and models
  2. Scheduling placement: Use Scheduler Extender to make native schedulers “understand” vGPU resource models (Filter/Score/Bind phases)
  3. In-container enforcement: Inject share constraints into container runtime
  4. Metric export: Provide observable metrics for utilization, isolation, and more

This transforms “sharing” from ad-hoc “it runs” experimentation into engineering capability that can be declared in YAML, scheduled by policy, and validated by metrics.
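The accounting behind that chain can be sketched in a few lines: each physical GPU tracks remaining memory and compute shares, and a pod's request is admitted only if both dimensions fit. This is a toy model in the spirit of HAMi's vGPU slicing, not HAMi's actual code.

```python
# Toy fractional-GPU accounting in the spirit of HAMi's vGPU slicing
# (data structures are illustrative, not HAMi's implementation): a card
# is sliced along two dimensions, memory (MiB) and compute share (%),
# and a request is admitted only if both dimensions still fit.

class SharedGPU:
    def __init__(self, name, mem_mib, compute_pct=100):
        self.name = name
        self.free_mem = mem_mib
        self.free_compute = compute_pct

    def try_allocate(self, mem_mib, compute_pct):
        """Admit the request if both memory and compute shares fit."""
        if mem_mib <= self.free_mem and compute_pct <= self.free_compute:
            self.free_mem -= mem_mib
            self.free_compute -= compute_pct
            return True
        return False

gpu = SharedGPU("gpu-0", mem_mib=81920)  # one 80 GiB card
print(gpu.try_allocate(20480, 25))  # → True  (first inference pod fits)
print(gpu.try_allocate(20480, 25))  # → True  (second pod shares the card)
print(gpu.try_allocate(61440, 25))  # → False (memory would be oversubscribed)
```

Once shares are tracked this way, the same numbers can be declared in YAML, enforced in the container, and exported as metrics, which is exactly what turns sharing into a governable contract.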

Scheduling Mechanism: Enhancement, Not Replacement

HAMi’s scheduling doesn’t replace Kubernetes—it uses a Scheduler Extender pattern to let the native scheduler understand vGPU resource models:

  • Filter: Filter nodes based on memory, compute, device type, topology, and other constraints
  • Score: Apply configurable policies like binpack, spread, topology-aware scoring
  • Bind: Complete final device-to-Pod binding

This architecture positions HAMi naturally as an execution layer under higher-level “AI control planes” (queuing, quotas, priorities)—working alongside Volcano, Kueue, Koordinator, and others.

Figure 4: HAMi Scheduling Architecture (Filter → Score → Bind)
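The Filter and Score phases above can be sketched as two small functions, with binpack versus spread as configurable policies. Node shapes and numbers here are illustrative, not HAMi's actual implementation.

```python
# Sketch of the extender pattern's Filter → Score phases (illustrative,
# not HAMi's code): Filter drops infeasible nodes, Score ranks the rest
# under a configurable policy, and the best-scoring node wins.

def filter_nodes(nodes, req_mem):
    """Filter: keep nodes with enough free GPU memory for the request."""
    return [n for n in nodes if n["free_mem"] >= req_mem]

def score(node, policy):
    """Score: binpack prefers fuller nodes, spread prefers emptier ones."""
    used_fraction = 1 - node["free_mem"] / node["total_mem"]
    return used_fraction if policy == "binpack" else 1 - used_fraction

def place(nodes, req_mem, policy="binpack"):
    feasible = filter_nodes(nodes, req_mem)
    if not feasible:
        return None
    return max(feasible, key=lambda n: score(n, policy))["name"]

nodes = [
    {"name": "node-a", "total_mem": 81920, "free_mem": 20480},  # 75% used
    {"name": "node-b", "total_mem": 81920, "free_mem": 61440},  # 25% used
]
print(place(nodes, req_mem=16384, policy="binpack"))  # → node-a
print(place(nodes, req_mem=16384, policy="spread"))   # → node-b
```

Because the extender only filters and scores, higher-level control planes (queuing, quotas, priorities) can sit above it untouched, which is why this pattern composes with Volcano, Kueue, and Koordinator rather than competing with them.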

Production Evidence: From “Can We Share?” to “Can We Operate?”

CNCF public case studies provide concrete answers: in a hybrid, multi-cloud platform built on Kubernetes and HAMi, 10,000+ Pods run concurrently, and GPU utilization improves from 13% to 37% (nearly 3×).

Figure 5: CNCF Production Case Studies: Ke Holdings 13%→37%, DaoCloud 80%+ utilization, SF Technology 57% savings

Here are highlights from several cases:

Case Study 1: Ke Holdings (February 5, 2026)

  • Environment: 5 clusters spanning public and private clouds
  • GPU models: H200/H100/A100/V100/4090 and more
  • Architecture: Separate “GPU clusters” for large training tasks (dedicated allocation) vs “vGPU clusters” with HAMi fine-grained memory slicing for high-density inference
  • Concurrent scale: 10,000+ Pods
  • Outcome: Overall GPU utilization improved from 13% to 37% (nearly 3×)

Case Study 2: DaoCloud (December 2, 2025)

  • Hard constraints: Must remain cloud-native, vendor-agnostic, and compatible with CNCF toolchain
  • Adoption outcomes:
    • Average GPU utilization: 80%+
    • GPU-related operating cost reduction: 20–30%
    • Coverage: 10+ data centers, 10,000+ GPUs
  • Explicit benefit: Unified abstraction layer across NVIDIA and domestic GPUs, reducing vendor dependency

Case Study 3: Prep EDU (August 20, 2025)

  • Negative experience: Isolation failures in other GPU-sharing approaches caused memory conflicts and instability
  • Positive outcome: HAMi’s vGPU scheduling, GPU type/UUID targeting, and compatibility with NVIDIA GPU Operator and RKE2 became decisive factors for production adoption
  • Environment: Heterogeneous RTX 4070/4090 cluster

Case Study 4: SF Technology (September 18, 2025)

  • Project: EffectiveGPU (built on HAMi)
  • Use cases: Large model inference, test services, speech recognition, domestic AI hardware (Huawei Ascend, Baidu Kunlun, etc.)
  • Outcomes:
    • GPU savings: Large model inference runs 65 services on 28 GPUs (37 saved); test cluster runs 19 services on 6 GPUs (13 saved)
    • Overall savings: Up to 57% GPU savings for production and test clusters
    • Utilization improvement: Up to 100% GPU utilization improvement with GPU virtualization
  • Highlights: Cross-node collaborative scheduling, priority-based preemption, memory over-subscription

These cases demonstrate a consistent pattern: GPU virtualization becomes economically meaningful only when it participates in a governable contract—where utilization, isolation, and policy can be expressed, measured, and improved over time.

Strategic Implications for Dynamia

From Dynamia’s perspective (and as VP of Open Source Ecosystem), the strategic value of HAMi becomes clear:

Two-Layer Architecture: Open Source vs Commercial

  • HAMi (CNCF open source project): Responsible for “adoption and trust,” focused on GPU virtualization and compute efficiency
  • Dynamia enterprise products and services: Responsible for “production and scale,” providing commercial distributions and enterprise services built on HAMi

Figure 6: Dynamia Dual Mechanism: Open Source vs Commercial

This boundary is the foundation for long-term trust—project and company offerings remain separate, with commercial distributions and services built on the open source project.

Global Narrative Strategy

The internal alignment memo recommends a bilingual approach:

First layer: Lead globally with “GPU virtualization / sharing / utilization” (Chinese can directly use “GPU virtualization and heterogeneous scheduling,” but English first layer should avoid “heterogeneous” as a headline)

Second layer: When users discuss mixed GPUs or workload diversity, introduce “heterogeneous” to confirm capability boundaries—never as the opening hook

Core anchor: Maintain “HAMi (project and community) ≠ company products” as the non-negotiable baseline for long-term positioning

The Right Commercialization Landing

DaoCloud’s case study already set vendor-agnostic and CNCF toolchain compatibility as hard constraints, framing vendor dependency reduction as a business and operational benefit—not just a technical detail. Project-HAMi’s official documentation lists “avoid vendor lock” as a core value proposition.

In this context, the right commercialization landing isn’t “closed-source scheduling”—it’s productizing capabilities around real enterprise complexity:

  • Systematic compatibility matrix
  • SLO and tail-latency governance
  • Metering for billing
  • RBAC, quotas, multi-cluster governance
  • Upgrade and rollback safety
  • Faster path-to-production for DRA/CDI and other standardization efforts

Forward View: The Next 2–3 Years

My strong judgment: over the next 2–3 years, GPU scheduling competition will shift from “whose implementation is more black-box” to “whose contract is more open.”

The reasons are practical:

Hardware Form Factors and Supply Chains Are Diversifying

  • OpenAI’s February 12, 2026 “GPT‑5.3‑Codex‑Spark” release emphasizes ultra-low latency serving, including persistent WebSockets and a dedicated serving tier on Cerebras hardware
  • Large-scale GPU-backed financing announcements (for pan-European deployments) illustrate the infrastructure scale and financial engineering surrounding accelerator fleets

These signals suggest that heterogeneity will grow: mixed accelerators, mixed clouds, mixed workload types.

Low-Latency Inference Tiers Will Force Systematic Scheduling

Low-latency inference tiers (beyond just GPUs) will force resource scheduling toward “multi-accelerator, multi-layer cache, multi-class node” architectural design—scheduling must inherently be heterogeneous.

Open Scheduling Is Risk Management, Not Idealism

In this world, “open scheduling” isn’t idealism—it’s risk management. Building schedulable governable “control plane + data plane” combinations around DRA/CDI and other solidifying open interfaces, ones that are pluggable, multi-tenant governable, and co-evolvable with the ecosystem—this looks like the truly sustainable path for AI Native Infrastructure.

The next battleground isn’t “whose scheduling is smarter”—it’s “who can standardize device resource contracts into something governable.”

Conclusion

When you place HAMi 2025 back in the broader AI Native Infrastructure context, it’s no longer just the year of “GPU sharing tools”—it’s a more structural signal: GPUs are moving toward open scheduling.

Figure 7: Open Scheduling Future Vision

The driving forces come from both ends:

  • Upstream: Standards like DRA/CDI continue to solidify
  • Downstream: Scale and diversity (multi-cloud, multi-model, even accelerators beyond GPUs)

For Dynamia, HAMi’s significance has transcended “GPU sharing tool”: it turns GPU virtualization and slicing into declarative, schedulable, measurable data planes—letting queues, quotas, priorities, and multi-tenancy actually close the governance loop.

Core Model Overview

Author: Jimmy Song
February 10, 2026, 21:56

The Yin-Yang - Five Elements - Yun - Qi Model views AI Infrastructure as an organic whole, revealing its operational mechanisms from four dimensions. Each layer focuses on different fundamental questions:

Four-Layer Model

  • Yin-Yang (State Layer): The system’s internal tension of unified opposites, revealing how dual elements such as performance vs. constraints and innovation vs. governance coexist
  • Five Elements (Role Layer): The five basic role elements in the system and their collaborative relationships, breaking complex infrastructure down into data, models, compute, platforms, and hardware
  • Yun (Time Layer): The development stage the system is in and its cyclical patterns, describing the evolution cycle from exploration to platformization, then scaling and rebalancing
  • Qi (Flow Layer): The effective “field” of flow within the system, characterizing the conduction and feedback of signals and resources, reflecting the overall smoothness of operation

Table 1: Four-Layer Model

Model Interactions

The four-layer model is not isolated but an interconnected organic whole:

  • The tension of Yin-Yang permeates the dynamic balance of Five Elements
  • The development of Five Elements roles is constrained by their Yun stage
  • The flow of Qi connects the above elements into a self-adaptive cyclic system

The overview diagram below illustrates each layer of the model and their interactions:

Figure 1: AI Infrastructure ‘Yin-Yang - Five Elements - Yun - Qi’ Model Overview. The Yin-Yang layer embodies the system’s internal tension and unity of opposites, the Five Elements layer defines core role elements, the Yun layer describes system stage cycles, and Qi as a flow element permeates and drives the entire system.

Model Application Value

This four-layer model provides a unique perspective for the design, operations, and governance of AI Infrastructure:

  1. Holistic Cognitive Framework: Transcend the limitations of single technical metrics to grasp system state as a whole
  2. Dynamic Balance Thinking: Understand unity of opposites relationships and avoid extremes
  3. Evolutionary Stage Awareness: Grasp the system’s development stage and act in accordance with the situation
  4. Flow Insights: Focus on energy flow within the system to anticipate problems

Next, we will delve into the connotation, engineering mapping, and mechanism of each layer.

The Yin-Yang Layer: Dynamic Balance of System States

Author: Jimmy Song
February 10, 2026, 21:56

Yin-Yang is originally a fundamental concept in Chinese philosophy, representing two opposing yet interdependent forces present in all things in the universe. Everything in the world can be classified as either Yin or Yang, and their continuous movement and change generate the various transformations we observe. In the context of systems, Yin-Yang represents the unity of opposites through tension—a pair of attributes or tendencies that pull against yet depend on each other.

Three Typical Pairs of Yin-Yang Tensions

In AI infrastructure, we identify three typical pairs of Yin-Yang tensions:

Expansion ↔ Constraint

Expansion ↔ Constraint: The tension between growth trends and limiting forces.

  • Yang (Expansion): System expansion speed, such as continuously adding tasks and scaling resources
  • Yin (Constraint): Limiting forces, such as cost controls, regulatory constraints, and hardware limits

System expansion speed and constraint intensity always coexist. For example, continuously adding tasks and scaling resources in GPU clusters (the Yang of expansion) is constrained by costs, regulations, or hardware limits (the Yin of constraint).

Imbalance manifestations:

  • Pursuing expansion without regard for constraints → Resource contention and crashes
  • Excessive constraint → Stifling system vitality

Innovation ↔ Governance

Innovation ↔ Governance: The tension between creative capability and control requirements.

  • Yang (Innovation): Technical innovation, introduction of new features
  • Yin (Governance): Security reviews, rule-making

The faster technical innovation progresses, the more easily governance gaps are exposed. For example, introducing new Agent features (innovation, Yang) may outpace security reviews and rule-making (governance, Yin), leading to potential risks.

Imbalance manifestations:

  • Innovation outpaces governance → Potential security risks
  • Excessively strict governance → Slowing innovation momentum

Speed ↔ Stability

Speed ↔ Stability: The tension between performance advancement and reliable operation.

  • Yang (Speed): Performance improvements, increased throughput
  • Yin (Stability): Reliable operation, system stability

When we pursue speed improvements single-mindedly, the cost to stability will eventually manifest. For example, pushing GPU utilization to the limit during model training (speed, Yang) easily leads to more frequent failures or delays (decline in stability, Yin).

Imbalance manifestations:

  • Extreme pursuit of speed → Decline in stability
  • Excessive conservatism → Performance waste

The Art of Yin–Yang Balance

The Yin–Yang poles described above are not simple trade-offs where you choose one and sacrifice the other, but rather inherent relationships of unity of opposites in systems. Both Yin and Yang sides are opposed yet complementary, neither can be dispensed with:

Expansion without constraints is difficult to sustain; constraints without expansion lose meaning

As the ancient saying goes, “One Yin and one Yang constitute the Way” (一阴一阳之谓道). Balancing Yin and Yang is the “Way” of healthy system operation. For architects, the key lies in:

  • Insight into dominant tensions: Determine which pair of tensions is currently dominant
  • Introducing the opposite: Introduce the complementary side at the right time to restore balance
  • Dynamic adjustment: Dynamically transform based on changes in system environment and stage

Practical Cases

Case: GPU Cluster Expansion

When the cluster is in a state of rapid expansion (Yang exuberant, Yin deficient):

  • ✓ Add scheduling policies and resource quotas (supplement Yin)
  • ✓ Establish cost control mechanisms (supplement Yin)
  • ✗ Do not pursue expansion speed single-mindedly

Case: Agent Feature Innovation

When introducing new Agent features:

  • ✓ Simultaneously establish monitoring and sandboxing mechanisms (supplement Yin)
  • ✓ Improve security review processes (supplement Yin)
  • ✗ Do not let innovation outpace governance

Case: Model Training Performance Optimization

When optimizing model training performance:

  • ✓ Simultaneously strengthen fault tolerance mechanisms and testing (supplement the Yin of stability)
  • ✓ Set performance baselines and rollback mechanisms (supplement Yin)
  • ✗ Do not infinitely compress fault tolerance time

Dynamic Transformation of Yin–Yang States

It’s important to note that Yin–Yang states are not static and unchanging, but dynamically transform with system environment and stage.

The same capability may transform from an advantage to a risk at different stages

For example, a “rapid development” strategy that drives rapid iteration during the startup stage, if applied without restraint during the scaling stage, can instead become a major threat to stability.

The analysis of the Yin–Yang layer reminds us to constantly pay attention to the ebb and flow of these opposing forces, and to keep the system in a state of elastic tension through adjustments, rather than snapping or becoming slack and ineffective.

Five Elements Layer: Classification and Collaboration of System Roles

Author: Jimmy Song
February 10, 2026, 21:56

Five Elements (Wǔxíng, Five Elements or Five Phases) theory divides everything in the world into five basic elements: Wood, Fire, Earth, Metal, Water. Each element represents a fundamental attribute or functional role, with the five elements generating and overcoming each other in an endless cycle.

In AI infrastructure, we use “Five Elements” to characterize the system’s five core elements and their responsibilities:

Engineering Mapping of Five Elements

  • Water 🌊: Flow and containment; maps to data flow and quality (data pipelines, data assets, quality control)
  • Wood 🌲: Growth and creation; maps to model growth and capability expansion (model architecture iteration, parameter scale expansion)
  • Fire 🔥: Energy and execution; maps to compute conversion and work efficiency (GPU/TPU computing, job scheduling efficiency)
  • Earth 🏔️: Support and stability; maps to platform support and orchestration governance (distributed coordination, middleware, scheduling systems)
  • Metal ⚙️: Strength and standardization; maps to hardware constraints and physical boundaries (GPU/CPU performance, storage capacity, network bandwidth)

Table 1: Engineering Mapping of Five Elements

Water – Data Flow and Quality

Corresponds to data pipelines, data assets, and quality control in the system.

Water symbolizes flow and containment, analogous to the circulation and nourishing role of data in the system, including:

  • Training data acquisition
  • Real-time data input
  • Feedback signal transmission
  • Data cleaning and quality assurance

Wood – Model Growth and Capability Expansion

Corresponds to the evolution and growth of machine learning models and algorithms.

Wood represents growth and creation, mapped to:

  • Model architecture iteration
  • Parameter scale expansion
  • Cultivation of new capabilities
  • Algorithm optimization and improvement

Fire – Compute Conversion and Work Efficiency

Corresponds to computing processes and the utilization of compute resources.

Fire symbolizes energy and execution, reflected as:

  • Using GPU/TPU and other compute resources for calculation
  • Converting electrical energy into model training and inference work
  • Parallel computing capability
  • Job scheduling efficiency

Earth – Platform Support and Orchestration Governance

Corresponds to the support and governance capabilities of the platform layer.

Earth represents support and stability, analogous to:

  • Infrastructure platform support for upper-layer applications
  • Distributed system coordination and orchestration
  • Middleware services
  • Scheduling systems and policy management
  • Permission systems, service quality assurance

Metal – Hardware Constraints and Physical Boundaries

Corresponds to underlying hardware and system hard limits.

Metal represents strength and standardization, mapped to:

  • GPU/CPU hardware performance
  • Storage capacity
  • Network bandwidth
  • Physical conditions and hard rules (power consumption, safety specifications, etc.)

Five Elements Generation Relationships

The Five Elements form a positive cycle through “generation” relationships:

Data (Water) feeds model growth (Wood), model requirements stimulate compute investment (Fire), compute development drives platform build-out (Earth), platform capabilities push hardware (Metal) to its boundaries, and hardware progress in turn supports greater data acquisition (Water)

Figure 1: Five Elements generation relationship diagram. Water generates Wood, Wood generates Fire, Fire generates Earth, Earth generates Metal, Metal generates Water, representing the mutually reinforcing cycle between data, models, compute, platforms, and hardware.

Five Elements Overcoming Relationships

At the same time, overcoming relationships also exist among the Five Elements, meaning when one element is too strong or imbalanced, it will suppress or weaken another element:

  • Wood overcomes Earth: Excessive model expansion increases the burden on the platform (Earth), potentially even crushing the existing architecture
  • Earth overcomes Water: Overly heavy platforms and rules will hinder the free flow of data (Water)
  • Water overcomes Fire: Data bottlenecks will limit the performance of compute
  • Fire overcomes Metal: Excessive compute demand may break through hardware (Metal) limits
  • Metal overcomes Wood: Strict hardware and rule limitations will curb the expansion of models (Wood)

Figure 2: Five Elements generation and overcoming relationship diagram. Dashed arrows indicate overcoming relationships, reflecting the system’s internal checks and balances mechanism: any element becoming excessively strong will constrain another element.

Five Elements Balance Diagnosis

Through the Five Elements model, engineering teams can systematically check the role completeness and balance of infrastructure.

Common Imbalance Patterns

  • Strong Wood, Weak Water: Focus on model and algorithm iteration while neglecting data quality; model performance hits bottlenecks. Solution: strengthen data pipelines and quality control
  • Strong Metal, Weak Earth: Hardware is stacked up but platform governance capability is insufficient; resource utilization is poor and the system lacks vitality. Solution: improve platform governance and scheduling
  • Vigorous Fire, Broken Wood: Large compute investment that models cannot keep up with; resources are wasted. Solution: optimize model architecture and improve compute utilization efficiency

Table 2: Common Imbalance Patterns

Balance Principles

Successful large-scale systems require coordinated cooperation of all five elements

  • Let each of the five elements fulfill its duties in their respective roles
  • Maintain generation as primary, overcoming as secondary
  • Prevent any side from excessive expansion or shrinkage
  • Regularly check the balance state of Five Elements

Only by letting the five elements fulfill their respective roles and mutually promote each other, while preventing any side from excessive expansion or shrinkage, can the entire system maintain robustness and evolutionary capability.

The Yun Layer: Stages and Cycles of System Evolution

Author: Jimmy Song
February 10, 2026, 21:56

Yun (运) here refers to the developmental stages and temporal rhythms experienced by a system, which can be understood as the lifecycle cycles or “fortune” of infrastructure.

Large-scale infrastructure is not static but evolves cyclically through the Exploration Period, Platform Period, Scale Period, and Rebalancing Period, with each stage having its primary contradictions and tasks.

Below are the four evolutionary stages.

Exploration Period (Initial Stage)

Characteristics: High variance, low structure, rapid trial and error

At this stage, new technologies and requirements emerge constantly, system architecture is loose, and diverse experiments coexist.

Primary Tasks:

  • Explore effective paths
  • Rapidly validate model and functional directions
  • Collect data and preliminary stability signals

Five Elements Characteristics: Wood and Fire in Command

  • Model innovation (Wood) and computing experimentation (Fire) are core drivers
  • Expansion (Yang) outweighs constraints (Yin)

Architecture Strategy:

  • ✓ Tolerate some chaos
  • ✓ Encourage innovation and iteration
  • ✓ Focus on collecting data and preliminary stability signals
  • ✗ Don’t prematurely introduce heavy processes and restrictions

Platform Period (Growth Stage)

Characteristics: Standardization emerges, interfaces and processes converge

After exploration, the system enters a stage of integration and regulation, beginning to establish unified platforms, standard interfaces, and governance processes, consolidating scattered results into platform capabilities.

Primary Tasks:

  • Establish unified platforms
  • Define standard interfaces
  • Consolidate governance processes

Five Elements Characteristics: Fire Generates Earth

  • Successful practices in computing and functionality (Fire) give rise to platform support requirements (Earth)
  • Governance and standards gradually strengthen

Architecture Strategy:

  • ✓ Extract common requirements
  • ✓ Build support platforms (Yin increases)
  • ✓ Lay the foundation for next-stage scaling
  • ✗ Don’t remain in disordered exploration

Scale Period (Mature Stage)

Characteristics: Efficiency, throughput, and cost become the main battlefield

The system is deployed at scale, and focus shifts to optimizing efficiency and costs, improving throughput and reliability.

Primary Tasks:

  • Optimize efficiency
  • Improve throughput
  • Reduce costs
  • Ensure reliability

Five Elements Characteristics: Heavy Earth Breaks Wood

  • Platforms (Earth) and hard constraints begin to dominate
  • Overly idealistic model expansion (Wood) will encounter setbacks from realistic conditions

Architecture Strategy:

  • ✓ Strengthen monitoring and automated operations
  • ✓ Control overly strong “Yang” through governance means
  • ✓ Ensure robust system operation
  • ✗ Don’t continue with startup-era casual practices

Rebalancing/Substitution Period (Renewal Stage)

Characteristics: Old structures are corrected or replaced by new structures

When the previous stage’s patterns reach their limits, the system either enters self-correction by introducing new elements to rebalance, or gets disrupted and replaced by a new paradigm.

Primary Tasks:

  • Introduce new elements to rebalance
  • Or accept substitution by a new paradigm

Five Elements Characteristics: Metal and Water Rise Again

  • Suppressed hardware/rule innovations (Metal) and new data potentials (Water) rise again
  • Driving system transformation

Architecture Strategy:

  • ✓ Be forward-looking, dare to break through
  • ✓ Transition smoothly, avoid severe volatility
  • ✗ Don’t cling to the status quo

Evolutionary Cycle

The above stages form a cyclical pattern, where the endpoint of each stage is also the starting point of the next.

Figure 1: The “Yun” cycle of AI infrastructure evolution. Systems start from the exploration period, undergo platform period standardization, enter the scale period for efficiency optimization, and ultimately move toward a new cycle of rebalancing or substitution.

The Art of Following the Momentum

A mature infrastructure organization should be able to determine its current stage based on internal and external signals and adjust its strategy accordingly.

If stage transitions are ignored or excessively rushed, the system will experience disturbances or even crises

Error Examples

  • Pulling Up Seedlings to Help Them Grow: Managing systems still in the exploration period as if they were scaled systems, prematurely suppressing change; the consequence is stifled innovation
  • Going Against the Momentum: Remaining in disordered exploration when it is time to enter the platform period; the consequence is missing the window for structured growth and creating hidden risks
  • Clinging to the Status Quo: Being unwilling to change when a rebalancing period is needed; the consequence is system rigidity and aging

Table 1: Error Examples

Stage Assessment Checklist

Through the “Yun” layer perspective, teams can examine the current macro stage:

  • Are we validating new concepts or expanding our achievements?
  • What is the system’s primary contradiction?
  • When might the next stage arrive?
  • Does our strategy align with the current stage?

Example Questions:

  • Are we in the exploration period?
    • If yes → Focus on rapid trial and error and validation
    • If no → Consider whether to enter the platform period
  • Does our system need standardization?
    • If yes → Enter platform period, establish platforms and standards
    • If no → Continue exploration

Qi Layer: Effective System Flow and Pressure Fields

Author: Jimmy Song
February 10, 2026, 21:56

Qi (气) in Chinese culture refers to the energy and flow field that permeates all things. In AI infrastructure, we borrow the concept of “Qi” to describe the effective flow and pressure distribution within systems.

This includes the circulation of data, tasks, and signals throughout the system, as well as how various explicit or implicit system pressures accumulate, propagate, and release.

The Essence of Qi: Overall State of Affairs

Unlike traditional single-point metric monitoring, the concept of “Qi” reminds us to focus on the overall state of affairs:

Signals are not isolated events, but rather gather and flow like a field

For example:

  • A sudden spike in GPU utilization may not be abnormal
  • But if multiple metrics (job queue length, response latency, memory usage, etc.) show a simultaneous trend of increase and persistence → this indicates a change in the “Qi field”
  • This signals the system entering a high-pressure state

This signal field manifests as the gathering and stretching of Qi, indicating the accumulation of some form of system tension.
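The "simultaneous trend" idea above can be sketched as a quorum check over recent metric windows: one spiking metric is ignored, but several metrics rising together and persistently flag a shift in the "Qi field". A minimal Python sketch; the metric names, window size, and thresholds are illustrative assumptions, not values from the text:

```python
def rising(samples, min_ratio=0.8):
    """A metric is 'rising' if most consecutive steps increase."""
    ups = sum(1 for a, b in zip(samples, samples[1:]) if b > a)
    return ups >= min_ratio * (len(samples) - 1)

def qi_field_pressure(metrics, quorum=3):
    """Flag a 'Qi field' shift when several metrics rise together.

    metrics: dict of name -> recent samples (oldest first).
    A single spike is ignored; only a simultaneous, persistent upward
    trend across `quorum` or more metrics raises the flag.
    """
    rising_metrics = [name for name, s in metrics.items() if rising(s)]
    return len(rising_metrics) >= quorum, rising_metrics

# Illustrative recent windows for four correlated pressure signals.
window = {
    "gpu_util":       [0.62, 0.70, 0.78, 0.85, 0.91],
    "queue_length":   [12, 18, 25, 33, 47],
    "p99_latency_ms": [180, 210, 260, 320, 400],
    "mem_used_gb":    [41, 44, 48, 53, 57],
}
alert, culprits = qi_field_pressure(window)
```

Here all four windows trend upward together, so the quorum is met and the system is flagged as entering a high-pressure state.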

Two States of Qi

Qi Flow: System Active

When all elements coordinate well, data and instructions flow smoothly, producing value efficiently:

  • Processing rates across all stages are basically matched
  • No long-term backlogs or idle resources
  • Timely system responses
  • Balanced resource utilization

Qi Stagnation: System Pathological

If a bottleneck or imbalance occurs somewhere, Qi’s flow is obstructed, causing local pressure to surge:

  • Jobs queue for long periods
  • CPU/GPU long-term idle or 100% utilization
  • Serious message queue backlog
  • Frequent anomaly alerts

Ultimately, this may trigger failures or performance collapse at weak points.

Qi’s Flow Path

To intuitively understand Qi’s flow path, we can view the system as a closely connected network:

Figure 1: Diagram of system ‘Qi’ flow path. Data (Water) Qi enters Model (Wood), triggering Computing Power (Fire) operation, coordinated via Platform (Earth), executed on Hardware (Metal), producing results that feed back to the data layer, forming a closed loop.

Qi’s Cycle:

  • Data (Water) Qi enters Model (Wood)
  • Drives Computing Power (Fire) to operate
  • Coordinated via Platform (Earth)
  • Executes computation on Hardware (Metal)
  • Outputs results, producing new data or signals
  • Feeds back into the data pool (Water)
  • Cycle repeats

Two Forms of Qi

Healthy Flow

Qi circulates ceaselessly among the five elements, maintaining system functionality:

  • If every step flows smoothly → system operates smoothly
  • If any step is obstructed → Qi flow slows or even reverses, damaging system performance and stability

Pressure Propagation

Qi refers not only to healthy flow, but also to pressure propagation:

Example: Data Inflow Surge

  • Data inflow surges but model processing capacity cannot keep up
  • Unprocessed data continuously accumulates
  • Manifests as excessive pressure in the data layer (Water)
  • Leading to suppression of computing power performance (Fire weakens)

Example: Hardware Resource Exhaustion

  • Hardware (Metal) resources exhausted
  • Computing requests cannot be satisfied
  • Obstructed Qi transforms into queuing pressure
  • Feeds back to platform (Earth) scheduling layer and user experience

Application of Qi Layer in Operations

Through the lens of “Qi”, operations and architecture teams can more sensitively detect sub-optimal system states:

Not Just Whether There’s a Problem, But How It’s Trending

| Qi State | Manifestation | Warning Significance |
| --- | --- | --- |
| Stagnation Emerging | Latency jitter gradually worsening | System entering a sub-stable state; needs unblocking |
| Flow Obstruction | Request failure rate rising, retries increasing | A stage is blocked; needs investigation |
| Qi Scattering | Metrics fluctuating severely, irregular | System severely imbalanced; needs overall adjustment |
| Qi Deficiency | Resource utilization low long-term | Configuration unreasonable; needs optimization |
Table 1: Qi State and Warning Significance

Qi Disorder Precedes Major Incidents

  • Latency jitter gradually worsening → signals system entering sub-stable state
  • If no measures are taken to resolve (scaling resources, optimizing algorithms, or rate limiting) → may evolve to complete failure
  • Agent task interaction rhythm (Qi) slows or stops → may indicate poor communication between agents or deadlock

Strategies for Guiding Qi Flow

Maintaining smooth Qi flow requires building resilience:

Architecture Level

  • Peak shaving and valley filling mechanisms: Absorb traffic bursts
  • Message queue backpressure protection: Prevent pressure backflow
  • Elastic buffer design: Reserve margin to handle impacts

Strategy Level

  • Slack capacity: Maintain certain redundancy
  • Elastic scaling strategies: Dynamically adjust resources
  • Rate limiting and degradation mechanisms: Protect core functionality
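Rate limiting as a "constraining Yin" can be illustrated with a classic token bucket: bursts are absorbed up to a fixed budget, and sustained overload is shed before pressure backflows. A minimal sketch with illustrative rate and capacity values; a production system would use a hardened limiter rather than this toy:

```python
class TokenBucket:
    """Minimal token-bucket limiter: admits requests while tokens remain,
    refilling at a fixed rate, so bursts are smoothed and core capacity
    is protected."""

    def __init__(self, rate, capacity):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # burst ceiling
        self.tokens = capacity
        self.last = 0.0

    def allow(self, now):
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller should degrade or shed this request

limiter = TokenBucket(rate=5, capacity=10)
# A burst of 15 requests at t=0: only the 10-token burst budget passes.
admitted = sum(limiter.allow(0.0) for _ in range(15))
```

The rejected requests are exactly where degradation mechanisms take over, protecting core functionality instead of letting queuing pressure propagate upstream.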

Agent System Special Attention

  • Monitor task queues and communication latency
  • Ensure information flow (Qi) between agents is unobstructed
  • Introduce coordinator agents or reduce concurrency when necessary to smooth Qi flow

Qi Layer Monitoring Practices

Establish system-wide observability:

| Monitoring Dimension | Focus | Tool Examples |
| --- | --- | --- |
| Traffic Distribution | Request flow across stages | Distributed tracing |
| Queue Backlog | Queue length trends | Message queue monitoring |
| Resource Utilization | CPU/GPU/memory/storage | Prometheus + Grafana |
| Latency Distribution | P50/P95/P99 latency | APM tools |
| Anomaly Trends | Error rate, retry rate changes | Log aggregation analysis |
Table 2: Qi Layer Monitoring Dimensions

The Qi layer provides an effective measure of system flow, helping us pulse-check whether the system’s “blood and Qi” are abundant and circulating smoothly

Summary

Qi’s operation can be understood as whether the system’s “meridians” are unobstructed:

  • Qi flow means system active: Data and instructions flow smoothly, producing value efficiently
  • Qi stagnation means system pathological: Flow obstructed, local pressure surges, ultimately triggering failures

Just as in Traditional Chinese Medicine’s four examination methods, by observing “Qi’s” operation, we can predict the trajectory of system problems and apply targeted remedies.

System Diagnosis Principles: Criteria for Health Status

Author: Jimmy Song
February 10, 2026, 21:56

To maintain the long-term healthy evolution of AI infrastructure, post-mortem summaries are far from sufficient. We need a set of system diagnosis principles to detect hidden risks early and correct deviations.

Based on the Yin-Yang Five Elements Yun model, diagnosis can be conducted from the following five dimensions:

Five-Dimensional Diagnosis Framework

Figure 1: Five-Dimensional Diagnosis Framework Diagram

Five Elements Balance Check

Assess the current status of five aspects: Data (Water), Models (Wood), Compute (Fire), Platform (Earth), and Hardware (Metal).

Diagnosis Method

Checklist:

  • Can data pipelines keep up with demands? (Water)
  • Are model capabilities fully utilized? (Wood)
  • Are compute resources effectively used? (Fire)
  • Can the platform support current load? (Earth)
  • Is hardware becoming a bottleneck? (Metal)

Identify Problems

| Problem Type | Manifestation | Solution |
| --- | --- | --- |
| Short Board | One element significantly weaker than the others | Prioritize strengthening that element |
| Overload | One element consumes excessive resources or frequently becomes a bottleneck | Introduce limits or expand other elements to share the pressure |
Table 1: Problem Types and Solutions

Typical Symptoms

  • Water Level Too Low: Data pipelines always lag behind training needs → Replenish data processing capacity
  • Metal Overload: Hardware often runs at full capacity or even triggers limit alarms → Expand capacity or impose constraints on upper layers

Most failures do not stem from missing components, but from long-term role imbalance
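The short-board/overload distinction above can be sketched as a simple scoring pass over the five elements. The per-element 0-1 health and load scores, and both thresholds, are hypothetical values invented purely for illustration:

```python
def diagnose_elements(scores, gap=0.25, overload=0.9):
    """Sketch of a Five Elements balance check.

    scores: dict of element -> {'health': 0-1, 'load': 0-1} (hypothetical).
    Reports ('short_board', element) when an element trails the average
    health by more than `gap`, and ('overload', element) when its load
    exceeds `overload`.
    """
    findings = []
    avg = sum(s["health"] for s in scores.values()) / len(scores)
    for name, s in scores.items():
        if avg - s["health"] > gap:
            findings.append(("short_board", name))
        if s["load"] > overload:
            findings.append(("overload", name))
    return findings

cluster = {
    "water (data)":     {"health": 0.45, "load": 0.60},  # pipeline lagging
    "wood (model)":     {"health": 0.85, "load": 0.55},
    "fire (compute)":   {"health": 0.80, "load": 0.70},
    "earth (platform)": {"health": 0.82, "load": 0.65},
    "metal (hardware)": {"health": 0.78, "load": 0.95},  # near full capacity
}
issues = diagnose_elements(cluster)
```

On this example the check surfaces exactly the two typical symptoms named above: Water level too low (short board) and Metal overload.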

Qi Flow Smoothness Check

Analyze whether Qi flows smoothly through the system via full-link monitoring.

Diagnosis Method

Key Metrics:

  • Latency distribution of key processes
  • Queue backlogs
  • Resource utilization curves

Qi Smooth vs. Qi Not Smooth

| State | Characteristics |
| --- | --- |
| Qi Smooth | Processing rates across stages basically match, without long-term backlogs or idle resources |
| Qi Not Smooth | One stage remains a bottleneck for long periods, or large amounts of resources sit idle |
Table 2: Qi Flow: Smooth vs Obstructed

Diagnosis Points

Distinguish temporary fluctuations from persistent trends: brief peaks don’t necessarily indicate Qi blockage, but persistent deviations must be addressed

Tool Support:

  • Dashboards and automated alerts
  • Timely capture of “stagnant Qi” locations
  • Further investigation of causes (which Five Elements imbalance corresponds)
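The "brief peak vs. persistent trend" rule can be made concrete as a consecutive-run check: a one-off spike resets the counter, while a sustained deviation trips the alert. A minimal sketch; the 20% tolerance and 5-sample run length are illustrative assumptions:

```python
def sustained_deviation(samples, baseline, tol=0.2, min_run=5):
    """Distinguish a brief peak from a persistent trend: flag only when
    the metric stays more than `tol` above `baseline` for `min_run`
    consecutive samples."""
    run = 0
    for v in samples:
        run = run + 1 if v > baseline * (1 + tol) else 0
        if run >= min_run:
            return True
    return False

baseline_ms = 100
spike = [100, 102, 180, 101, 99, 103, 100, 98]    # one-off peak: ignore
drift = [100, 105, 125, 130, 135, 140, 150, 160]  # persistent deviation: flag
```

Wired into a dashboard alert, this is one way to capture a "stagnant Qi" location without paging on every transient fluctuation.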

Yin-Yang Dynamics Check

Assess whether current strategy and state are Yang Excess Yin Deficiency or Yin Excess Yang Deficiency.

Diagnosis Method

Qualitative Analysis:

  • Look at whether recent architecture decisions overly favor one extreme
  • Have you been continuously expanding and adding new features while ignoring stability?
  • Or conversely, multiple layers of approval and strict constraints but lack innovation momentum?

Quantitative Metrics:

| Metric | Yang Excess | Yin Excess |
| --- | --- | --- |
| Change Frequency | Extremely high | Extremely low |
| Incident Rate | Frequent | Extremely low, but no change |
| Release Rhythm | Continuous | Long-term stagnation |
Table 3: Yin-Yang Status

Balance Strategy

| State | Symptoms | Solution |
| --- | --- | --- |
| Yang Excess Yin Deficiency | Frequent changes with frequent incidents | Pause releases, focus on addressing hazards (replenish Yin) |
| Yin Excess Yang Deficiency | Long-term no change and stagnation | Introduce challenges and innovation (add Yang) |
Table 4: Balance Strategies
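The quantitative metrics above can be folded into a toy classifier over change and incident rates. The thresholds (20 changes/week, 3 incidents/week, and the stagnation cutoffs) are invented for illustration, not calibrated values:

```python
def yin_yang_state(changes_per_week, incidents_per_week):
    """Classify Yin-Yang dynamics from change and incident rates
    (hypothetical thresholds, for illustration only)."""
    if changes_per_week > 20 and incidents_per_week > 3:
        # Frequent change, frequent incidents -> pause, replenish Yin.
        return "yang_excess"
    if changes_per_week < 1 and incidents_per_week < 1:
        # Long-term stagnation -> add Yang, inject innovation.
        return "yin_excess"
    return "balanced"

state = yin_yang_state(changes_per_week=30, incidents_per_week=5)
```

A team shipping thirty changes a week with five incidents lands in "yang_excess", matching the "pause releases, replenish Yin" prescription in the table.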

Yun Alignment Check

Determine whether the organization’s actions match the system’s current stage, preventing counter-Yun operation.

Diagnosis Method

Combine Business Development and Technical Maturity:

| Error Pattern | Manifestation | Consequences |
| --- | --- | --- |
| Premature Standardization | Spending significant effort on process management and cost optimization for emerging projects | These are typically scale-stage concerns, but the project is still in the exploration stage |
| Counter-Yun Exploration | Frequently changing the underlying architecture of widely used platforms without rigorous testing | Inconsistent with the scaling stage |
Table 5: Error Patterns

Stage-Strategy Reference Table

| Stage | Should Focus On | Should Not Do |
| --- | --- | --- |
| Exploration Stage | Diversity, flexibility, rapid trial and error | Premature pursuit of efficiency |
| Platform Stage | Standardization, process norms | Frequent arbitrary changes |
| Scale Stage | Optimization, stability, efficiency | Still growing wildly |
| Rebalancing Stage | Transformation, breakthrough, innovation | Clinging to the past |
Table 6: Stage-Strategy Mapping

Checklist:

  • Which stage are we currently in?
  • Do our actions match the stage?
  • Do we need to adjust strategy?

When discovering actions don’t match the stage, immediately adjust strategy to avoid working at cross-purposes

Yang Runaway Warning

Pay special attention to whether there are signs of Yang state runaway in the system.

What is Yang Runaway?

Exponential explosion or collapse risk caused by unconstrained positive feedback.

Typical Scenarios

| Scenario | Mechanism | Risk |
| --- | --- | --- |
| Service Call Volume Surge | Bug or abuse → Resource strain → Queuing and retry storms → Further increase in calls | Resource exhaustion |
| Training Task Self-Replication | Tasks self-replicate without limit to accelerate → Cluster resource exhaustion | System collapse |
Table 7: Typical Scenarios

Diagnosis Signals

  • A metric shows exponential explosive growth
  • Lack of slowing mechanisms
  • Formation of vicious cycles
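The "exponential explosive growth without slowing mechanisms" signal can be sketched as a check for consistently compounding ratios between samples: jitter resets the counter, but a metric multiplying step after step trips the runaway alarm. The 1.3x growth factor and run length are illustrative assumptions:

```python
def yang_runaway(samples, growth=1.3, min_run=4):
    """Flag exponential ('Yang runaway') growth: successive ratios stay
    above `growth` for `min_run` consecutive steps, i.e. the metric is
    compounding rather than merely fluctuating."""
    run = 0
    for prev, cur in zip(samples, samples[1:]):
        run = run + 1 if prev > 0 and cur / prev >= growth else 0
        if run >= min_run:
            return True
    return False

retry_storm = [100, 150, 240, 400, 700, 1250]  # each step roughly 1.5-1.8x
noisy_load  = [100, 130, 90, 140, 120, 160]    # bounded fluctuation
```

When the detector fires, the response strategies below apply: hit the hard limit first, then introduce negative feedback to break the positive-feedback chain.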

Response Strategy

| Strategy | Means | Effect |
| --- | --- | --- |
| Establish Hard Limits | Metal’s constraints | Immediate shutdown |
| Introduce Negative Feedback | Earth’s governance (rate limiting, quotas) | Braking and deceleration |
| Break the Positive Feedback Chain | Activate emergency plan | Pull back to steady state |
Table 8: Response Strategies

When discovering a metric showing exponential explosive growth without slowing mechanisms, intervene immediately

Diagnosis Implementation Process

Regular Diagnosis Mechanism

Recommend establishing a periodic diagnosis process:

Figure 2: Regular Diagnosis Mechanism Flowchart

Diagnosis Meeting Agenda

Fixed Session of Weekly Operations Review Meeting:

  • Check Five Elements scores for each module
  • Browse global Qi flow diagram
  • Analyze Yin-Yang dynamics
  • Discuss current Yun

This systematic examination leaves hidden risks nowhere to hide, achieving prevention before problems occur

Diagnosis Action Matrix

| Diagnosis Result | Action Recommendation |
| --- | --- |
| Five Elements: one element too weak | Concentrate resources to strengthen the weakness |
| Five Elements: one element overloaded | Expand capacity or introduce constraints |
| Qi stagnation at one stage | Clear bottlenecks, optimize processes |
| Yang Excess Yin Deficiency | Strengthen governance and stability mechanisms |
| Yin Excess Yang Deficiency | Activate innovation and boost vitality |
| Counter-Yun operation | Adjust strategy and go with the flow |
| Yang runaway warning | Immediate intervention, break positive feedback |
Table 9: Diagnosis Action Matrix

Summary

Through the above diagnosis principles, architects and operations teams can periodically take the pulse of infrastructure like TCM pulse diagnosis.

When diagnosis indicates imbalance in some aspect, immediately prescribe remedy based on the theory: replenish what needs replenishing, purge what needs purging.

Long-term adherence will keep the system on a healthy evolutionary trajectory.

Conclusion and Outlook

Author: Jimmy Song
February 10, 2026, 21:56

This paper systematically presents the four-layer model of “Yin-Yang - Five Elements - Yun - Qi” for AI infrastructure, providing a comprehensive cognitive map from theory to practice.

Review of Theoretical Model

Through four dimensions, we have constructed a global framework for understanding AI infrastructure:

| Layer | Core Value | Key Insights |
| --- | --- | --- |
| Yin-Yang | Understanding the tension and balance within systems | Expansion and constraint, innovation and governance, speed and stability: each pair is opposed yet unified, and all are indispensable |
| Five Elements | Organizing the fundamental role elements of systems | Data, models, computing power, platforms, hardware: these five generate and restrain each other in endless cycles |
| Yun | Grasping the periodic patterns of system evolution | Exploration phase, platform phase, scale phase, rebalancing phase: act in accordance with the trends |
| Qi | Insight into the flow state of system operation | When Qi flows, the system is active; when Qi stagnates, the system becomes pathological |
Table 1: Layer, Core Value, and Key Insights

More importantly, we have demonstrated how this theory combining Eastern wisdom with engineering practice can provide insights and guidance for real-world problems such as GPU scheduling, Agent Runtime, and platform governance.

Core Value of the Model

Holistic View

Traditional fragmented perspectives often see trees but not the forest, making it difficult to provide timely warnings of systemic risks

The Yin-Yang Five Elements Qi-Yun model, with its holistic view, helps architects:

  • Break free from the constraints of pure technical metrics
  • Grasp the principal contradictions and driving forces of system evolution
  • Extract meaningful patterns from complex signals

Dynamic View

The value of a system lies not in pursuing the extreme of a single performance indicator without limit, but in balancing all elements to achieve long-term coordinated development

The model’s dynamic view reminds us:

  • Yin-Yang dynamics transform dynamically with environment and stage
  • The same capability may shift from advantage to risk at different stages
  • Strategies need timely adjustment as Yun changes

Balance View

The core philosophy of the model is balance rather than extreme:

  • Not pursuing the limit of a single metric
  • But pursuing system coordination and sustainability
  • Finding dynamic balance points within unity of opposites

Practical Application Value

During Architecture Design

  • Consider the completeness and balance of the Five Elements
  • Reserve Yin-Yang constraint mechanisms
  • Design evolution paths that align with Yun trends
  • Plan channels for Qi flow

During Operations and Governance

  • Regularly check Five Elements balance
  • Monitor Qi circulation status
  • Assess Yin-Yang dynamic changes
  • Determine Yun phase transitions
  • Provide early warning of Yang loss-of-control risks

During Decision Review

  • Analyze root causes from the four-layer model perspective
  • Check whether basic principles of any layer were violated
  • Develop systematic solutions
  • Establish long-term improvement mechanisms

Insights for Architects

In an era of flourishing large models and autonomous agents, infrastructure has become unprecedentedly complex and active.

Cognitive Upgrade

From “managing machines and applications” to “managing intelligence and knowledge”:

  • Not only focus on application logic itself
  • But more on how knowledge and intelligence integrate into systems
  • View models as dynamically evolving components

Mindset Shift

From single-metric optimization to system balance:

  • Not pursuing the extreme of a single element
  • But pursuing overall coordination and sustainability
  • Finding dynamic balance within unity of opposites

Capability Development

From technical expert to systems philosopher:

  • While mastering technical tools
  • Cultivate systems thinking and philosophical reflection
  • Apply holistic frameworks like Yin-Yang and Five Elements

Limitations of the Model

It must be noted that this theory is not a panacea:

Not a Rigid Formula

Its value lies not in providing a rigid formula, but in guiding us to return to reality and think about problems from a more comprehensive perspective

  • The model provides a thinking framework, not standard answers
  • Specific applications need to consider actual scenarios
  • Architects ultimately must make judgments based on specific context

Requires Continuous Validation

  • Theory needs continuous validation and refinement in practice
  • Different scenarios may require adjustment and extension
  • Feedback and improvement in practice are encouraged

Supplement, Not Replace

  • The model is a tool to assist decision-making
  • Cannot replace professional judgment and experience
  • Should be used in combination with other methodologies

Future Outlook

Theory Development

This model has significant room for development:

  • Quantitative Metrics: Develop more precise quantitative indicators to make the theory more actionable
  • Tool Support: Develop analysis tools and automated diagnostic systems based on the model
  • Case Accumulation: Collect more practical cases to validate and enrich the theory
  • Cross-Domain Application: Explore applications of the model in other complex system domains

Practice Promotion

We hope this framework can help:

  • CTOs, infrastructure architects, and platform R&D teams
  • When facing increasingly complex AI infrastructure
  • Make wiser decisions

Ultimate Vision

Standing with sword in the midst of waves of change, embracing both the Yang of innovation and the Yin of governance, riding the system’s Qi above the currents

Conclusion

AI infrastructure stands at the starting point of a new era. We need not only technological innovation but also conceptual innovation.

The Yin-Yang Five Elements Qi-Yun model offers a unique perspective—combining Eastern philosophical wisdom with modern engineering practice—helping us find simplicity in complexity, stability in change, and unity in opposition.

We hope this model becomes a powerful tool for your thinking about AI infrastructure, helping you find your own “Way” in the balance and evolution of systems.

Dynamic Relationship Modeling: Five Elements Flow Under Yin-Yang Balance

Author: Jimmy Song
February 10, 2026, 21:55

Yin-Yang × Five Elements: Intrinsic Tension of Elements

Each Five Elements component contains both Yin and Yang aspects, manifesting with different polarities in different contexts:

Figure 1: Yin-Yang states of Five Elements. Each element includes Yin (potential, static, introverted) and Yang (explicit, dynamic, extroverted) aspects, with transformation possible between them depending on context.

Yin-Yang Attributes of the Five Elements:

| Five Elements | Yin State | Yang State |
| --- | --- | --- |
| Water (Data) | Potential data reserves, implicit patterns (static storage of historical data) | Instant data flow, real-time feedback |
| Wood (Model) | Dormant capabilities (unactivated parameters, backup algorithms) | Explicit expansion (model architecture updates, parameter surge) |
| Fire (Compute) | Stored energy (idle compute, waiting for scheduling) | High-load operation |
| Earth (Platform) | Static support (stable operation, non-intervention) | Proactive scheduling and expanded governance |
| Metal (Hardware) | Implicit constraints (unused capacity) | Explicit limits (resource hard caps maxed out) |
Table 1: Dynamic Model Overview

Signs of Yin-Yang Imbalance:

  • Fire Excessively Yin: GPU compute idle for long periods while tasks backlog → Poor scheduling
  • Fire Excessively Yang: GPUs at 24-hour full load with no elasticity → Hidden crash risk
  • Earth Excessively Yang: Too many platform rules → Stifling innovation
  • Earth Excessively Yin: Lack of platform control → Leading to chaos

Five Elements × Qi: Dynamic Network of Flow

The Five Elements framework provides tools to decompose systems, but system components are not static puzzles—rather, they connect into a dynamic network through the flow of Qi.

  • Generating Relationships: Qi flows smoothly, forming positive feedback loops
  • Controlling Relationships: Qi stagnates at certain links or reverse effects strengthen

Dynamic Relationship Principles:

Generating primarily, Controlling secondarily—main energy flows transmit successfully through each link, while balancing forces intervene moderately only to prevent extreme situations.

Yun × Yin-Yang Five Elements: Boundary Conditions for Stage Evolution

The stage-based nature of Yun provides a perspective of boundary conditions evolving over time for the aforementioned Yin-Yang Five Elements dynamics.

Each stage strengthens or weakens certain elements and tensions:

| Stage | Main Characteristics | Five Elements Characteristics | Yin-Yang Characteristics |
| --- | --- | --- | --- |
| Exploration Stage | High variance, low structure, rapid trial and error | Wood and Fire dominant | Expansion (Yang) outweighs constraints (Yin) |
| Platform Stage | Standardization emerges, interfaces and processes converge | Fire generates Earth | Governance (Yin) gradually strengthens |
| Scale Stage | Efficiency, throughput, cost become the main battlegrounds | Earth dominates Wood | Stability (Yin) takes precedence |
| Rebalancing Stage | Old structures corrected or replaced by new structures | Metal and Water resurge | Transformation (Yang) rises again |
Table 2: Typical Interaction Scenarios

Dynamic Stage Transitions:

The Yun layer tells us when to shift focus:

  • As stages change, the system needs to “allocate interests”
  • Previously dominant elements may become excessive and need convergence
  • Previously minor elements need strengthening to address shortcomings

Examples:

  • In Platform Stage/Scale Stage → Must strengthen governance (Earth’s Yang) and hardware optimization (Metal’s Yang)
  • To curb the unchecked growth tendencies left over from early stages (excessive Wood-Fire Qi)
  • In Rebalancing Stage → May need to reactivate suppressed innovation potential (Water-Wood Qi)

Comprehensive Analysis Case: GPU Scheduling Scenario

Let’s see how to apply the four-layer model to analyze a real GPU scheduling problem.

Problem Scenario: Cluster experiences task queues under high load

| Layer | Diagnosis | Findings |
| --- | --- | --- |
| Qi Layer | Observe Qi flow state | Compute Fire Qi is obstructed |
| Five Elements Layer | Locate elements | Data input too intense (Water Yang excessive) while scheduling (Platform Earth) strategy cannot keep up |
| Yin-Yang Layer | Analyze tensions | Scheduling strategy blindly pursues maximizing utilization (excessively Yang) while lacking elastic buffers (Yin) |
| Yun Layer | Assess stage | An emerging business that just passed the exploration stage and has not perfected scheduling: Platform Stage |
Table 3: Four-Layer Diagnostic Analysis

Solutions

Based on four-layer collaborative diagnosis, develop comprehensive solutions:

  • Qi Layer: Unblock Qi flow

    • Expand resources or optimize algorithms
  • Five Elements Layer: Balance elements

    • Strengthen platform scheduling capabilities (Earth)
  • Yin-Yang Layer: Restore balance

    • Introduce elastic buffer mechanisms (supplement Yin)
    • Avoid blindly pursuing high utilization
  • Yun Layer: Follow the trend

    • Accelerate introduction of standardized scheduling and resource governance (Earth’s Yun is approaching)
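The "supplement Yin" remedy above, reserving an elastic buffer instead of chasing maximal utilization, can be sketched as an admission-control rule that keeps headroom and sheds load before the queue saturates. All thresholds here are illustrative assumptions, not tuned values:

```python
def admit_job(gpus_busy, total_gpus, queue_len, headroom=0.1, max_queue=50):
    """Sketch of elastic-buffer admission control for a GPU cluster:
    keep `headroom` spare capacity rather than targeting 100% utilization,
    and shed load before the queue saturates."""
    util = gpus_busy / total_gpus
    if queue_len >= max_queue:
        return "reject"  # break the pressure feedback loop early
    if util >= 1 - headroom:
        return "queue"   # hold in the elastic buffer, preserve headroom
    return "run"
```

For example, `admit_job(80, 100, 5)` runs immediately, `admit_job(95, 100, 5)` is buffered to preserve headroom, and `admit_job(95, 100, 50)` is rejected so queuing pressure cannot feed back into the scheduler.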

Value of Dynamic Modeling

Through the multi-level dynamic modeling above, we can:

  • Explain complex scenarios more comprehensively: No longer limited to single perspectives
  • Locate root causes of problems: Find fundamental causes rather than surface phenomena
  • Point improvement directions: Obtain systematic solutions
  • Predict system evolution: Prepare in advance for stage transitions

Practical Recommendations

In daily architecture design and operations, you can establish these thinking habits:

  • When encountering problems: Analyze layer by layer from a four-layer perspective
  • When making decisions: Consider impacts on all four layers
  • When conducting post-mortems: Check whether warning signals from the four-layer model were ignored

The value of a system lies not in pursuing the extreme of a single performance indicator without limit, but in balancing all elements to achieve long-term coordinated development

Engineering Practice Guide: Architecture Decisions Guided by Theory

Author: Jimmy Song
February 10, 2026, 21:55

The theoretical models above do not remain at the conceptual level; they directly guide the engineering practice of AI infrastructure. In specific scenarios such as GPU scheduling, Agent runtime, and platform governance, we can follow the principles below to apply the Yin-Yang Five Elements Qi Movement model.

Balance Yin and Yang, Avoid Extremes

Consider both propelling forces and restraining forces when making architecture decisions.

GPU Cluster Scaling:

  • ✓ Satisfy business growth (expanding Yang)
  • ✓ Set quota and priority policies (constraining Yin)
  • ✓ Prevent resource abuse

Agent Runtime Design:

  • ✓ Give agents more autonomy (innovation, Yang)
  • ✓ Introduce monitoring and sandboxing mechanisms (governance, Yin)
  • ✓ Prevent loss of control

Practice Checklist:

After every major adjustment, ask yourself: Have I introduced corresponding counter-forces to stabilize the system?

Complete the Five Elements, Identify and Fill Weaknesses

Regularly review whether the five types of elements in the system are balanced.

GPU Infrastructure Check:

  • Do data pipelines keep up with computing power improvements? (Water and Fire matching)
  • Does model optimization fully utilize hardware? (Wood and Metal matching)
  • Can the scheduling platform handle peak loads? (Earth supporting Fire)
  • Have hardware resources become a bottleneck? (Metal not holding back)

Agent Platform Check:

  • Is there high-quality knowledge base or real-time data support? (Water)
  • Is there strong model capability? (Wood)
  • Are there sufficient computing resources? (Fire)
  • Is there a good orchestration framework? (Earth)
  • Is there a reliable environment and interfaces? (Metal)

Practice Strategy:

Once a bottleneck or overload is discovered in a certain link, decisively invest resources to fill the weakness or reduce the burden on the overloaded part

| Problem Discovered | Solution |
| --- | --- |
| Insufficient data quality (“Water” weak) | Prioritize data governance |
| Long-term low hardware utilization (Metal strong, Fire weak) | Optimize algorithms or scheduling to better utilize hardware |
Table 1: Problem Discovery and Solutions

Follow the Trend, Align with the Movement

Develop reasonable strategies based on the stage of the system.

Strategies for Different Stages:

| Stage | Should Do | Should Not Do |
| --- | --- | --- |
| Exploration Phase | Rapid trial and error, validate value | Prematurely introduce heavy processes and constraints |
| Platform Phase | Standardized management, MLOps tools | Remain in disordered exploration |
| Scale Phase | Strengthen governance and efficiency optimization | Still use the casual practices of the startup period |
| Rebalancing Phase | Architecture innovation, introduce new technologies | Refuse to move forward |
Table 2: Strategies for Different Stages

Regular Assessment: At each quarter or important milestone, assess:

  • Which stage are we currently in?
  • What is the main contradiction in this stage?
  • When might the next stage arrive?
  • Prepare in advance for the transition

Practice Cases:

  • An AI training cluster after validating the concept → Should consider entering standardized management (transitioning from exploration phase to platform phase)
  • When system scale expansion encounters bottlenecks → Consider whether to enter the rebalancing phase and break through through architecture innovation

Observe Qi Field, Optimize Flow

Establish global observability of the system, focusing on trends and correlations rather than single-point metrics.

Monitoring Methods:

  • Distributed tracing
  • Metric correlation analysis
  • Full-link monitoring

Signals of Qi Disorder:

| Signal | Possible Cause |
| --- | --- |
| Frequent occurrence of various abnormal logs | Global investigation needed |
| A metric’s periodic fluctuations becoming increasingly intense | The system may be approaching an internal limit |
Table 3: Signals of Qi Disorder

Strategies to Keep Qi Flowing Smoothly:

Architecture Level:

  • Peak clipping and valley filling mechanisms
  • Message queue backpressure protection

Strategy Level:

  • Slack capacity
  • Elastic scaling strategies

Agent System Special Attention:

  • Monitor task queues and communication latency
  • Ensure smooth information flow (Qi) between agents
  • Introduce coordinator agents or reduce concurrency when necessary
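Reducing concurrency to keep inter-agent Qi flowing can be sketched with a semaphore-bounded gate, a lightweight stand-in for a full coordinator agent. The cap of 4 concurrent tasks is an arbitrary illustrative choice:

```python
import threading

class AgentGate:
    """Bounded concurrency for agent tasks: a semaphore caps how many
    agents run at once, smoothing task flow and preventing the
    deadlock-prone unbounded fan-out of agent-to-agent calls."""

    def __init__(self, max_concurrent):
        self.sem = threading.BoundedSemaphore(max_concurrent)

    def run(self, task, *args):
        with self.sem:  # blocks when the concurrency cap is reached
            return task(*args)

gate = AgentGate(max_concurrent=4)
results = []
threads = [
    threading.Thread(
        target=lambda i=i: results.append(gate.run(lambda x: x * 2, i)))
    for i in range(10)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

All ten tasks complete, but never more than four at once; widening or narrowing the gate is the knob for smoothing Qi flow when communication latency climbs.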

Dynamic Adjustment, Continuous Rebalancing

Integrate the Yin-Yang Five Elements Qi Movement model into the team’s continuous improvement process.

Core Questions in Architecture Reviews or Incident Retrospectives:

  • Is the current main contradiction more inclined toward expansion or constraint, speed or stability?
  • Is any Five Elements element overloaded (Yang excess) or missing (Yin deficiency)?
  • Is System Qi congested somewhere?
  • Do our strategies align with the current stage?

Continuous Improvement Process:

Problem Discovery → Four-Layer Model Diagnosis → Strategy Formulation → Implementation Adjustment → Effect Evaluation → Continuous Optimization

Practice Case: Large-Scale GPU Training Cluster Optimization

Background: A team encountered stability issues while operating a large-scale GPU training cluster.

Four-Layer Model Diagnosis:

| Layer | Diagnosis | Findings |
| --- | --- | --- |
| Yin-Yang Layer | Speed vs. Stability | Fault-tolerance and testing time were continuously squeezed in pursuit of efficiency (speed, Yang), leading to frequent production failures (stability, Yin, damaged) |
| Five Elements Layer | Five Elements Check | Data pipeline latency gradually increasing (Water weaker than Fire) |
| Movement Layer | Stage Judgment | The system has moved from its wild-growth period into maturity |
| Qi Layer | Qi Flow State | Pronounced Qi stagnation |

Table 4: Four-Layer Model Diagnosis

Comprehensive Solution:

  • Yin-Yang Balance:

    • Suspend performance optimization
    • Invest time to strengthen fault tolerance mechanisms and testing (supplement stability Yin)
  • Five Elements Completion:

    • Add data preprocessing nodes and caching (strengthen Water)
  • Movement Adjustment:

    • Change mindset, shift focus from feature expansion to optimization and governance
  • Qi Flow Regulation:

    • Build full-link tracing system
    • Monitor the time of each link from training job submission to completion
    • Identify Qi stagnation points and clear them
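The "identify stagnation points" step can be approximated with per-stage timing. The sketch below uses sleeps as stand-ins for real pipeline work and flags the slowest link from job submission to completion:

```python
import time
from contextlib import contextmanager

timings = {}

@contextmanager
def stage(name):
    """Record the wall-clock time of one pipeline stage."""
    start = time.monotonic()
    try:
        yield
    finally:
        timings[name] = time.monotonic() - start

# Hypothetical training-job pipeline; sleeps stand in for real work.
with stage("queue_wait"):
    time.sleep(0.03)
with stage("data_loading"):
    time.sleep(0.08)   # the deliberately slow link
with stage("training_step"):
    time.sleep(0.01)

bottleneck = max(timings, key=timings.get)
print(f"stagnation point: {bottleneck} ({timings[bottleneck] * 1000:.0f} ms)")
```

A real system would emit these spans to a tracing backend instead of a dict, but the diagnostic logic is the same: compare stage durations, then clear the slowest link first.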

Result: While maintaining high utilization, the cluster’s stability was greatly improved, and no serious downtime occurred again.

Scenario Application Quick Reference Table

| Scenario | Yin-Yang Focus | Five Elements Check | Movement Judgment | Qi Flow Monitoring |
| --- | --- | --- | --- | --- |
| GPU Scheduling | Utilization vs. Elasticity | Fire - Earth - Metal balance | Scale phase: efficiency optimization | Task queues, resource utilization curves |
| Agent Runtime | Autonomy vs. Governance | Water - Wood - Fire coordination | Exploration phase: rapid iteration | Communication latency, task interaction rhythm |
| Platform Governance | Innovation risk control vs. Process efficiency | Earth - Metal constraints | Platform phase: standardization | Rule execution rate, change frequency |
| Cost Optimization | Performance vs. Cost | Fire - Metal matching | Scale phase: refinement | Resource waste, idle time |

Table 5: Scenario Application Quick Reference

Summary

Through the Yin-Yang Five Elements Qi Movement model, we can in practice:

  • Avoid Extremes: Not blindly pursuing single metrics
  • Systematic Thinking: Analyzing problems from multiple dimensions
  • Follow the Trend: Adjust strategies based on stages
  • Predict Problems: Early warning of risks through Qi field changes
  • Continuous Improvement: Establish systematic optimization processes

The value of this system lies in combining Eastern wisdom with engineering practice to provide a distinctive and effective thinking framework for complex AI infrastructure.

AI Learning Resources: 44 Curated Collections from Our Cleanup

作者 Jimmy Song
2026年2月8日 20:20

“The best way to learn AI is to start building. These resources will guide your journey.”

Figure 1: AI Learning Resources Collection

In my ongoing effort to keep the AI Resources list focused on production-ready tools and frameworks, I’ve removed 44 collection-type projects—courses, tutorials, awesome lists, and cookbooks.

These resources aren’t gone—they’ve been moved here. This post is a curated collection of those educational materials, organized by type and topic. Whether you’re a complete beginner or an experienced practitioner, you’ll find something valuable here.

Why Remove Collections from AI Resources?

My AI Resources list now focuses on concrete tools and frameworks—projects you can directly use in production. Collections, while valuable, serve a different purpose: education and discovery.

By separating them, I:

  • Keep the resources list actionable and focused
  • Create a dedicated space for learning materials
  • Make it easier to find what you need

📚 Awesome Lists (14 Collections)

Awesome lists are community-curated collections of the best resources. They’re perfect for discovering new tools and staying updated.

Must-Explore Awesome Lists

Awesome Generative AI

  • Models, tools, tutorials, and research papers
  • Great for: Comprehensive overview of generative AI landscape

Awesome LLM

  • LLM resources: papers, tools, datasets, applications
  • Great for: Deep dive into large language models

Awesome AI Apps

  • Practical LLM applications, RAG examples, agent implementations
  • Great for: Real-world implementation examples

Awesome Claude Code

  • Claude Code commands, files, and workflows
  • Great for: Maximizing Claude Code productivity

Awesome MCP Servers

  • MCP servers for modular AI backend systems
  • Great for: Building with Model Context Protocol

Specialized Awesome Lists


🎓 Courses & Tutorials (9 Curricula)

Structured learning paths from universities and tech companies.

Microsoft’s AI Curriculum

AI for Beginners

  • 12 weeks, 24 lessons covering neural networks, deep learning, CV, NLP
  • Great for: Complete AI foundation
  • Format: Lessons, quizzes, projects

Machine Learning for Beginners

  • 12-week, 26-lesson curriculum on classic ML
  • Great for: ML fundamentals without deep math
  • Format: Project-based exercises

Generative AI for Beginners

  • 18 lessons on building GenAI applications
  • Great for: Practical GenAI development
  • Format: Hands-on projects

AI Agents for Beginners

  • 11 lessons on agent systems
  • Great for: Understanding autonomous agents
  • Format: Project-driven learning

EdgeAI for Beginners

  • Optimization, deployment, and real-world Edge AI
  • Great for: On-device AI applications
  • Format: Practical tutorials

MCP for Beginners

  • Model Context Protocol curriculum
  • Great for: Building with MCP
  • Format: Cross-language examples and labs

Official Platform Courses

Hugging Face Learn Center

  • Free courses on LLMs, deep RL, CV, audio
  • Great for: Hands-on Hugging Face ecosystem
  • Format: Interactive notebooks

OpenAI Cookbook

  • Runnable examples using OpenAI API
  • Great for: OpenAI API best practices
  • Format: Code examples and guides

PyTorch Tutorials

  • Basics to advanced deep learning
  • Great for: PyTorch mastery
  • Format: Comprehensive tutorials

🍳 Cookbooks & Example Collections (5 Collections)

Practical code examples and recipes.

Claude Cookbooks

  • Notebooks and examples for building with Claude
  • Great for: Anthropic Claude integration
  • Format: Jupyter notebooks

Hugging Face Cookbook

  • Practical AI cookbook with Jupyter notebooks
  • Great for: Open models and tools
  • Format: Hands-on examples

Tinker Cookbook

  • Training and fine-tuning examples
  • Great for: Fine-tuning workflows
  • Format: Platform-specific recipes

E2B Cookbook

  • Examples for building LLM apps
  • Great for: LLM application development
  • Format: Recipes and tutorials

arXiv Paper Curator

  • 6-week course on RAG systems
  • Great for: Production-ready RAG
  • Format: Project-based learning

📖 Guides & Handbooks (5 Resources)

In-depth guides on specific topics.

Prompt Engineering Guide

  • Comprehensive prompt engineering resources
  • Great for: Mastering prompt design
  • Format: Guides, papers, lectures, notebooks

Evaluation Guidebook

  • LLM evaluation best practices from Hugging Face
  • Great for: Assessing LLM performance
  • Format: Practical guide

Context Engineering

  • Design and optimize context beyond prompt engineering
  • Great for: Advanced context management
  • Format: Practical handbook

Context Engineering Intro

  • Template and guide for context engineering
  • Great for: Providing project context to AI assistants
  • Format: Template + guide

Vibe-Coding Workflow

  • 5-step prompt template for building MVPs with LLMs
  • Great for: Rapid prototyping with AI
  • Format: Workflow template

🗂️ Template & Workflow Collections

Reusable templates and workflows.

Claude Code Templates

  • Code templates for various programming scenarios
  • Great for: Claude AI development
  • Format: Template collection

n8n Workflows

  • 2,000+ professionally organized n8n workflows
  • Great for: Workflow automation
  • Format: Searchable catalog

N8N Workflows Catalog

  • Community-driven reusable workflow templates
  • Great for: Workflow import and versioning
  • Format: Template catalog

📊 Research & Evaluation

Academic and evaluation resources.

LLMSys PaperList

  • Curated list of LLM systems papers
  • Great for: Research on training, inference, serving
  • Format: Paper collection

Free LLM API Resources

  • LLM providers with free/trial API access
  • Great for: Experimentation without cost
  • Format: Provider list

🎨 Other Notable Resources

System Prompts and Models of AI Tools

  • Community-curated collection of system prompts and AI tool examples
  • Great for: Prompt and agent engineering
  • Format: Resource collection

ML Course CS-433

  • EPFL Machine Learning Course
  • Great for: Academic ML foundation
  • Format: Lectures, labs, projects

Machine Learning Engineering

  • ML engineering open-book: compute, storage, networking
  • Great for: Production ML systems
  • Format: Comprehensive guide

Realtime Phone Agents Course

  • Build low-latency voice agents
  • Great for: Voice AI applications
  • Format: Hands-on course

LLMs from Scratch

  • Build a working LLM from first principles
  • Great for: Understanding LLM internals
  • Format: Repository + book materials

💡 How to Use This Collection

For Complete Beginners

  1. Start with: Microsoft’s AI for Beginners
  2. Practice with: PyTorch Tutorials
  3. Explore: Awesome AI Apps for inspiration

For Developers

  1. Build skills: OpenAI Cookbook + Claude Cookbooks
  2. Find tools: Awesome Generative AI + Awesome LLM
  3. Learn workflows: n8n Workflows Catalog

For Researchers

  1. Stay updated: Awesome Generative AI + LLMSys PaperList
  2. Deep dive: Awesome LLM
  3. Implement: Hugging Face Cookbook

For Product Builders

  1. Find examples: Awesome AI Apps
  2. Learn workflows: n8n Workflows Catalog
  3. Study patterns: Awesome LLM Apps

🔄 What Was NOT Removed

Agent frameworks and production tools remain in the AI Resources list, including:

  • AutoGen - Microsoft’s multi-agent framework
  • CrewAI - High-performance multi-agent orchestration
  • LangGraph - Stateful multi-agent applications
  • Flowise - Visual agent platform
  • Langflow - Visual workflow builder
  • And 80+ more agent frameworks

These are functional tools you can use to build applications, not educational collections. They belong in the AI Resources list.


📝 Summary

I removed 44 collection-type projects from the AI Resources list to keep it focused on production tools:

  • 14 Awesome Lists - Discover new tools and stay updated
  • 9 Courses & Tutorials - Structured learning paths
  • 5 Cookbooks - Practical code examples
  • 5 Guides & Handbooks - In-depth resources
  • 4 Template Collections - Reusable workflows
  • 7 Other Resources - Research and evaluation

These resources remain incredibly valuable for learning and discovery. They just serve a different purpose than the production-focused tools in my AI Resources list.


Next Steps:

  1. Bookmark this post for future reference
  2. Explore the AI Resources list for production tools (agent frameworks, databases, etc.)
  3. Check out my blog for more AI engineering insights

Acknowledgments: This collection was compiled during my AI Resources cleanup initiative. Special thanks to all the maintainers of these awesome lists, courses, and collections for their invaluable contributions to the AI community.

Standing on Giants' Shoulders: The Traditional Infrastructure Powering Modern AI

作者 Jimmy Song
2026年2月8日 16:00

“If I have seen further, it is by standing on the shoulders of giants.” — Isaac Newton

Figure 1: Standing on Giants’ Shoulders: The Traditional Infrastructure Powering Modern AI

In the excitement surrounding LLMs, vector databases, and AI agents, it’s easy to forget that modern AI didn’t emerge from a vacuum. Today’s AI revolution stands upon decades of infrastructure work—distributed systems, data pipelines, search engines, and orchestration platforms that were built long before “AI Native” became a buzzword.

This post is a tribute to those traditional open source projects that became the invisible foundation of AI infrastructure. They’re not “AI projects” per se, but without them, the AI revolution as we know it wouldn’t exist.

The Evolution: From Big Data to AI

| Era | Focus | Core Technologies | AI Connection |
| --- | --- | --- | --- |
| 2000s | Web Search & Indexing | Lucene, Elasticsearch | Semantic search foundations |
| 2010s | Big Data & Distributed Computing | Hadoop, Spark, Kafka | Data processing at scale |
| 2010s | Cloud Native | Docker, Kubernetes | Model deployment platforms |
| 2010s | Stream Processing | Flink, Storm, Pulsar | Real-time ML inference |
| 2020s | AI Native | Transformers, Vector DBs | Built on everything above |

Table 1: Evolution of Data Infrastructure

Big Data Frameworks: The Data Engines

Before we could train models on petabytes of data, we needed ways to store, process, and move that data.

Apache Hadoop (2006)

GitHub: https://github.com/apache/hadoop

Hadoop democratized big data by making distributed computing accessible. Its HDFS filesystem and MapReduce paradigm proved that commodity hardware could process web-scale datasets.

Why it matters for AI:

  • Modern ML training datasets live in HDFS-compatible storage
  • Data lakes built on Hadoop became training data reservoirs
  • Proved that distributed computing could scale horizontally

Apache Kafka (2011)

GitHub: https://github.com/apache/kafka

Kafka redefined data streaming with its log-based architecture. It became the nervous system for real-time data flows in enterprises worldwide.

Why it matters for AI:

  • Real-time feature pipelines for ML models
  • Event-driven architectures for AI agent systems
  • Streaming inference pipelines
  • Model telemetry and monitoring backbones

Apache Spark (2014)

GitHub: https://github.com/apache/spark

Spark brought in-memory computing to big data, making iterative algorithms (like ML training) practical at scale.

Why it matters for AI:

  • MLlib made ML accessible to data engineers
  • Distributed data processing for model training
  • Spark ML became the de facto standard for big data ML
  • Proved that in-memory computing could accelerate ML workloads

Search Engines: The Retrieval Foundation

Before RAG (Retrieval-Augmented Generation) became a buzzword, search engines were solving retrieval at scale.

Elasticsearch (2010)

GitHub: https://github.com/elastic/elasticsearch

Elasticsearch made full-text search accessible and scalable. Its distributed architecture and RESTful API became the standard for search.

Why it matters for AI:

  • Pioneered distributed inverted index structures
  • Proved that horizontal scaling was possible for search workloads
  • Many “AI search” systems actually use Elasticsearch under the hood
  • Query DSL influenced modern vector database query languages

OpenSearch (2021)

GitHub: https://github.com/opensearch-project/opensearch

When AWS forked Elasticsearch, it ensured search infrastructure remained truly open. OpenSearch continues the mission of accessible, scalable search.

Why it matters for AI:

  • Maintains open source innovation in search
  • Vector search capabilities added in 2023
  • Demonstrates community fork resilience

Databases: From SQL to Vectors

The evolution from relational databases to vector databases represents a paradigm shift—but both have AI relevance.

Traditional Databases That Paved the Way

  • Dgraph (2015) - Graph database proving that specialized data structures enable new use cases
  • TDengine (2019) - Time-series database for IoT ML workloads
  • OceanBase (2021) - Distributed database showing ACID transactions could scale

Why they matter for AI:

  • Proved that specialized database engines could outperform general-purpose ones
  • Database internals (indexing, sharding, replication) are now applied to vector databases
  • Multi-model databases (graph + vector + relational) are becoming the norm for AI apps

Cloud Native: The Runtime Foundation

When Docker and Kubernetes emerged, they weren’t built for AI—but AI couldn’t scale without them.

Docker (2013) & Kubernetes (2014)

GitHub: https://github.com/kubernetes/kubernetes

Kubernetes became the operating system for cloud-native applications. Its declarative API and controller pattern made it perfect for AI workloads.

Why it matters for AI:

  • Model deployment platforms (KServe, Seldon Core) run on K8s
  • GPU orchestration (NVIDIA GPU Operator, Volcano, HAMi) extends K8s
  • Kubeflow made K8s the standard for ML pipelines
  • Microservice patterns enable modular AI agent architectures

Service Mesh & Serverless

Istio (2016), Knative (2018) - Service mesh and serverless platforms that proved:

  • Network-level observability applies to AI model calls
  • Scale-to-zero is essential for cost-effective inference
  • Traffic splitting enables A/B testing of ML models

Why they matter for AI:

  • AI Gateway patterns evolved from API gateways + service mesh
  • Serverless inference platforms use Knative-style autoscaling
  • Observability patterns (tracing, metrics) are now standard for ML systems

API Gateways: From REST to LLM

API gateways weren’t designed for AI, but they became the foundation of AI Gateway patterns.

Kong, APISIX, KGateway

These API gateways solved rate limiting, auth, and routing at scale. When LLMs emerged, the same patterns applied:

AI Gateway Evolution:

| Traditional API Gateway (2010s) | AI Gateway (2024) |
| --- | --- |
| Rate limiting | Token-bucket rate limiting on LLM tokens |
| Auth | API key + organization management |
| Routing | Model routing (GPT-4 → Claude → local models) |
| Observability | LLM-specific telemetry (token usage, cost) |
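The rate-limiting pattern at the heart of this evolution can be sketched as a token bucket. This is an illustrative Python toy, not how Kong or APISIX actually implement their plugin runtimes:

```python
import time

class TokenBucket:
    """Token-bucket limiter of the kind AI gateways apply per API key,
    metering tokens (or requests) instead of raw connections."""

    def __init__(self, rate, capacity):
        self.rate = rate            # tokens refilled per second
        self.capacity = capacity    # maximum burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self, cost=1):
        # Refill proportionally to elapsed time, capped at capacity.
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

bucket = TokenBucket(rate=10, capacity=5)
burst = [bucket.allow() for _ in range(7)]
print(burst)  # typically the first 5 pass, then the burst is exhausted
```

For LLM traffic, `cost` would be the prompt-plus-completion token count rather than 1, which is exactly the twist that turns a classic gateway limiter into an AI gateway limiter.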

Why they matter for AI:

  • Proved that centralized API management scales
  • Plugin architectures enable LLM-specific features
  • Traffic management patterns apply to prompt routing
  • Security patterns (mTLS, JWT) now protect AI endpoints

Workflow Orchestration: The Pipeline Backbone

Data engineering needs pipelines. ML engineering needs pipelines. AI agents need workflows.

Apache Airflow (2015)

GitHub: https://github.com/apache/airflow

Airflow made pipeline orchestration accessible with its DAG-based approach. It became the standard for ETL and data engineering.

Why it matters for AI:

  • ML pipeline orchestration (feature engineering, training, evaluation)
  • Proved that DAG-based workflow definition works at scale
  • Prompt engineering pipelines use Airflow-style orchestration
  • Scheduler patterns are now applied to AI agent workflows

n8n, Prefect, Flyte

Modern workflow platforms that evolved from Airflow’s foundations:

  • n8n (2019) - Visual workflow automation with AI capabilities
  • Prefect (2018) - Python-native workflow orchestration for ML
  • Flyte (2019) - Kubernetes-native workflow orchestration for ML/data

Why they matter for AI:

  • Multi-modal agents need workflow orchestration
  • RAG pipelines are essentially ETL pipelines for embeddings
  • Prompt chaining is DAG-based orchestration

Data Formats: The Lakehouse Foundation

Before we could train on massive datasets, we needed formats that supported ACID transactions and schema evolution.

Delta Lake, Apache Iceberg, Apache Hudi

These table formats brought reliability to data lakes.

Why they matter for AI:

  • Training datasets need versioning and reproducibility
  • Feature stores use Delta/Iceberg as storage formats
  • Proved that “big data” could have transactional semantics
  • Schema evolution handles ML feature drift

The Invisible Thread: Why These Projects Matter

What do all these projects have in common?

  1. They solved scaling first - AI training/inference needs horizontal scaling
  2. They proved distributed systems work - Modern AI is fundamentally distributed
  3. They created ecosystem patterns - Plugin systems, extension points, APIs
  4. They established best practices - Observability, security, CI/CD
  5. They built developer habits - YAML configs, declarative APIs, CLI tools

The AI Native Continuum

Modern “AI Native” infrastructure didn’t replace these projects—it builds on them:

| Traditional Project | AI Native Evolution | Example |
| --- | --- | --- |
| Hadoop HDFS | Distributed model storage | HDFS for datasets, S3 for checkpoints |
| Kafka | Real-time feature pipelines | Kafka → Feature Store → Model Serving |
| Spark ML | Distributed ML training | MLlib → PyTorch Distributed |
| Elasticsearch | Vector search | ES → Weaviate/Qdrant/Milvus |
| Kubernetes | ML orchestration | K8s → Kubeflow/KServe |
| Istio | AI Gateway service mesh | Istio → LLM Gateway with mTLS |
| Airflow | ML pipeline orchestration | Airflow → Prefect/Flyte for ML |

Table 2: From Traditional to AI Native

Why We’re Removing Them from AI Resources List

This post honors these projects, but we’re also removing them from our AI Resources list. Here’s why:

They’re not “AI Projects”—they’re foundational infrastructure.

  • Hadoop, Kafka, Spark are data engineering tools, not ML frameworks
  • Elasticsearch is search, not semantic search
  • Kubernetes is general-purpose orchestration
  • API gateways serve REST/GraphQL, not just LLMs

But their absence doesn’t diminish their importance.

By removing them, we acknowledge that:

  1. AI has its own ecosystem - Transformers, vector DBs, LLM ops
  2. Traditional infra has its own domain - Data engineering, cloud native
  3. The intersection is where innovation happens - AI-native data platforms, LLM ops on K8s

The Giants We Stand On

The next time you:

  • Deploy a model on Kubernetes
  • Stream features through Kafka
  • Search embeddings with a vector database
  • Orchestrate a RAG pipeline with Prefect

Remember: You’re standing on the shoulders of Hadoop, Kafka, Elasticsearch, Kubernetes, and countless others. They built the roads we now drive on.

The Future: Building New Giants

Just as Hadoop and Kafka enabled modern AI, today’s AI infrastructure will become tomorrow’s foundation:

  • Vector databases may become the new standard for all search
  • LLM observability may evolve into general distributed tracing
  • AI agent orchestration may reinvent workflow automation
  • GPU scheduling may influence general-purpose resource management

The cycle continues. The giants of today will be the foundations of tomorrow.

Conclusion: Gratitude and Continuity

As we clean up our AI Resources list to focus on AI-native projects, we don’t forget where we came from. Traditional big data and cloud native infrastructure made the AI revolution possible.

To the Hadoop committers, Kafka maintainers, Kubernetes contributors, and all who built the foundation: Thank you.

Your work enabled ChatGPT, enabled Transformers, enabled everything we now call “AI.”

Standing on your shoulders, we see further.


Acknowledgments: This post was inspired by the need to refactor our AI Resources list. The 27 projects mentioned here are being removed—not because they’re unimportant, but because they deserve their own category: The Foundation.

Helm v4: Paradigm Convergence and Plugin System Rebuild

作者 Jimmy Song
2025年11月14日 19:18

The release of Helm 4 is not just a technical upgrade, but a deep convergence of cloud-native delivery paradigms. The rebuilt plugin system and supply chain governance capabilities make Helm once again a driving force in the Kubernetes ecosystem.

Since its first release in 2016, Helm has been one of the most important application distribution tools in the Kubernetes ecosystem. Helm v4 is not a “minor enhancement,” but a comprehensive update around delivery methods, extension mechanisms, and supply chain approaches.

This article reconstructs Helm’s historical context and focuses on why Helm 4 represents a paradigm-converging release.

Helm: From Tiller to Declarative Delivery

Below is a textual timeline showing key milestones from Helm v2 to v4, helping you understand its technical evolution:

  • 2016: Helm v2 released, using the Tiller architecture.
  • 2017: Chart Hub expands, major projects begin providing official Charts.
  • 2018: Security model controversies intensify, Tiller’s permission issues become apparent.
  • 2019: Helm v3 released, Tiller removed, OCI support introduced.
  • 2021: GitOps becomes widespread, Server-Side Apply (SSA) becomes the mainstream delivery semantic.
  • 2023: kstatus widely adopted for controller status assessment and health calculation.
  • 2025: Helm v4 released, bringing SSA, WASM plugins, reproducible builds, and content hash caching.

Each major Helm release closely follows Kubernetes paradigms, driving progress in declarative delivery and ecosystem tooling.

Fundamental Changes in Helm v4

This section analyzes the core technical upgrades and paradigm shifts in Helm v4.

Delivery Paradigm Update: Server-Side Apply (SSA) by Default

In Helm v3 and earlier, Helm used a “three-way merge” model for resource delivery. Helm v4 switches fully to Server-Side Apply (SSA), meaning the API Server determines field ownership.

This shift brings several direct results:

  • Full semantic alignment with kubectl apply and GitOps controllers (such as Argo, Flux)
  • When multiple controllers manage the same object, silent overrides are avoided and conflicts are explainable
  • Helm’s behavior now follows Kubernetes’ officially recommended declarative delivery paradigm

The following flowchart compares the delivery semantics of Helm v3 and v4.

Figure 1: Helm v3/v4 Delivery Semantics Comparison

Helm is now aligned with the delivery semantics of modern Kubernetes versions, improving predictability and safety in resource management.
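To make field ownership concrete, here is a toy Python model of SSA semantics. This is a deliberate simplification of the real API Server behavior: each field remembers which manager last applied it, and touching another manager's field is a conflict unless forced:

```python
def apply(obj, owners, manager, fields, force=False):
    """Toy SSA model: `owners` maps each field of `obj` to the manager
    that last applied it. Applying a field owned by someone else with a
    different value is a conflict unless force=True."""
    conflicts = [f for f, v in fields.items()
                 if owners.get(f, manager) != manager and obj.get(f) != v]
    if conflicts and not force:
        raise RuntimeError(f"conflict with other managers on: {conflicts}")
    obj.update(fields)
    owners.update({f: manager for f in fields})
    return obj

deployment, owners = {}, {}
apply(deployment, owners, "helm", {"replicas": 3, "image": "nginx:1.27"})
apply(deployment, owners, "hpa-controller", {"replicas": 5}, force=True)  # HPA takes over replicas
print(owners)  # → {'replicas': 'hpa-controller', 'image': 'helm'}
```

The point of the real mechanism is the same as this toy's: overrides are never silent, because ownership transfer is explicit and conflicts are reported rather than merged away.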

kstatus-Driven Wait Behavior and Readiness Annotations

In Helm 3, --wait could only make fuzzy status judgments on limited resources, lacking extensibility and explainability.

Helm 4 introduces kstatus (Kubernetes Status) as the basis for health status parsing, and supports two key annotations:

  • helm.sh/readiness-success
  • helm.sh/readiness-failure

Chart authors can precisely define conditions for installation success or failure. Helm’s waiting model now offers “explainability + extensibility,” upgrading from a “templating tool” to a true “deployment orchestrator.”

Extension System Rebuild: WASM Plugin System

Helm 4 thoroughly reconstructs the plugin model, mainly including:

Typed and Structured Plugins

  • Arbitrary scripts are no longer allowed; plugins must follow typed and structured standards

WebAssembly Plugin Runtime (Extism)

  • More secure (sandbox isolation)
  • Cross-language support
  • Easy unified management in CI/CD and enterprise platforms
  • Predictable and testable

Post-renderer Integrated into Plugin System

  • Moves beyond the “external executable black box” era
  • Helm becomes a programmable platform, not just a template renderer

Engineering Capabilities Upgrade: Reproducible Builds, Content Hash Caching, chart API v3

Helm v4 brings the following engineering improvements:

  • Chart packaging is reproducible (supports signing, SBOM, SLSA, etc. for supply chain governance)
  • Local cache uses content hashes, avoiding version-based conflicts
  • chart API v3 (experimental) is stricter and more flexible
  • SDK logging system upgraded to Go slog (modern logging)

These capabilities enable Helm charts to enter serious software supply chain governance.
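Content hash caching can be sketched in a few lines: the cache key is derived from the artifact bytes themselves, so two packages with the same version string but different contents can never collide. This is an illustrative sketch, not Helm's actual implementation:

```python
import hashlib

def cache_key(chart_bytes: bytes) -> str:
    """Content-addressed cache key: identical bytes always map to the
    same key, while any change (even with an unchanged version string)
    yields a new one, so stale name/version collisions cannot occur."""
    return "sha256:" + hashlib.sha256(chart_bytes).hexdigest()

a = cache_key(b"apiVersion: v2\nname: demo\nversion: 1.0.0\n")
b = cache_key(b"apiVersion: v2\nname: demo\nversion: 1.0.0\n# edited\n")
print(a == b)  # → False: changed content, changed key, no silent cache reuse
```

Combined with reproducible packaging, the same digest can then anchor signatures and SBOM attestations, which is what makes supply-chain verification possible at all.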

Feature Comparison (Helm v3 → v4)

The table below compares core features between Helm v3 and v4 for a quick understanding of the upgrade value.

| Area | Helm 3 | Helm 4 |
| --- | --- | --- |
| Apply model | Three-way merge | Default SSA |
| Wait behavior | Fuzzy, not extensible | kstatus + annotations |
| Plugin system | Scripts, uncontrollable | WASM, typed plugins |
| Post-renderer | External executable | Plugin subsystem |
| Build | Not reproducible | Reproducible builds |
| Cache | name/version | Content hash |
| Chart API | v2 | v2 + v3 (experimental) |
| SDK logs | stdlib log | slog |

Table 1: Helm v3 vs v4 Feature Comparison

In short, this is a release that repays technical debt in bulk while aligning Helm with contemporary Kubernetes semantics.

Why Is Helm v4 a Paradigm Convergence Event?

The release of Helm v4 is not just a feature upgrade, but a deep convergence of delivery paradigms, mainly in three aspects:

Kubernetes Delivery Semantics Unified to SSA

Previously: kubectl, GitOps, and Helm each had their own logic. Now: All unified to SSA, consistent delivery behavior, smoother ecosystem collaboration.

Plugin System Enters the Platform Era

WASM (WebAssembly) brings a secure, universal, and controllable plugin runtime. Infrastructure projects widely adopt WASM: Envoy → WASM Filters, Kubernetes → WASM CRI/OCI, and now Helm joins the platform camp.

Charts Enter Supply Chain Governance

Reproducible builds and digest verification allow Helm charts to be managed as seriously as container images, greatly enhancing supply chain security.

The entire ecosystem moves to a unified capability baseline, driving cloud-native delivery standardization.

My Helm History and Observations

As an early user from the Helm v2 era, I have experienced the following stages:

  • Tiller security controversies
  • v3 migration (state stored in secrets)
  • Large-scale chart consolidation in the community
  • OCI adoption
  • Today’s SSA / WASM / reproducible build

Each major Helm version upgrade is not about chasing trends, but proactively aligning with Kubernetes paradigms:

  • v3 aligns with K8s “no cluster-side runtime” principle
  • v4 aligns with SSA, kstatus, WASM, OCI, and other advances from the past five years

Helm exemplifies the evolution rhythm of infrastructure projects: not by piling on features, but by evolving in semantic alignment with the platform.

Summary

The release of Helm v4 marks a new paradigm for Kubernetes application delivery. SSA, WASM plugins, kstatus, and reproducible builds make Helm not just a templating tool, but a core for supply chain governance and platform extensibility. For cloud-native developers and platform teams, Helm v4 is a paradigm upgrade worth attention.


Kimi K2 Thinking: The True Awakening of China's Thinking Model

作者 Jimmy Song
2025年11月14日 16:25

China’s large language models have finally moved from “writing like humans” to “thinking like humans.” The open-sourcing of Kimi K2 is a watershed moment for China’s AI trajectory.

The narrative around China’s large language models is shifting from “chat-style models” to “thinking models.”

Moonshot AI’s open-sourcing of Kimi K2 Thinking marks the first real landing of this transition. K2 is not just another iteration like ChatGLM or Qwen; it’s the first time a Chinese team has unified “deep reasoning + long context + tool invocation continuity” in training. This is the core of the thinking model approach and the reason why models like Claude and Gemini have led the field.

The Significance of K2’s Open Source: China Enters the Era of Thinking Models

Why is K2’s open source a turning point? Because it enables Chinese models to achieve the following capabilities for the first time:

  • Stable execution of 200–300 tool invocations (toolchain reasoning stability)
  • Deep, multi-stage reasoning chain execution (Chain-of-Thought consistency)
  • 256k context as working memory
  • Native INT4 acceleration + MoE activation sparsity scheduling

This is a completely different path from “stacking parameters → stacking benchmarks,” emphasizing reasoning ability over parameter scale.

In short:

K2 is the first time a Chinese model has entered the ranks of thinking models.

Dissecting K2’s Technical Approach

K2’s technical approach can be broken down into five key points, each directly impacting the model’s reasoning ability and ecosystem adaptability.

MoE Expert Division: Cognitive Division Rather Than Parameter Expansion

K2’s MoE (Mixture of Experts) design philosophy is distinct from previous models. The core idea is not activating fewer parameters or running larger models more cheaply, but assigning different cognitive sub-skills to different experts. For example:

  • Mathematical reasoning expert
  • Planning expert
  • Tool invocation expert
  • Browser task expert
  • Code generation expert
  • Long-chain retention expert

This division aligns directly with Claude 3.5’s cognitive layering approach. K2’s MoE is about “dividing thinking among experts,” not just “making computation cheaper.”
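Mechanically, this kind of expert division rests on a learned gating network that routes each token to a few experts. Below is a minimal top-k gating sketch in NumPy; it is illustrative only — K2’s actual router, expert count, and any mapping from experts to skills like “math” or “planning” are internal to the model and emerge from training rather than being hand-assigned:

```python
import numpy as np

def top_k_gate(token_repr: np.ndarray, expert_weights: np.ndarray, k: int = 2):
    """Route one token to its top-k experts via a softmax gate.

    token_repr: (d,) hidden state; expert_weights: (n_experts, d) gating matrix.
    Returns the chosen expert indices and their normalized mixing weights.
    """
    logits = expert_weights @ token_repr             # (n_experts,) gating scores
    top = np.argsort(logits)[-k:][::-1]              # indices of the k largest scores
    probs = np.exp(logits[top] - logits[top].max())  # softmax over selected experts only
    probs /= probs.sum()
    return top, probs

rng = np.random.default_rng(0)
experts, weights = top_k_gate(rng.standard_normal(8), rng.standard_normal((6, 8)), k=2)
print(experts, weights)  # two expert ids; mixing weights sum to 1
```

Only the selected experts run a forward pass for that token, which is where the “activation sparsity” in sparse MoE models comes from.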

256K Context: Building the Model’s Working Memory

K2’s ultra-long context is not just a spec-sheet number; it is designed as the model’s “thinking buffer.” It lets the whole process retain reasoning chains, tool-invocation state, and multi-stage reflection, so that long tasks (such as research or code refactoring) run uninterrupted and multi-stage agent workflows execute stably. Long-term thinking requires long-term memory, and K2’s long context is the “memory” that sustains its reasoning chains.
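To make the “working memory” framing concrete, here is a toy buffer that keeps reasoning steps and tool results under a fixed token budget, evicting the oldest entries first. This is purely an analogy: a real long-context model keeps everything in its attention KV cache rather than an explicit buffer like this.

```python
from collections import deque

class WorkingMemory:
    """Toy 'working memory': keep reasoning steps and tool results inside
    a fixed token budget, evicting the oldest entries first. Illustrative
    only; not how long-context attention actually stores state."""

    def __init__(self, budget_tokens: int = 256_000):
        self.budget = budget_tokens
        self.entries: deque = deque()  # (text, token_count) pairs
        self.used = 0

    def add(self, text: str) -> None:
        cost = len(text.split())  # crude token estimate
        self.entries.append((text, cost))
        self.used += cost
        while self.used > self.budget:  # evict oldest context first
            _, old_cost = self.entries.popleft()
            self.used -= old_cost

    def render(self) -> str:
        return "\n".join(text for text, _ in self.entries)

mem = WorkingMemory(budget_tokens=10)
mem.add("step1: plan the refactor")
mem.add("tool: ls returned 3 files")
mem.add("step2: edit file A")
print(mem.used <= 10)  # True after evicting the oldest entry
```

The point of a 256K window is precisely that this kind of eviction is rarely needed: intermediate reasoning and tool output stay addressable for the whole task.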

Intertwined Training of Tool Invocation and Reasoning Chains

K2 excels in the intertwined training of tool invocation and reasoning chains. Traditional open-source models typically follow this process:

  1. Generate reasoning
  2. Output JSON function call
  3. Tool returns result
  4. Continue reasoning

In this approach, the reasoning chain and invocation chain are separated. K2’s training allows the reasoning chain to invoke tools at any time and feed tool results back into the reasoning chain for the next stage of thinking. It supports 200–300 consecutive tool invocations without interruption, fully aligning with Claude 3.5’s Interleaved CoT + Tool Use.
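The contrast between the two styles can be sketched as a reason-act loop in which tool results flow back into the same transcript the model reasons over. The `llm` callable and its return convention below are hypothetical scaffolding for illustration, not K2’s actual API:

```python
def interleaved_agent(task: str, llm, tools: dict, max_calls: int = 300):
    """Sketch of an interleaved reason-act loop: the model's reasoning
    stream can emit a tool call at any step, and the tool's result is
    appended to the same stream before reasoning resumes.

    `llm(transcript)` is a hypothetical callable returning either
    ("call", tool_name, args) or ("answer", text)."""
    transcript = [f"task: {task}"]
    for _ in range(max_calls):
        step = llm("\n".join(transcript))
        if step[0] == "answer":
            return step[1]
        _, name, args = step
        result = tools[name](*args)                           # execute the tool
        transcript.append(f"call {name}{args} -> {result}")   # feed result back in
    raise RuntimeError("tool budget exhausted")

# tiny fake model: call add(2, 3) once, then answer with the result
def fake_llm(transcript):
    if "-> 5" in transcript:
        return ("answer", "2 + 3 = 5")
    return ("call", "add", (2, 3))

print(interleaved_agent("compute 2+3", fake_llm, {"add": lambda a, b: a + b}))
```

In the separated style, the loop above would exist outside the model as orchestration code; the thinking-model claim is that the model itself learns to sustain this loop for hundreds of calls.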

Native INT4 Quantization: Ensuring Reasoning Chain Stability

K2’s INT4 (4-bit integer quantization) approach is not ordinary post-training quantization. Its purpose is not only to reduce memory usage and increase throughput, but more importantly to keep deep reasoning chains from breaking due to insufficient compute. The biggest killers of deep thinking chains are timeouts, freezes, and unstable workers. INT4 enables Chinese GPUs (non-H100) to run complete reasoning chains, which matters greatly for China’s ecosystem.
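For reference, the storage idea behind INT4 can be shown with a toy symmetric per-tensor quantizer. This sketch only illustrates the 4-bit value range and scale factor; K2 ships quantization-aware weights and fused kernels, which this does not attempt to reproduce:

```python
import numpy as np

def int4_quantize(w: np.ndarray):
    """Symmetric per-tensor INT4 quantization sketch: map float weights to
    integers in [-8, 7] with a single scale factor (stored in int8 here,
    since NumPy has no 4-bit dtype)."""
    scale = float(np.abs(w).max()) / 7.0
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def int4_dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.array([0.9, -0.05, 0.31, -0.7], dtype=np.float32)
q, s = int4_quantize(w)
err = float(np.abs(int4_dequantize(q, s) - w).max())
print(q, err)  # 4-bit codes and a reconstruction error below half the scale
```

Quantization-aware training goes further than this: the rounding error is present during training, so the model learns weights that remain accurate at 4-bit precision instead of being rounded after the fact.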

MoE + Long Context + Toolchain: Unified Training Rather Than Module Stitching

K2’s most important feature is its holistic training approach: expert division, long context-driven consistency, tool invocation trained through real execution, browser tasks and long-step task reinforcement, and INT4 entering the training loop. It’s not a “ChatLLM + Memory + RAG + Tools” patchwork, but an integrated reasoning system.

Alignment and Differences Between K2 and International Mainstream Approaches

K2 is highly aligned with international mainstream models (such as Claude, Gemini, OpenAI) in cognitive reasoning, ultra-long context, and tool invocation mechanisms, but also has unique advantages for Chinese models:

  • Native INT4 + adaptation to Chinese computing power is rare globally
  • Toolchain continuity is more stable than most open-source models
  • Higher degree of open source, stronger ecosystem reusability

Collaborative Value of China’s AI Infra: K2 × RLinf × Mem-alpha

A series of important open-source infrastructure projects has emerged around K2. The table below summarizes each project’s type and its value to K2:

| Project | Type | Value to K2 |
| --- | --- | --- |
| RLinf | Reinforcement learning | Train stronger planning/browser-task capabilities |
| Mem-alpha | Memory enhancement | Combine with K2 to form long-term-memory agents |
| AgentDebug | Agent error debugging | Analyze K2’s toolchain errors |
| UI-Genie | GUI agent training | Testbed for extending K2’s agent capabilities |

Table 1: Collaborative Value of China’s AI Infra Ecosystem

This combination is already forming a China AI Agent Infra Stack.

Personal View: The Significance of K2’s Approach

I believe the significance of K2 lies not in the model itself, but in its technical approach:

K2 marks the first time Chinese models have shifted from “language generation competition” to “thinking ability competition.”

For the past three years, the main line of China’s open-source models has been evaluation scores, parameter scale, instruction following, and alignment data. But K2 is the first to clearly take the path of deep reasoning, tool intertwining, cognitive division, long-term task chains, and native performance optimization. This means China’s model trajectory is now synchronized with the US, rather than chasing old paths.

Key Directions to Watch in K2’s Ecosystem Over the Next Year

K2’s future ecosystem influence will depend on several key points:

  • Whether it opens the tool registry
  • Whether it supports dynamic memory (Mem-alpha integration)
  • Whether it opens the MoE expert structure
  • Whether it can form a Chinese reasoning chain optimization path with vLLM / llm-d / KServe
  • Whether it supports fault tolerance for multi-node continuous reasoning chains

These capabilities will determine K2’s ecosystem influence and technical extensibility.

K2 Thinking Model Architecture Diagram

The following flowchart illustrates the core architecture of the K2 thinking model and its collaboration with external agents/applications:

Figure 1: K2 Thinking Model Architecture

Summary

K2 is the first time China’s model trajectory is heading in the right direction:

From “writing like humans” to “thinking like humans.”

The era of thinking models is coming, and Chinese models are finally standing on the same roadmap as the international forefront.


TRAE SOLO vs VS Code: Rethinking Coding Tools from the Perspective of AI Engineering Entities

By Jimmy Song
November 14, 2025, 15:14

Coding tools are evolving from “AI assistants” into true engineering entities. How can we reinterpret the roles of TRAE SOLO and VS Code from a pipeline perspective?

Recently, TRAE International Edition SOLO mode has been fully opened to overseas users. It claims to be a “responsive coding agent” and is now available for official trial, with token-based rate limiting.

I’ve used early versions of TRAE (without SOLO access), and also tried Qoder and Kiro. The AI coding field is flourishing, each tool with its own strengths.

Now, with GitHub’s Agent HQ concept from Universe, and the “AI Engineering Entity (AIEE)” framework I wrote about in AI-Native Application Architecture, it’s time to re-examine today’s coding tool landscape.

This article compares TRAE SOLO and VS Code (with Copilot, Plan/Agent mode, and Agent HQ) from the perspective of AI engineering entities, combining personal experience to outline their differences in engineering automation, collaboration, and governance.

Three Engineering Role Abstractions: End-to-End Executor, Contextual Collaborator, and Expert Orchestrator

From an engineering perspective, current mainstream AI coding tools can be abstracted into three roles:

End-to-End Executor:

  • Focuses on “requirement to deployment” workflows, capable of autonomous planning, task breakdown, coding, testing, previewing, and even deployment. Officially called “AI-Powered Context Engineer.”
  • User experience: like a “full-chain executor”—give it a requirement, and it handles the project, even if it’s slow or imperfect.

Contextual Collaborator:

  • VS Code is a powerful editor. Copilot has evolved from line-level completion to Chat, Plan agent, Agent mode, supporting multi-step tasks and codebase analysis.
  • It doesn’t take over the whole project, but efficiently handles local tasks under your guidance, acting as an automated unit for specific segments.

Expert Orchestrator / Specialist Engine:

  • GitHub’s Agent HQ is a “central platform for AI coding agents,” a unified control plane that can connect to OpenAI, Anthropic, Google, xAI, etc., run agents in parallel, and compare results.
  • Functions as an “expert orchestrator” for key steps—planning, review, refactoring, or decision-making—providing high-quality output without taking over the entire project.

These three roles correspond to the structure in “AI Engineering Entity (AIEE)”:

  • Single end-to-end executor (TRAE SOLO)
  • Contextual collaborator residing in the IDE (VS Code + Copilot)
  • Specialist orchestrator platform for multi-entity scheduling (Agent HQ)

Quick Product Status Check

To avoid memory bias, let’s clarify some key facts.

TRAE / TRAE SOLO

  • TRAE claims to be a “10x AI Engineer,” able to independently understand requirements and execute development tasks.
  • SOLO mode is GA for international users, emphasizing full-chain automation, available directly but with token limits.
  • Underlying open-source Trae Agent CLI can execute multi-step engineering tasks in real codebases.
  • TraeIDE’s official page shows built-in Claude 3.5/3.7, DeepSeek, etc., but the community notes slow integration of new models like Claude Sonnet 4.5.

So the claim that “TRAE does not support Claude” is now inaccurate: at least officially, Claude models are included. Which model SOLO mode uses, and whether that choice is exposed to users, remains unclear, and the experience still needs improvement.

VS Code + Copilot + Agent HQ

  • Copilot in VS Code now features Chat, Plan agent, Todo/multi-step execution:
    • Plan mode analyzes codebases, generates execution plans, splits into Todos, then implementation agents execute step by step.
    • Agent mode provides a more automated “multi-step companion programmer” experience.
  • GitHub launched Agent HQ at Universe 2025, integrating Copilot and third-party agents (Anthropic, OpenAI, Google, xAI, Cognition, etc.) into a unified control plane, supporting parallel runs and result comparison.

In short:

  • TRAE is like “embedding an engineering entity into the IDE.”
  • VS Code + Copilot is “adding a set of engineering entities to a mature IDE.”
  • Agent HQ is positioned as “headquarters for multiple engineering entities.”

Reconstructing Comparison Dimensions with the AI Engineering Entity Framework

In “AI Engineering Entity (AIEE),” the definition is:

AI has evolved from editor auto-completion to a formal node in the software supply chain, able to receive tasks, produce reviewable artifacts (PR/diff/report), pass tests/gates, and be replaced if it fails. It’s no longer just an “enhanced human developer,” but a “functional engineering unit” in the pipeline.

Based on this, we can reconstruct key comparison dimensions for TRAE and VS Code:

  1. Existence as an Independent Functional Unit
    Can it autonomously plan, implement, and produce PRs/reports from natural language requirements, without continuous human intervention?

  2. Context Modeling Capability
    Can it model across files, directories, terminal output, and browser content to form a stable engineering context?

  3. Position in the Pipeline
    Is it an enhancement layer within the IDE, or a formal node in CI/CD and code review flows?

  4. Reviewability and Replaceability
    Are its outputs standardized (PR, diff, report) and suitable for regular pipeline review and rollback?

  5. Multi-Agent Collaboration Capability
    Does it natively support multi-agent collaboration, or is it focused on single-agent enhancement?
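Dimension 4 above (reviewability and replaceability) can be sketched as a minimal data structure: an entity’s output is merged only when it has produced a reviewable artifact and passed every required gate. Field and gate names here are illustrative, not from any real product:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class EngineeringEntityTask:
    """Sketch of an AIEE pipeline node: it receives a requirement, emits a
    reviewable artifact (PR/diff/report), and is accepted only if every
    required gate passed. Purely illustrative, not a real product API."""
    requirement: str
    artifact: Optional[str] = None                     # the PR / diff / report it produced
    gates_passed: List[str] = field(default_factory=list)

    def accept(self, required_gates: set) -> bool:
        # merge only when an artifact exists and all required gates passed;
        # a failing entity is simply not merged, i.e. it is replaceable
        return self.artifact is not None and required_gates <= set(self.gates_passed)

task = EngineeringEntityTask("add retry logic to the HTTP client", artifact="diff")
task.gates_passed = ["unit-tests", "review"]
print(task.accept({"unit-tests", "review"}))  # True: artifact present, gates green
```

Framed this way, the comparison below asks how completely each tool fills this node: TRAE SOLO tries to own the whole node, while VS Code + Copilot distributes it across several cooperating agents.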

TRAE SOLO vs VS Code: Engineering Entity Comparison Table

The following table summarizes the main differences from the engineering entity perspective. Note: VS Code includes Copilot Chat + Plan/Agent mode by default and can mount the Agent HQ ecosystem.

You can interpret this table as: “If AI is treated as an engineering entity in the pipeline, what roles do TRAE and VS Code play?”

| Dimension | TRAE SOLO | VS Code + Copilot / Agent |
| --- | --- | --- |
| Engineering role | Single strong entity that handles end-to-end tasks from idea to deployment | IDE + multiple entities (plan, implementation, review); the IDE itself is the engineering base |
| Task granularity | Project/feature level: from a PRD-style description to full project scaffold, implementation, testing, preview | Mainly function/file level; Plan mode can scale to feature/subsystem level |
| Context modeling | Emphasizes “context engineering”: reads codebase, terminal output, and browser content as unified input for SOLO | Mainly the codebase; the Plan agent generates plans from code analysis, Agent mode executes by plan |
| Automation | Proactively modifies files, runs commands and tests, starts local services, forming a complete loop | Plan/Agent can run commands, modify files, and run tests, but depends more on your current project/workflow |
| Human intervention | More “post-review”: let it run first, then review and fine-tune | More “in-process collaboration”: frequent intervention at planning, implementation, and review, with control points at each step |
| Output form | Code changes, test results, previews; sometimes PRs/docs | Code completion, refactoring, PR comments, CodeQL reports, Plan/Todo lists |
| Multi-agent | Core is the SOLO agent; other capabilities (like the Trae Agent CLI) are extensions | Copilot itself is an agent; Agent HQ allows multiple agents to compete in parallel |
| Model transparency | Exposes the specific model poorly; users can’t tell which model is in use | GitHub clearly marks Copilot’s model family; Agent HQ shows agent sources directly |
| Performance | Strong automation but slow; complex projects may stall at the “thinking” stage; hard token limits | Stable response in familiar projects; mostly local changes, so overall latency is controllable |
| Privacy & compliance | Official and third-party reviews mention extensive telemetry/data collection; enterprise adoption needs extra evaluation | Copilot for Enterprise has clear data isolation/compliance, suiting most enterprise governance needs |

Table 1: TRAE SOLO vs VS Code Engineering Entity Comparison

From the table:

  • If you want “an AI engineering entity that takes full responsibility from requirement to deployment,” TRAE SOLO fits that role.
  • If you want “a stable engineering base + a set of pluggable entities,” VS Code + Copilot + Agent HQ fits better.

Workflow Comparison: Two Engineering Entity Pipelines

To clarify their engineering flows, the following diagram compares the typical workflows of TRAE SOLO and VS Code, highlighting their respective collaboration models.

Figure 1: TRAE SOLO vs VS Code Engineering Entity Pipeline Comparison

This diagram shows two typical collaboration models:

  • TRAE SOLO attempts to encapsulate “context aggregation → planning → implementation → testing → preview/deployment” within a single engineering entity, with user intervention only at requirement input and output review.
  • VS Code + Copilot + Agent HQ uses the IDE as runtime, with Plan/Implementation/Review agents corresponding to different roles. Agent HQ supports parallel agent competition, allowing developers to select the best solution.

Model Transparency, Speed, and Predictability

Based on personal experience, here are the model transparency and speed issues from the engineering entity perspective:

Model Transparency

  • TRAE currently exposes “which model is called” poorly; switching to MAX mode only hints at “a stronger model or higher quota,” with no clear feedback.
  • Community feedback notes slow integration of new models; some strong models (like Claude series) are available elsewhere but not yet in TRAE.

This means:

  • TRAE is hard to use as a “precisely configurable engineering unit,” more like a black box, making model change management in CI/CD or production pipelines difficult.
  • VS Code + Copilot + Agent HQ is stronger in standardization; GitHub clearly marks Copilot’s model family, Agent HQ uses agent source as the abstraction boundary.

Speed and Predictability

  • TRAE SOLO’s “slowness” comes from executing more steps (reading files, analyzing, planning, testing) and insufficient engineering process visualization. The UI shows “Thinking…” prompts, making it hard to tell if it’s stuck or planning.
  • VS Code’s Plan mode explicitly lists plans and Todos; Agent mode emphasizes “execution by plan,” letting users clearly see the entity’s work status, improving predictability.

Agent HQ Positioning: Single Entity vs Multi-Entity Headquarters

From a platform perspective, GitHub and TRAE differ as follows:

  • Agent HQ’s core idea: future development will rely on multiple specialized agents collaborating in parallel. GitHub is building “agent headquarters,” not a single engineering agent. Developers can schedule agents in a unified control plane, integrating with existing GitHub Flow (Issue, PR, Review, CI/CD).
  • TRAE is more like “proprietary IDE + agent + full-stack context engineering,” delivering an integrated experience.

In terms of engineering entity organization:

  • GitHub is building “infrastructure and governance for multi-entity engineering systems.”
  • TRAE is building “vertically integrated engineering entity + private runtime.”

They’re not mutually exclusive, representing “broad platform + multi-entity scheduling” vs “single strong entity + proprietary toolchain.”

Subjective Experience and Engineering Framework Integration

Translating personal experience into engineering language:

  • I’m more accustomed to VS Code’s “single IDE + multiple views” experience; TRAE splits the IDE and SOLO modes, requiring a mental switch.
  • TRAE’s engineering entity capabilities surpass ordinary completion tools, able to take on tasks, but model transparency and context quality are unstable, and governance needs improvement.
  • VS Code doesn’t take over the whole project, but local work is stable; Plan, Agent, and Review combinations enable multi-entity collaboration.

According to the “AI Engineering Entity (AIEE)” framework:

  • TRAE SOLO is a single AI engineering entity (AIEE) capable of handling complete engineering tasks, but still has clear shortcomings in model transparency, engineering governance, and enterprise-level controllability.
  • VS Code + Copilot + Agent HQ is an infrastructure platform for multiple engineering entities, less aggressive in end-to-end outsourcing in the short term, but clearer in engineering consistency, model replaceability, and organizational governance.

Summary

This article systematically compares TRAE SOLO and VS Code (with Copilot, Agent HQ) from the perspective of AI engineering entities, focusing on automation, collaboration, and model transparency. TRAE SOLO is better suited for individual developers or small teams seeking end-to-end automation, while VS Code + Copilot + Agent HQ provides stronger infrastructure for multi-entity collaboration, enterprise governance, and engineering consistency. In the future, AI engineering entities will become formal nodes in the software development pipeline, and tool selection should be based on engineering needs and governance requirements.

