KubeCon + CloudNativeCon China 2025

Speakers

Jim Zemlin

Executive Director, The Linux Foundation

Zemlin’s career spans three of the largest technology trends to rise over the last decade: mobile computing, cloud computing and open source software. Today, as executive director of The Linux Foundation, he uses this experience to accelerate the adoption of Linux and support the... Read More →

Tuesday June 10, 2025 09:00 - 09:10 HKT
Level 16 | Grand Ballroom I

Content Experience Level Intermediate
Presentation Language English

09:12 HKT

Keynote: Community Opening Remarks - Chris Aniszczyk, CTO, Cloud Native Computing Foundation

Tuesday June 10, 2025 09:12 - 09:22 HKT

Speakers

Chris Aniszczyk

CTO, CNCF

Chris Aniszczyk is an open source executive and engineer with a passion for building a better world through open collaboration. He's currently a CTO at the Linux Foundation focused on developer relations and running the Open Container Initiative (OCI) / Cloud Native Computing Foundation... Read More →

Tuesday June 10, 2025 09:12 - 09:22 HKT
Level 16 | Grand Ballroom I

Content Experience Level Intermediate
Presentation Language English

09:24 HKT

Keynote: Crossplane Is the Answer! but What Is the Question? - Amit Dsouza, Odyssey Cloud & Cortney Nickerson, Nirmata

Tuesday June 10, 2025 09:24 - 09:34 HKT

Keynote Sessions, Platform Engineering

Why consider Crossplane when so many IaC tools exist—Terraform, Pulumi, CloudFormation, Config Connector, and KRO? What unique challenges does it solve, and is it always the right choice?
Join Cortney & Amit as they explore why Crossplane is gaining traction, not just as an IaC tool but as a Platform Engineering enabler. Learn how Crossplane extends the Kubernetes API to manage both infrastructure and applications declaratively, empowering platform teams.
Beyond provisioning, security and compliance are critical. Discover how the Crossplane + ArgoCD + Kyverno stack enables GitOps-driven automation, ensuring deployments align with organizational compliance and security policies.
Through real-world use cases, we’ll explore:
Where does Crossplane fit among IaC tools?
When is Crossplane NOT the right choice?
How can it enable scalable, self-service platforms?
How does it integrate with ArgoCD & Kyverno for GitOps and security?

Speakers

Amit DSouza

Co-founder, Odyssey Cloud

Amit Dsouza is an IT professional with over 13 years of experience in the industry. He is a co-founder of Odyssey Cloud, Australia. With experience in Fortune 500 companies & startups, he has worked in various locations including Australia, Singapore, & India. Amit specializes in... Read More →

Cortney Nickerson

Head of Community, Nirmata

Cortney is Head of Community at Nirmata. As a CNCF and Civo Ambassador, co-organizer of CNCF Bilbao Community, and speaker and organizing member of various KCD events, she is a recognized voice in the cloud native space. Initially, a non-techie, she turned techie as employee 7 at... Read More →

Tuesday June 10, 2025 09:24 - 09:34 HKT
Level 16 | Grand Ballroom I

Content Experience Level Intermediate
Presentation Language English

09:36 HKT

Sponsored Keynote: Towards Clouds of AI Clusters - Bill Ren, Huawei Chief Open Source Liaison Officer, Board member of CNCF

Tuesday June 10, 2025 09:36 - 09:41 HKT

AI is quickly becoming the most important workload in our clouds. However, AI is not like other cloud native workloads. Whereas before, clouds could manage elastic resources that easily and cheaply scaled out, AI workloads do not readily support this. AI hardware infrastructure is moving towards large clusters of processors, is not readily scaled out, is not readily available on-demand, and is much more expensive. This requires significant changes to how we build and
manage our clouds, from the operating system up to our cloud native infrastructure. This talk will highlight how this evolution towards clouds of AI clusters is happening through projects such as Linux, Volcano, and Karmada.

Speakers

Bill Ren

Chief Open Source Liaison Officer，Board member of CNCF, Huawei

Bill Ren holds an EMBA and Master Degree from Peking University, and a CS Bachelor Degree from Shanghai Jiaotong University. Since Joining Huawei in 2000, Bill served as an Intelligent Network Research and Development Engineer, Product Manager and Architect of India Branch, General... Read More →

Tuesday June 10, 2025 09:36 - 09:41 HKT
Level 16 | Grand Ballroom I

Content Experience Level Intermediate
Presentation Language English

09:43 HKT

Keynote: An Optimized Linux Stack for GenAI Workloads - Michael Yuan, WasmEdge

Tuesday June 10, 2025 09:43 - 09:53 HKT

Keynote Sessions, Emerging + Advanced

Running GenAI workloads on Linux is a challenge due to the complexity of AI runtime toolchains and dependencies of heterogeneous GPU devices. The problem is especially acute in containers where the host and guest OSes must have compatible versions of GPU drivers and application software stacks.

CNCF’s Flatcar Linux project aims to simplify containerized Linux deployment. It has an immutable system that can be optimized for both host and guest systems. Furthermore, it supports cross-platform and cross-GPU Wasm workloads. As Wasm runtimes such as WasmEdge and LlamaEdge support a wide range of AI models, Flatcar Linux has become a good candidate for running GenAI workloads in containers.

In this talk, we will cover the basics of Flatcar and its support for Wasm runtimes. We will also discuss WasmEdge’s support for portable AI models and inference applications. Finally, we will give a demo of a complete GenAI app running in Flatcar across GPUs and CPUs.

Speakers

Michael Yuan

Founder, Second State

Dr. Michael Yuan is a maintainer of WasmEdge Runtime (a project under CNCF) and a co-founder of Second State. He is the author of 5 books on software engineering published by Addison-Wesley, Prentice-Hall, and O'Reilly. Michael is a long-time open-source developer and contributor... Read More →

Tuesday June 10, 2025 09:43 - 09:53 HKT
Level 16 | Grand Ballroom I

Content Experience Level Any
Presentation Language English

09:55 HKT

Keynote: Scaling Model Training with Volcano: iFlytek’s Kubernetes Breakthrough - Dong Jiang, Platform Architect, iFlytek & Xuzheng Chang, Software Engineer, Huawei Cloud

Tuesday June 10, 2025 09:55 - 10:00 HKT

Training massive AI models at scale is tough—but doing it efficiently in Kubernetes is even tougher. In this keynote, we’ll share how iFlytek tackled key challenges in large-scale model training, including low GPU utilization, fragile workflows, and resource contention across teams. By leveraging Volcano, they boosted GPU usage by over 40%, and cut failure recovery time by 70%. This talk offers a quick but powerful look at how intelligent scheduling and orchestration can unlock performance, reliability, and fairness in multi-tenant AI platforms.

Speakers

Xuzheng Chang

Software Engineer, Huawei Cloud

Xuzheng Chang is a maintainer of the Volcano community, with in-depth research and practical experience in the fields of batch computing and cloud-native AI scheduling. Xuzheng has spearheaded several significant features within the Volcano community. Actively contributing to open-source... Read More →

Dong Jiang

Platform Architect, iFlytek

Tuesday June 10, 2025 09:55 - 10:00 HKT
Level 16 | Grand Ballroom I

Content Experience Level Intermediate
Presentation Language English

10:02 HKT

Keynote: The Future of AI in Hong Kong: From Local Innovation to Global Influence - Prof. Yike Guo, HKUST Provost and HKGAI Director & Roby Chen, CEO and Founder, DaoCloud

Tuesday June 10, 2025 10:02 - 10:12 HKT

The release of HKGAI V1 marks a new chapter in the development of AI in Hong Kong. Leveraging the strengths of both local connectivity and global outreach, the HKGAI team embraces the open-source community to tackle challenges ranging from optimizing high-performance computing clusters to exploring cutting-edge AI models. Looking ahead, Hong Kong aims to further integrate resources from mainland China and the international community, deepening technological innovation and application expansion, and contributing a "Hong Kong Solution" to global AI standards and use cases.

Speakers

Yike Guo

HKUST Provost and HKGAI Director

Professor Guo Yike assumed office as the Provost of the Hong Kong University of Science and Technology (HKUST) on December 1, 2022. He is concurrently a Chair Professor in the Department of Computer Science and Engineering and also the Director of Hong Kong Generative AI Research... Read More →

Roby Chen

CEO, DaoCloud

Roby Chen, Founder and CEO of DaoCloud, Master of Computer Science from Fudan University, CNCF Ambassador, has a deep understanding of cloud-native business models and technologies. Roby is an evangelist of open source cloud computing technology, gaining valuable experience in building... Read More →

Tuesday June 10, 2025 10:02 - 10:12 HKT
Level 16 | Grand Ballroom I

Presentation Language English

10:13 HKT

Keynote: Closing Remarks

Tuesday June 10, 2025 10:13 - 10:15 HKT

Keynote Sessions, Platform Engineering

Tuesday June 10, 2025 10:13 - 10:15 HKT
Level 16 | Grand Ballroom I

Content Experience Level Intermediate
Presentation Language English

10:15 HKT

Gold Sponsor In-Booth Demos

Tuesday June 10, 2025 10:15 - 10:45 HKT

Sponsor: AWS
Demo: Accelerate Cloud-Native Innovation with AI-Powered Development & Intelligent Infrastructure
Booth Number: G6

Sponsor: LFOSSA
Demo: Master Open Source Skills and Advance Your Career in the Age of AI
Booth Number: G2

In order to facilitate networking and business relationships at the event, you may choose to visit a third party’s booth or access sponsored content. You are never required to visit third party booths or to access sponsored content. When visiting a booth or participating in sponsored activities, the third party will receive some of your registration data. This data includes your first name, last name, title, company, address, email, standard demographics questions (i.e. job function, industry), and details about the sponsored content or resources you interacted with. If you choose to interact with a booth or access sponsored content, you are explicitly consenting to receipt and use of such data by the third-party recipients, which will be subject to their own privacy policies.

Tuesday June 10, 2025 10:15 - 10:45 HKT
Level 16 | Grand Ballroom II

Sponsored Demos, Gold Sponsor In-Booth Demos

10:15 HKT

Coffee Break ☕

Tuesday June 10, 2025 10:15 - 11:00 HKT

Tuesday June 10, 2025 10:15 - 11:00 HKT
Level 16 | Grand Ballroom II

Breaks

10:15 HKT

Project Pavilion Tables | Tuesday Morning

Tuesday June 10, 2025 10:15 - 14:30 HKT

Cilium P-2
Karpenter P-1
Kube - OVN P-7
Kubespray P-3
Kyverno P-4
Litmus P-5
WasmEdge Runtime P-8

Tuesday June 10, 2025 10:15 - 14:30 HKT
Level 16 | Grand Ballroom II

Project Opportunities

10:15 HKT

Solutions Showcase

Tuesday June 10, 2025 10:15 - 19:00 HKT

Whether you’re looking to expand your knowledge, connect with experts, or just enjoy a break, the Solutions Showcase is the place to be:

- Exhibits: Visit our sponsor booths to learn about the latest technologies and services.
- CNCF Project Tables: Interact with project maintainers and gain insights into community engagement.
- Attendee T-Shirt Pick-up: Grab your free conference t-shirt.
- Coffee + Tea, Snacks, Lunch Pick-up: Enjoy delicious treats served in the Solutions Showcase.

In order to facilitate networking and business relationships at the event, you may choose to visit a third party’s booth or to access sponsored content. You are never required to visit third-party booths or to access sponsored content. When visiting a booth or participating in sponsored activities, the third party will receive some of your registration data. This data includes your first name, last name, title, company, address, email, standard demographics questions (i.e. job function, industry), and details about the sponsored content or resources you interacted with. If you choose to interact with a booth or access sponsored content, you are explicitly consenting to receipt and use of such data by the third-party recipients, which will be subject to their own privacy policies.

Tuesday June 10, 2025 10:15 - 19:00 HKT
Level 16 | Grand Ballroom II

Solutions Showcase

11:00 HKT

Project Lightning Talks: Opening - Hoon Jo & Satyam Soni, CNCF Ambassadors

Tuesday June 10, 2025 11:00 - 11:05 HKT

Tuesday June 10, 2025 11:00 - 11:05 HKT
Level 16 | Grand Ballroom I

11:00 HKT

AI Model Distribution Challenges and Best Practices - Wenbo Qi, Xiaoya Xia & Peng Tao, Ant Group; Wenpeng Li, Alibaba Cloud; Han Jiang, Kuaishou

Tuesday June 10, 2025 11:00 - 11:30 HKT

Level 19 | Crystal Court I

As the demand for scalable AI/ML grows, efficiently distributing AI models in cloud-native infrastructure has become a pivotal challenge for enterprises. The panel dives into the technical and operational strategies for deploying models at scale -- from optimizing model storage and transfer to ensuring consistency across clusters and regions. Experts from different companies and CNCF projects will debate critical questions like: How can Kubernetes-native workflows automate and accelerate model distribution while minimizing latency and bandwidth costs? How to efficiently distribute huge models sizing hundreds of GBs or TBs? What are the challenges proposed by distributed inference and the prefilling-decoding architecture? How are models updated in the reinforcement learning post-training paradigm? What role do standards like OCI artifacts or specialized registries play in streamlining versioned model delivery?

Speakers

Peng Tao

Staff Engineer, Ant Group

Kata Containers architecture committee member, Nydus maintainer, and Linux kernel developer.

Han Jiang

Software Engineer, Kuaishou

Software Engineering from Kuaishou, previously worked in the Kubernetes ecosystem and container-related technologies. Currently, he is focused on optimizing the inference performance of large language models.

Xiaoya

Open Source Analyst, Ant Group

Xiaoya Xia is a member of the Ant Group OSPO, where she focuses on catalyzing open source success through data-driven insights. Before joining Ant Group, Xiaoya was a PhD at East China Normal University (ECNU), where she concentrated on research into open source ecosystem sustain... Read More →

Wenbo Qi

Software Engineer, Ant Group

Wenbo Qi is a software engineer at Ant Group working on Dragonfly. He is a maintainer of the Dragonfly. He hopes to do some positive contributions to open source software and believe that fear springs from ignorance.

Wenpeng Li

Alibaba Cloud

Tuesday June 10, 2025 11:00 - 11:30 HKT
Level 19 | Crystal Court I

KubeCon China 2025 Argo Workflows Intro Updates and Deep Dive pdf

Content Experience Level Beginner
Presentation Language Chinese

11:00 HKT

Argo Workflows: Intro, Update and Deep Dive - Shuangkun Tian & Yashi Su, Alibaba Cloud

Tuesday June 10, 2025 11:00 - 11:30 HKT

Level 21 | Pearl Pavilion

Since graduating from CNCF, Argo Workflows has seen widespread adoption across industries. But how does it work? What are its latest features? And how can it handle large-scale task orchestration effectively? This talk will answer these questions.
The talk begins with an overview of Argo Workflows’ core principles and the latest community developments, including new features like scheduling strategies, dynamic templates, and backfill capabilities. It then dives into best practices for large-scale task orchestration, covering high-availability deployment, workflow partitioning, and more.
A key focus is on storage systems, which are critical for efficient large-scale task execution. The talk will share insights on selecting the right file and object storage solutions, implementing read-write separation, and choosing an optimal caching system. These strategies are essential for building scalable, high-performance pipelines.

Speakers

Yashi Su

Software Engineer, Alibaba Cloud

Yashi Su is a software engineer at Alibaba Cloud, focusing on Kubernetes Container Storage Interface (CSI) for object storage. She maintains OSSFS (a FUSE daemon for Alibaba Object Storage Service) used in Cloud-Native scenarios and researches how to improve the read-write performance... Read More →

Shuangkun Tian

Alibaba Cloud Software Engineer, Alibaba Cloud

ShuangKun Tian is a software engineer at Alibaba Cloud, specializing in Scheduling, Elasticity, Workflow Orchestration and Performance tuning. He is a maintainer of the Argo Community. He has extensive practical experience in MLOps, Data Processing, CI/CD, and Large-scale Storage... Read More →

Tuesday June 10, 2025 11:00 - 11:30 HKT
Level 21 | Pearl Pavilion

Maintainer Track

11:00 HKT

An Alternative Metadata System for Large Kubernetes Clusters - Yingcai Xue & Yixiang Chen, ByteDance

Tuesday June 10, 2025 11:00 - 11:30 HKT

KubeCon China 2025 An Alternative Metadata System for Large Kubernetes Clusters pdf

For an event-driven distributed system like Kubernetes, where components communicate by synchronizing incremental data through the KubeAPIServer, the metadata system is the most critical component. The ETCD is the only official supported metadata system, but some projects like kine explored alternative metadata storage, but they're either not open-sourced or have performance issues.
This talk covers ByteDance's work on high-performance Kubernetes metadata systems. It summarizes ETCD's production issues, analyzes Kubernetes' metadata storage requirements and introduces how we solve it with kubebrain.
Actual results from large-scale environments (over 20K nodes, 1M pods over years) show that KubeBrain enhances cluster performance and stability.
This talk helps understand the challenges of metadata systems in large-scale clusters and provides insights into an open-source solution that has been practiced in ByteDance's production environment.

Speakers

Yixiang Chen

Software Engineer, ByteDance

Yixiang is a seasoned cloud-native technologist with over 9 years of hands-on experience at ByteDance, where he has been at the forefront of large-scale Kubernetes ecosystem innovations. As a core contributor in cloud-native infrastructure, his expertise spans multiple domains including... Read More →

Yingcai Xue

Software Engineer, ByteDance

- graduated from Zhejiang University with a master degree

Tuesday June 10, 2025 11:00 - 11:30 HKT
Level 19 | Crystal Court II

Content Experience Level Advanced
Presentation Language Chinese

11:07 HKT

Project Lightning Talk: Build Secure, Build Easy, with Buildpacks - Ram Iyengar, Maintainer

Tuesday June 10, 2025 11:07 - 11:12 HKT

Cloud Native Buildpacks presents a great way to build containers that can then be deployed to Kubernetes. In this talk, I will demo how a container can be built, how it uses optimal paths by default, and how it can promote security.

I will also present project updates, upcoming areas of work, and where we need help and support from the community.

Tuesday June 10, 2025 11:07 - 11:12 HKT
Level 16 | Grand Ballroom I

11:14 HKT

Project Lightning Talk: openGemini: Project Introduction and Updates - Yu Xiang, Maintainer

Tuesday June 10, 2025 11:14 - 11:19 HKT

With the rapid development of cloud computing, IoV, and IoT, time series data, such as metrics and logs, increases rapidly. As a result, time series databases face higher challenges in terms of read/write performance, data analysis efficiency, and data storage costs.

openGemini aims to reduce data storage costs, quickly write massive time series data, and efficiently analyze. Open source in 2022 and became a sandbox project of CNCF in 2024.

Now, openGemini has been applied in 9 scenarios, including the Internet of Things (IoT), DevOps, Internet of Vehicles (IoV), electric power, energy, mining, logistics, and aerospace, with 177 contributors. More and more developers are exploring the technical advantages of openGemini.

In this lighting talk, the following topics will be covered:

1. Briefly introduction to openGemini

2. Core Competencies

3. Key User Cases

4. Community Updates and Technology Planning

Tuesday June 10, 2025 11:14 - 11:19 HKT
Level 16 | Grand Ballroom I

11:21 HKT

Project Lightning Talk: Fluid Data Anyway, Data Anywhere, Data Anytime - Tongyu Guo, Maintainer

Tuesday June 10, 2025 11:21 - 11:26 HKT

Fluid is an open-source project for orchestrating data and workloads in Kubernetes. In the 2024 CNCF Technology Radar Report, Fluid is recognized as an "Adopted" project in the cloud-native AI landscape, considered ready for use by developers without further evaluation.

Maintainer from the Fluid community will reveal why it is so popular, detailing its architecture and the "Data Anyway, Anywhere, Anytime" features. He will also showcase the dynamic data mounting capabilities beneficial for data scientists, along with insights into future feature plans.

Tuesday June 10, 2025 11:21 - 11:26 HKT
Level 16 | Grand Ballroom I

11:28 HKT

Project Lightning Talk: K8s issue #52757: Sharing GPUs Among Multiple Containers - Xiao Zhang, Maintainer

Tuesday June 10, 2025 11:28 - 11:33 HKT

This issue has plagued Kubernetes for nearly 8 years: K8s issue #52757. The challenge of flexibly sharing GPUs across multiple containers is particularly prominent in AI scenarios, where inference tasks are typically short-lived. As a result, resource utilization becomes a critical concern.

In this talk, we will share solutions and practices for implementing GPU sharing in Kubernetes, focusing on two key projects gaining traction recently: Dynamic Resource Allocation (DRA) and the CNCF sandbox project HAMi. The presentation will cover the following topics:

1. Challenges in GPU sharing.

2. Approaches for sharing AI chips beyond NVIDIA GPUs.

3. How sharing technologies integrate with projects like Volcano, Koordinator, and Kueue.

Tuesday June 10, 2025 11:28 - 11:33 HKT
Level 16 | Grand Ballroom I

11:35 HKT

Project Lightning Talk: Practical Extension of eBPF Usability in KubeCon - Lizhencheng, Maintainer

Tuesday June 10, 2025 11:35 - 11:40 HKT

Due to security-related considerations, eBPF technology imposes significant restrictions on the use of kernel functions, typically allowing indirect calls to kernel functions only through helper functions, which is quite inconvenient for applications related to KubeCon. Kmesh requires intrusive modifications to the kernel to implement Kernel-native Mode Seven-layer Traffic Governance Capability, hindering the adoption of the technology. Through exploration and research on the kernel, we have employed some methods to extend the usability of eBPF, reducing the need for intrusive modifications to the kernel, and enabling Kmesh-related capabilities on higher versions of Linux without requiring intrusive kernel modifications.

Tuesday June 10, 2025 11:35 - 11:40 HKT
Level 16 | Grand Ballroom I

11:42 HKT

Project Lightning Talk: Simplifying Multi-Cluster Integrations with OCM Addon - Jian Zhu, Maintainer

Tuesday June 10, 2025 11:42 - 11:47 HKT

Open Cluster Management (OCM) allows easy integration with other projects via its Addon mechanism, enabling them to leverage multi-cluster capabilities. This 5-minute talk will introduce the OCM Addon mechanism, showing how projects can integrate with OCM as addons. I will also highlight the AddonTemplate API, which simplifies addon development by providing simple yaml files, reducing complexity and accelerating integration.

Key points:

- OCM Addon Overview: Introduction to the Addon mechanism and its role in multi-cluster environments.

- Addon Integration: How projects (e.g., Fluid) integrate with OCM to enhance multi-cluster management.

- AddonTemplate API: How the API simplifies addon creation and management.

- Real-World Benefits: Demonstrating the efficiency and scalability of OCM Addons.

This talk will help attendees understand how OCM Addons can help other projects extend the multicluster management capability.

Tuesday June 10, 2025 11:42 - 11:47 HKT
Level 16 | Grand Ballroom I

11:45 HKT

Defining a Specification for AI/ML Artifacts - Fog Dong, BentoML; Peng Tao & Chlins Zhang, Ant Group; Xudong Wang, Paypal

Tuesday June 10, 2025 11:45 - 12:15 HKT

Level 19 | Crystal Court I

AI has become a prominent figure in the cloud native ecosystem and there continues to be massive adoption in this emerging field. As frameworks and approaches are introduced, a pattern has emerged which threatens the ability to manage at scale: each implementation introduces their own format, runtime, and different ways of working, fragmenting the ecosystem. On other hand, open standards are the backbone of cohesive and scalable ecosystems.

This panel discussion seeks to explore the importance of defining standards within the CNCF ecosystem, particularly focusing on AI/ML artifacts. Beyond the advantages of the standard in facilitating integration with existing cloud native tools, this conversation will delve into how the standards can serve as a foundation for innovation. Join us to understand how standardization with innovative approaches can advance the cloud native AI landscape.

Speakers

Chlins Zhang

Software Engineer, Ant Group

Chenyu Zhang is a software engineer at Ant Group, currently mainly responsible for the development and maintenance of project harbor, and also has some experience in devops and cloud native related technology stacks.

Xudong Wang

PayPal

Peng Tao

Staff Engineer, Ant Group

Kata Containers architecture committee member, Nydus maintainer, and Linux kernel developer.

Fog Dong

Senior Software Engineer, BentoML

董天欣目前在 BentoML担任资深工程师，同时，她也是 KubeVela 的核心维护者以及 CNCF 大使。她致力于开源社区的建设，并不遗余力地为推动开源项目的发展而努力，尤其是在云原生 DevOps 领域。目前，她在 BentoML... Read More →

Tuesday June 10, 2025 11:45 - 12:15 HKT
Level 19 | Crystal Court I

Content Experience Level Intermediate
Presentation Language English

11:45 HKT

Kubernetes New Contributor Orientation - Paco Xu, DaoCloud; ZhenYu Jiang & Mengjiao Liu, Independent

Tuesday June 10, 2025 11:45 - 12:15 HKT

Level 21 | Pearl Pavilion

This meeting is meant to orient you in the Kubernetes community.

Part 1: Presentation and Intro
● Welcome to Kubernetes!
● What is Kubernetes?
● Kubernetes Community Structure
● What does it mean to be a “Contributor”?
● How to Start Contributing
● Current Work Opportunities
● Contribution Pitfalls

Part 2: New contributors journey
We will invite some new contributors in the community to share their fresh experience and tips to you.
● How did I get involved with Kubernetes?
● What is most important in participating in Kubernetes community journey?
● Some tip to participate in Kubernetes community
● How to submit a "polite" PR?

Speakers

Paco Xu

Lead of open source team, DaoCloud

Paco is co-chair of KubeCon+CloudNativeCon China 2024, and a member of Kubernetes Steering Committee. Paco is a kubeadm maintainer and an active kubernetes contributor. He is the leader of the open-source team in DaoCloud. He organized KCD Chengdu 2022 and KCS China 2023, and... Read More →

Mengjiao Liu

Software Engineer, Independent

Mengjiao Liu is a Software Engineer. She contributes to Kubernetes and serves as the WG Structured Logging Lead and SIG Instrumentation Reviewer, focusing on enhancing logging quality. Additionally, she actively participates in SIG Docs as a Chinese owner and English reviewer, working... Read More →

ZhenYu Jiang

Cloud Native Developer, Independent

He is a member of the kubernetes community and has been participating in the kubernetes community since 2024.

Tuesday June 10, 2025 11:45 - 12:15 HKT
Level 21 | Pearl Pavilion

Maintainer Track

11:45 HKT

From Bottleneck To Breakthrough: Conquering Applications Startup Peaks in Kubernetes - Hexi Guo, Alibaba Cloud; Rentian Zhou & Zhuoqi Liu, CloudPilot AI

Tuesday June 10, 2025 11:45 - 12:15 HKT

A variety of applications in Kubernetes typically require higher memory or compute resources during startup—such as Java, .NET, and Node.js applications, as well as those utilizing large data processing frameworks or machine learning models—due to the need to load substantial dependencies and perform complex initialization tasks. To prevent startup failures from resource contention, these applications typically have their resource requests set based on peak startup demands. However, this often leads to resource waste after startup is complete.
To address this challenge, this session presents a queue-based approach using Karpenter. This method allows applications set resource requests based on typical usage instead of peak startup needs. It temporarily spreads applications across multiple smaller nodes during startup, preventing single-node overload. After startup, it smoothly consolidates them onto fewer but larger nodes to optimize resource usage while maintaining service stability.

Speakers

Zhuoqi Liu

Senior Software Engineer, CloudPilot AI Inc

Hexi Guo

Software Engineer, Alibaba Cloud

Alibaba Cloud technical expert, maintainer of Kubernetes elastic scaling component cluster-autoscaler, initiator of open source elastic component kubernetes-cronhpa-controller, responsible for the design and implementation of elastic solutions for Alibaba Cloud industry customers... Read More →

Rentian Zhou

Software Engineer, CloudPilot AI

Rentian, a Software Engineer at CloudPilot AI, focuses on the Karpenter open-source project, contributing to karpenter-provider-alibabacloud and -aws. He has also contributed to various projects and serves as a Karmada Reviewer, the Member of the Volcano and Hwmaeistor communities... Read More →

hexi KubeCon China 2025 Branded PowerPoint 060412007pptx pptx

Tuesday June 10, 2025 11:45 - 12:15 HKT
Level 19 | Crystal Court II

Content Experience Level Any
Presentation Language English

11:49 HKT

Project Lightning Talk: KubeEdge Updates and Use Cases in Multiple Scenarios - Yue Bao, Maintainer

Tuesday June 10, 2025 11:49 - 11:54 HKT

KubeEdge, the industry’s first cloud-native open-source edge computing project, has achieved CNCF graduation last year. In this session, we will share the new features and advancements in community governance since graduation.

As a graduated project, KubeEdge has been widely used in intelligent transportation, smart city, smart park, smart energy, smart factory, smart bank, smart site, CDN and other industries to provide users with integrated edge cloud collaborative solutions. This session will also share the 10+ KubeEdge user cases in various industries, to help users understand the practical experience of cloud-native edge computing and edge AI.

Tuesday June 10, 2025 11:49 - 11:54 HKT
Level 16 | Grand Ballroom I

11:56 HKT

Cancelled: Project Lightning Talk: Meshery: Kubernetes without Yaml, Is It Possible? - Yash Sharma, Maintainer

Tuesday June 10, 2025 11:56 - 12:01 HKT

Kubernetes has evolved into a complex ecosystem with numerous core components and hundreds of Custom Resources. This complexity poses significant challenges when designing workloads that involve multiple technologies. Engineers often find themselves burdened with Complex Configuration Management with YAML files, ensuring correct network configurations, RBAC rules and so on which is tedious and error-prone.

Developers often need to manually copy and paste reference configurations or manage and store either Helm or Kustomize templates to achieve this which has a high learning curve and is difficult especially for newcomers to ecosystem.

This talk helps in understanding how Meshery a CNCF project and cloud-native manager, with intuitive visual interface, reduces cognitive load, aligns with users' mental models, streamlines infrastructure design backed by OPA policies and how Meshery makes Kubernetes more accessible, empowering you to visualize your infrastructure.

Tuesday June 10, 2025 11:56 - 12:01 HKT
Level 16 | Grand Ballroom I

12:03 HKT

Project Lightning Talk: Kyverno Lightning Update: CEL & Policy Types in Action - Shuting Zhao, Maintainer

Tuesday June 10, 2025 12:03 - 12:08 HKT

Get a rapid snapshot of Kyverno’s latest features! In this 5-minute talk, Shuting Zhao highlights how Kyverno now supports CEL (Common Expression Language) for expressive, dynamic policies and introduces new policy types to align with Kubernetes’ ValidatingAdmissionPolicy and MutatingAdmissionPolicy. See how these updates empower you to create more flexible policies, improve cluster security, and streamline compliance workflows. Whether you’re managing policies or exploring Kyverno for the first time, this session offers a quick, impactful look at what’s new and how it can benefit your Kubernetes environment.

Tuesday June 10, 2025 12:03 - 12:08 HKT
Level 16 | Grand Ballroom I

12:10 HKT

Project Lightning Talk: What's New in WasmEdge 0.15.0? - Michael Yuan, Maintainer

Tuesday June 10, 2025 12:10 - 12:15 HKT

WasmEdge 0.15.0 is coming soon! This release brings key WebAssembly features including the component model proposal, plus expanded support for multimodal models and the latest OpenVINO plugin support. Join us to learn about the release highlights and future roadmap.

Tuesday June 10, 2025 12:10 - 12:15 HKT
Level 16 | Grand Ballroom I

12:15 HKT

Project Lightning Talks: Closing - Cortney Nickerson, CNCF Ambassador

Tuesday June 10, 2025 12:15 - 12:17 HKT

Tuesday June 10, 2025 12:15 - 12:17 HKT
Level 16 | Grand Ballroom I

12:15 HKT

Gold Sponsor In-Booth Demos

Tuesday June 10, 2025 12:15 - 12:45 HKT

Sponsor: Intel
Demo: Seamless GenAI Experience from Cloud to Edge with OPEA & Open Edge Platform
Booth Number: G3

In order to facilitate networking and business relationships at the event, you may choose to visit a third party’s booth or access sponsored content. You are never required to visit third party booths or to access sponsored content. When visiting a booth or participating in sponsored activities, the third party will receive some of your registration data. This data includes your first name, last name, title, company, address, email, standard demographics questions (i.e. job function, industry), and details about the sponsored content or resources you interacted with. If you choose to interact with a booth or access sponsored content, you are explicitly consenting to receipt and use of such data by the third-party recipients, which will be subject to their own privacy policies.

Tuesday June 10, 2025 12:15 - 12:45 HKT
Level 16 | Grand Ballroom II

Sponsored Demos, Gold Sponsor In-Booth Demos

12:15 HKT

Lunch 🍲

Tuesday June 10, 2025 12:15 - 13:45 HKT

Tuesday June 10, 2025 12:15 - 13:45 HKT
Level 16 | Grand Ballroom II

Breaks

13:15 HKT

Gold Sponsor In-Booth Demos

Tuesday June 10, 2025 13:15 - 13:45 HKT

Sponsor: Alibaba Cloud
Demo: Showcase of Alibaba Cloud Services and Incubated Open Source Projects
Booth Number: G4

Sponsor: Arm
Demo: 1. Cloud Native AI Workloads on Arm-based Kubernetes 2. Native Full Rancher Stack
Booth Number: G1

In order to facilitate networking and business relationships at the event, you may choose to visit a third party’s booth or access sponsored content. You are never required to visit third party booths or to access sponsored content. When visiting a booth or participating in sponsored activities, the third party will receive some of your registration data. This data includes your first name, last name, title, company, address, email, standard demographics questions (i.e. job function, industry), and details about the sponsored content or resources you interacted with. If you choose to interact with a booth or access sponsored content, you are explicitly consenting to receipt and use of such data by the third-party recipients, which will be subject to their own privacy policies.

Tuesday June 10, 2025 13:15 - 13:45 HKT
Level 16 | Grand Ballroom II

Sponsored Demos, Gold Sponsor In-Booth Demos

13:45 HKT

Fast and Furious: Practice in Horizon Robotics on Large-scale End-to-end Model Training - Chen Yangxue, Horizon Robotics & Zhihao Xu, Alibaba Cloud

Tuesday June 10, 2025 13:45 - 14:15 HKT

Level 19 | Crystal Court I

End-to-end large model training is crucial for advancing autonomous driving technology. Horizon Robotics leads in this field by leveraging deep learning algorithms and chip design. They efficiently train and deploy advanced perception models like Sparse4D using cloud-native technologies.
Training these models poses challenges, such as managing massive video data and numerous small files. Ensuring high-performance training with over 2000 GPUs on RDMA, quickly identifying different failures, and diagnosing issues in large-scale training.
This session covers how Horizon Robotics manages large-scale training on Kubernetes. It highlights the role of distributed data caching, network topology awareness, and job affinity scheduling in optimizing a 2000 GPU training job. We'll also discuss strategies for restoring interrupted training jobs through backup machine replacement to enhance task resilience. Furthermore, experiences with CNCF projects like Volcano, Fluid, and NPD will be shared.

Speakers

Zhihao Xu

Software Engineer, Alibaba Cloud

Zhihao Xu is currently a software engineer at Alibaba Cloud focusing on infrastructure for AI model training and large-scale model inference. Also, he is now a Maintainer of the CNCF sandbox project Fluid, which is designed for data orchestration for data-intensive applications running... Read More →

Chen Yangxue

Software Engineer, Horizon Robotics

I'm Chen Yangxue, a software engineer at Horizon Robotics. With years of cloud - native experience, I'm building a ten - thousand - card training platform with a hybrid cloud setup.I've used tools like Kubernetes, Volcano, etc., to solve tough technical problems. I know how to optimize... Read More →

Tuesday June 10, 2025 13:45 - 14:15 HKT
Level 19 | Crystal Court I

Content Experience Level Any
Presentation Language Chinese

13:45 HKT

Multi-cluster Orchestration System: Karmada Updates and Use Cases - Hongcai Ren, Huawei

Tuesday June 10, 2025 13:45 - 14:15 HKT

Level 21 | Pearl Pavilion

Karmada (Kubernetes Armada) is a Kubernetes management system that enables you to run your cloud-native applications across multiple Kubernetes clusters and clouds.

In this presentation, the maintainer of the Karmada project will share:

- A Brief introduction to Karmada, including what it is and why you need it.
- Key features and real-world use cases
- Overview of the community, including the governance and how it works
- New features over the last year
* Migration Rollback (with Zendesk, MoMo)
*Lightway Stateful Application failover (with Bloomberg)
* Manage HA Karmada instance by operator (with Bloomberg)
* OverridePolicy support (with Longbridge)
* Cluster Level Propagation Pause and Resume (with Zendesk, MoMo)
* Karmadactl Enhancements (with Huawei)
- Future Plan
- QA

Speakers

Hongcai Ren

Senior Software Engineer(maintainer of Karmada project), Huawei

Hongcai Ren(@RainbowMango) is the CNCF Ambassador, who has been working on Kubernetes and other CNCF projects since 2019, and is the maintainer of the Kubernetes and Karmada projects.

Tuesday June 10, 2025 13:45 - 14:15 HKT
Level 21 | Pearl Pavilion

Maintainer Track

13:45 HKT

Antipatterns in Observability: Lessons Learned and How OpenTelemetry Solves Them - Steve Flanders, Splunk

Tuesday June 10, 2025 13:45 - 14:15 HKT

Observability is essential, but common antipatterns like over-collecting data, siloed tools, and poorly instrumented code can derail your efforts. This session uncovers the most frequent observability pitfalls and shows how OpenTelemetry addresses these challenges with its standardized approach. From eliminating vendor lock-in to streamlining telemetry pipelines, you’ll gain insights into building a more effective and sustainable observability strategy. Real-world examples will highlight how teams have successfully overcome these antipatterns, empowering you to avoid costly mistakes and maximize OpenTelemetry’s potential.

Speakers

Steve Flanders

Senior Director of Engineering, Splunk

Steve Flanders is a Senior Director of Engineering at Splunk responsible for the Observability Platform team, which includes contributions to the OpenTelemetry project. Previously, he was the Head of Product and Experience at Omnition, which Splunk acquired. Prior to Omnition, he... Read More →

Tuesday June 10, 2025 13:45 - 14:15 HKT
Level 16 | Grand Ballroom I

Content Experience Level Intermediate
Presentation Language English

13:45 HKT

Building Ultra-Large-Scale Cloud Native Edge Systems Using Chaos Engineering - Yue Bao, Huawei Cloud Computing Technology & Yue Li, DaoCloud

Tuesday June 10, 2025 13:45 - 14:15 HKT

Fast growing technologies, such as 5G networks, industrial Internet, and AI, are giving edge computing an important role in driving digital transformation. As each new technology brings benefits, it brings challenges. First, there are massive heterogeneous edge devices and it encompass a broad range of device types. Second, Edge devices are often located in unstable and complex physical and network environments, such as limited bandwidth, high latency, etc. How to overcome these challenges and build a stable, large-scale edge computing platform needs to be resolved.
KubeEdge is an open source edge computing framework that extends the power of kubernetes from central cloud to edge. Now, Kubernetes clusters powered by KubeEdge, can stably support 100,000 edge nodes and manage more than one million pods.
In this session, we will share the Key challenges of manage massive heterogeneous edge nodes and tell how using ChaosMesh to makes KubeEdge more Reliable in large-scale edge nodes.

Speakers

Yue Bao

Senior Software Engineer, Huawei Cloud Computing Technology Co., Ltd.

Yue Bao serves as a software engineer of Huawei Cloud. She is now working 100% on open source, focusing on lightweight edge for KubeEdge. She is the maintainer of KubeEgde and also the tech leader of KubeEdge SIG Release and Node. Before that, Yue worked on Huawei Cloud Intelligent... Read More →

yue li

Software Quality Engineer, DaoCloud

work at DaoCloud as Quality Director, more than 20 years IT industry experience, China Mobile, Siemens, HP, EMC, and startup company. Newcomer in Cloud Native and open source fans. Would like to adopt open source projects to improve enterprise software quality with fast release.

Tuesday June 10, 2025 13:45 - 14:15 HKT
Level 19 | Crystal Court II

Content Experience Level Any
Presentation Language Chinese

13:45 HKT

Maintainer Meetup

Tuesday June 10, 2025 13:45 - 15:45 HKT

Level 20 | Salon 5

The Maintainer Meetup is for CNCF Maintainers to share best practices, dive into contributing processes, and solve common problems across projects.

Tuesday June 10, 2025 13:45 - 15:45 HKT
Level 20 | Salon 5

Project Opportunities

14:30 HKT

More Than Model Sharding: LWS & Distributed Inference - Peter Pan & Nicole Li, DaoCloud & Shane Wang, Intel

Tuesday June 10, 2025 14:30 - 15:00 HKT

Level 19 | Crystal Court I

Large LLM like Llama3.1-405B or Deepseek-V3 (671B), require distributed inference across multiple-nodes like vLLM + Ray backend.
However, it's more than just model-slicing with tensor-parallelism, Native K8S treats those workloads across nodes irrelevantly , so challenges come:
- standalone statefulSets without coordination
- demand of Gang-scheduling
- uncontrolled startup order among master & workers, causing boot lag
- HPA as a whole instead of for each sts, to scale together for both Ray head/worker.
- stable index and rank
- topology aware grouping
- failure recovery for vllm/pytorch(not smart enough), to avoid one pod/GPU failure disrupting overall inference

----
So LWS - LeaderWorkerSet (github.com/kubernetes-sigs/lws) , is designed to address them:
- to optimize resource coordination with leader-worker set
- improve performance thru co-location
- integrate scaling with HPA for whole lws together
- all-or-nothing restart policy to fault tolerance as a group.

Speakers

Shane Wang

Engineering Director, Intel

Shane Wang is an engineering manager for networking and storage at Intel's System Software Products. He has participated in or led his team on research and development of open source software projects such as Xen, tboot, Yocto and OpenStack. Since 2015, he has served as an individual... Read More →

Nicole Li

Cloud Native Developer, DaoCloud

Cloud Native Developer, Service Mesh & Istio Contributor

Peter Pan

R&D Engineering VP, Daocloud

- DaoCloud Software Engineering VP- Regular KubeCon "Program Committee" : 2023 EU, 2024 HK, 2024 India, 2025 EU- Regular KubeCon Speaker: 2023 SH, 2024 EU, 2024 HK- Maintainer of below CNCF projects : cloudtty, kubean, hwameistor- CNCF WG-AI (AI Working-Group) Member + CNAI white-paper... Read More →

More than Model Sharding LWS v2.6 pdf

Tuesday June 10, 2025 14:30 - 15:00 HKT
Level 19 | Crystal Court I

Content Experience Level Intermediate
Presentation Language Chinese

14:30 HKT

New Pattern for Sailing Multi-host LLM Inference - Kante Yin, DaoCloud

Tuesday June 10, 2025 14:30 - 15:00 HKT

Level 21 | Pearl Pavilion

Inference workloads are becoming increasingly prevalent and vital in Cloud Native world. However, it's not easy, one of the biggest challenges is large foundation model can not fit into a single node, like llama 3.1-405B or DeepSeek R1, which brings out the distributed inference with model parallelism, again, make serving inference workloads more complicated.

LeaderWorkerSet, aka. LWS, is a dedicated multi-host inference project aims to solve this problem, it's a project under the guidance of Kubernetes SIG-Apps and Serving Working Group. It offers a couple of features like dual-template for different types of Pods, fine-gained rolling update strategies, topology managements and all-or-nothing failure handlings.

In this session, we'll introduce the capacities of lws and showcase the practice from our adopters like nvidia, google, and we'll demonstrate the integration with the most popular inference engines, such as vLLM, SGLang.

Speakers

Kante Yin

Software Engineer, DaoCloud

Kante is a senior software engineer and an open source enthusiast from DaoCloud, his work is mostly around scheduling, resource management and LLM inference. He actively contributes to upstream Kubernetes as SIG-Scheduling Maintainer and helps in incubating several projects like Kueue... Read More →

Tuesday June 10, 2025 14:30 - 15:00 HKT
Level 21 | Pearl Pavilion

Maintainer Track

14:30 HKT

Advancing Observability With Compile-Time Auto-Instrumentation in Golang - Liu Ziming, Alibaba Cloud & Przemek Delewski, Quesma

Tuesday June 10, 2025 14:30 - 15:00 HKT

Observability for cloud-native software applications requires efficient and reliable methods to gain insights into distributed systems. This talk will explore various instrumentation approaches for Golang, focusing on the concept of compile-time auto-instrumentation with OpenTelemetry. We will unveil implementation details of compile-time auto-instrumentation, highlighting the revolutionary features including flexible custom plugin capabilities, enhanced context propagation, trace-log correlation, and etc. The talk will cover examples of using compile-time auto instrumentation, lessons learned from the practice and scenarios that benefit from such an implementation. The audience will take away a solid understanding of how compile-time auto instrumentation works and why it presents an efficient and more performant solution for achieving observability.

Speakers

Przemek Delewski

Principal Architect, Quesma

Przemek is a founding engineer at Quesma, working in the data transformation space and responsible for architectural direction. An observability veteran with over 15 years of experience at Dynatrace and Sumo Logic. OpenTelemetry Maintainer. Designs programming languages for fun

Liu Ziming

Engineer, Alibaba Cloud

Alibaba R&D Engineer

Tuesday June 10, 2025 14:30 - 15:00 HKT
Level 16 | Grand Ballroom I

Content Experience Level Advanced
Presentation Language English

14:30 HKT

Unlocking Kyverno: Mastering Policy Management in Large-Scale Kubernetes Clusters - Di Xu, Xiaohongshu & Xu Liu, RedNote

Tuesday June 10, 2025 14:30 - 15:00 HKT

With the growing adoption of Kubernetes, managing configurations and ensuring compliance across extensive clusters becomes increasingly complex. Kyverno, a native Kubernetes policy engine, offers a streamlined solution to these challenges. In this session, we'll explore how adopting Kyverno can enhance efficiency, simplify operations, centralize control, and reduce maintenance in Kubernetes environments. We'll demonstrate how Kyverno empowers organizations to effectively manage policies and tackle the unique challenges of large-scale Kubernetes deployments. Drawing from real-world experiences, we will share valuable lessons and best practices that facilitate seamless policy integration and management. Attendees will gain practical insights and tools to optimize their Kubernetes environments using Kyverno.

Speakers

Di Xu

CNCF Ambassador | Principle Software Engineer, Xiaohongshu

Currently, he works at Xiaohongshu leading a team focused on building a highly reliable and scalable container platform. He is the founder of CNCF Sandbox Project Clusternet. Also, he is a top 50 code contributor in Kubernetes community. He had spoken many times at open source conferences... Read More →

Xu Liu

Senior Software Engineer, Xiaohongshu

Focused on the cloud native field, with extensive experience in managing large-scale Kubernetes clusters, container networking and serivcemesh.

Tuesday June 10, 2025 14:30 - 15:00 HKT
Level 19 | Crystal Court II

Content Experience Level Any
Presentation Language Chinese

14:45 HKT

Project Pavilion Tables | Tuesday Afternoon

Tuesday June 10, 2025 14:45 - 19:00 HKT

Argo P-3
Cilium P-2
Cloud Native Buildpacks P-8
Cozystack P-6
Dragonfly P-7
Karpenter P-1
LoxiLB P-5
OpenTelemetry P-4

Tuesday June 10, 2025 14:45 - 19:00 HKT
Level 16 | Grand Ballroom II

Project Opportunities

15:00 HKT

Coffee Break ☕

Tuesday June 10, 2025 15:00 - 15:30 HKT

Tuesday June 10, 2025 15:00 - 15:30 HKT
Level 16 | Grand Ballroom II

Breaks

15:30 HKT

⚡ Lightning Talk: Achieving Unstoppable Stability: Deploying OceanBase Across Multiple Kubernetes Clusters - Peng Wang, OceanBase

Tuesday June 10, 2025 15:30 - 15:35 HKT

⚡ Lightning Talks, Data Processing + Storage

Distributed databases like OceanBase offer scalability and fault tolerance but can be challenging to manage in Kubernetes. Kubernetes is widely used for managing workloads, but deploying OceanBase on a single cluster creates a risk of failure. If the cluster fails, the entire database may become unavailable, which is problematic in production environments.

This talk will explore how deploying OceanBase across multiple Kubernetes clusters can solve this problem. Distributing the database across clusters ensures high availability and reduces the impact of a cluster failure. It also makes Kubernetes upgrades safer for operations teams.

We’ll cover the challenges of managing distributed databases in Kubernetes, like data consistency and load balancing. We’ll also show how multi-cluster deployments improve stability and resilience, making the solution stronger for critical applications. Attendees will learn how this architecture boosts fault tolerance and simplifies database management.

Speakers

Peng Wang

Global Technical Evangelist, OceanBase

Peng Wang is the Global Technical Evangelist for OceanBase, a distributed relational database designed for cloud-native applications. He has over a decade of experience in the database industry, including his previous role as a team lead in Intel’s database R&D group.He is currently... Read More →

OceanBase KubeCon China 2025 pdf

Tuesday June 10, 2025 15:30 - 15:35 HKT
Level 16 | Grand Ballroom I

Content Experience Level Any
Presentation Language Chinese

15:30 HKT

Smart GPU Management: Dynamic Pooling, Sharing, and Scheduling for AI Workloads in Kubernetes - Wei Chen, China Unicom Cloud Data & Mengxuan Li, Dynamia

Tuesday June 10, 2025 15:30 - 16:00 HKT

Level 19 | Crystal Court I

With the rapid growth of AI applications, optimal GPU utilization is essential, particularly in GPU sharing and job scheduling. Balancing performance, flexibility, and isolation is as challenging as the “Impossible Trinity”. Technologies such as vCUDA, MPS, and MIG are promising attempts, but each has its pros and cons. Managing clusters with multiple sharing techniques adds complexity due to differing resource names and configurations.
In this talk, we will demonstrate how to combine these methods easily. Users specify the memory and core count without managing GPU types or sharing methods. Based on user preferences and GPU resources, the best node and method will be selected. Requests are automatically translated into optimal profiles, and GPUs are dynamically partitioned.
This approach streamlines GPU management, enhances utilization, and improves scheduling. By integrating Volcano and HAMi, the solution strengthens GPU pooling and scheduling, optimizing AI workload management.

Speakers

Mengxuan Li

Software Engineer, Dynamia Inc

Member of volcano community responsible for the development of gpu virtualization mechanism on volcano. It have been merged in the master branch of volcano, and will be released in v1.8. speaker, in OpenAtom Global Open Source Commit#2023

Wei Chen

Technical expert, China Unicom Cloud Data Co., Ltd

I am a technical expert at China Unicom Cloud Data Co., Ltd, specializing in cloud computing infrastructure. I actively contribute to open-source projects, including KubeEdge, Openeular iSula, and Volcano.

Tuesday June 10, 2025 15:30 - 16:00 HKT
Level 19 | Crystal Court I

Content Experience Level Any
Presentation Language Chinese

15:30 HKT

Revolutionizing Sidecarless Service Mesh With eBPF - Zhonghu Xu & Muyang Tian, Huawei

Tuesday June 10, 2025 15:30 - 16:00 HKT

It is widely recognized service meshes sidecar have introduced significant resource overhead, adversely affecting memory and CPU utilization. Farthermore, the tight coupling of sidecars with workloads complicates lifecycle management.

In this session, we will compare pros and cons of the main stream implement: Istio, Ambient and Cilium. But all use a userspace proxy per node, introducing a single point of failure and increasing connection numbers per hop. In this discussion, we aim to demonstrate how eBPF and programmable kernel modules can significantly mitigate these issues.

Lastly, we will introduce several use cases about adopting it to improve micro-service performance while minimizing the interruption on applications during infrastructure upgrades.

Speakers

Muyang Tian

Operating System Engineer, Huawei

Operating system engineer of Huawei Technologies Co., Ltd., core member of Kmesh, contributor of libxdp. Enthusiastic about cloud native technology and eBPF-based high performance network.

Zhonghu Xu

Principal Software Engineer, Huawei

Zhonghu is an Istio Steering Committee member and has been an core maintainer of istio since 2018 and also istio TOP 3 contributors. He is also the CNCF TAG-Network Tech Lead. He is maintainer of many CNCF projects, istio, kmesh and volcano, etc. Also Kubernetes TOP 100 contributors... Read More →

Tuesday June 10, 2025 15:30 - 16:00 HKT
Level 19 | Crystal Court II

Connectivity

Content Experience Level Any
Presentation Language Chinese

15:30 HKT

Simplifying the Networking and Security Stack With Cilium, Hubble, and Tetragon - Liyi Huang, Isovalent at Cisco; Kaixi Fan, Bytedance

Tuesday June 10, 2025 15:30 - 16:00 HKT

Level 21 | Pearl Pavilion

Join us as we celebrate nearly a decade of Cilium, now the de-facto standard CNI for Kubernetes and a cornerstone of cloud native networking, observability, and security. This session provides updates on the latest Cilium release and showcases how its unified eBPF-powered stack is transforming Kubernetes environments by replacing fragmented toolchains with seamless, secure, scalable, and simplified solutions.

Hear about how Cilium is simplifying the cloud native stack and solidifying its role as the comprehensive networking and security solution for modern cloud native architectures from contributors and end users Bytedance and Isovalent.

Speakers

Liyi Huang

customer success architect, Isovalent at Cisco

senior solution architect @isovalent.com

Kaixi Fan

senior linux network engineer, bytedance

Kaixi Fan is a Senior Linux Network Engineer at ByteDance, specializing in cloud computing networks and kernel network protocol stacks. With extensive experience in high-performance networking, he has deep expertise in areas such as eBPF, DPDK, and software-defined networking (SDN... Read More →

Tuesday June 10, 2025 15:30 - 16:00 HKT
Level 21 | Pearl Pavilion

Maintainer Track

15:30 HKT

Open Source Program Office (OSPO) Birds of a Feather

Tuesday June 10, 2025 15:30 - 16:30 HKT

Level 20 | Salon 4

The CNCF, in collaboration with the TODO Group, is excited to host a Birds of a Feather (BoF) session focused on Open Source Program Offices (OSPOs) and their role in supporting engineering and security teams. These interactive roundtables (held in both English and Chinese zh-CN) offer a space to discuss key challenges with mentors knowledgeable in cloud-native and open source operations, including:

Risk assessments for open source dependencies
Security and quality standards in OSS
Barriers to contribution and participation in open source in China

Insights gathered during the session will be featured in a CNCF + TODO Group branded OSPO BoF Report Summary, highlighting the shared perspectives from China’s cloud native ecosystem. Check out our previous summary from KubeCon India.

Who Should Attend?
Engineers using open source in their company, and professionals overseeing open source operations within their organizations, including OSPO managers, CTOs, security teams, or community advocates. This session is a great opportunity to connect, discuss, and share best practices on integrating open source knowledge and cloud native adoption across security, IT, and business teams.

Space is limited, and we encourage early registration to ensure your participation. Fill out this registration form to RSVP and secure your spot. You must be registered to attend KubeCon + CloudNativeCon.

Speakers

Richard Bian

Head of Open Source, Ant Group

Richard Sikang Bian is Head of Open Source at Ant Group. As an engineer by training, Richard was an ex-Square, ex-Microsoft software engineer who had been living in the States for 10+ years. He built Ant Group's first OSPO and has been leading and growing the team from a strategy... Read More →

Hin Yang

VP Linux Foundation APAC, The Linux Foundation

With over 20 years of experience in the software industry, he has held senior management positions at leading global software companies such as Saba, Sumtotal, and Computer Associates. He possesses extensive expertise in enterprise software applications and development, as well as... Read More →

Paco Xu

Lead of open source team, DaoCloud

Xiaoya

Open Source Analyst, Ant Group

Ana Jiménez Santamaría

Project Manager , Linux Foundation, Developer Relations Foundation

Ana is the Project Manager at the Linux foundation TODO Group collaborative project, whose aim is to create and share knowledge on open source management and operations best practices. Formerly she worked at Bitergia, a Software Development Analytics firm, and she has finished her... Read More →

Tuesday June 10, 2025 15:30 - 16:30 HKT
Level 20 | Salon 4

Experiences

15:37 HKT

⚡ Lightning Talk: Advanced GPU-Orchestrated Workflows and HPC Integrations on K8s for Distributed AI/ML at Scale - Brandon Kang, Akamai Technologies

Tuesday June 10, 2025 15:37 - 15:42 HKT

⚡ Lightning Talks, Application Development

As AI/ML workloads continue to scale in complexity, developers and platform engineers are pushing Kubernetes beyond typical MLOps boundaries.

This talk dives into strategies for orchestrating GPU-accelerated training and inference across large-scale clusters -integrating HPC principles, operator-based scheduling, and novel debugging workflows.

Attendees will learn how to implement fine-grained GPU partitioning, harness ephemeral containers to probe and adjust multi-node training in real time, and adopt eBPF-driven instrumentation for low-overhead kernel-level performance insights. We’ll explore cutting-edge scheduling optimizations—like reinforcement-learning approaches and HPC-inspired batch-queuing orchestration on Kubernetes that dynamically respond to heterogeneous job demands.

Real-world case studies will highlight HPC integration scenarios (RDMA, GPU Direct) for data-parallel workloads and complex training frameworks such as Horovod, Ray, and Spark on Kubernetes.

Speakers

Brandon Kang

Principal Technical Solutions Architect, Akamai Technologies

Brandon Kang is a Principal Technical Solutions Architect at Akamai Technologies, specializing in cloud-native projects across Asia as a compute specialist.Before joining Akamai, he served as a Lead Software Engineer at Samsung, a Senior Program Manager at Microsoft, and a Service... Read More →

KubeCon China 2025 (Bandon) pdf

Tuesday June 10, 2025 15:37 - 15:42 HKT
Level 16 | Grand Ballroom I

Content Experience Level Beginner
Presentation Language English

15:44 HKT

⚡ Lightning Talk: AI-Powered Kubernetes Diagnostics With K8sGPT - Kay Yan, DaoCloud

Tuesday June 10, 2025 15:44 - 15:49 HKT

⚡ Lightning Talks, Operations + Performance

In this Lightning Talk, we’ll dive into K8sGPT, a CNCF sandbox project that uses AI to enhance Kubernetes management. K8sGPT leverages LLMs to diagnose cluster issues, offering root cause analysis and solutions in simple terms. It encodes SRE expertise into analyzers, extracting key insights and enriching them with AI-powered explanations.
Key highlights:
- Core Features: Learn to use the CLI and K8sGPT Operator for cluster error analysis and contextualized insights.
- AI Integration & Security: Explore integration with AI models like OpenAI, Azure, and Ollama, with data anonymization for security.
- Real-world Demos: See how K8sGPT simplifies Kubernetes troubleshooting.
- Enterprise Strategies: Discover techniques like LoRA and RAG to tailor K8sGPT for specific environments.
Whether you're new to Kubernetes or an expert, K8sGPT can streamline cluster management, reduce troubleshooting time, and boost efficiency.

Speakers

Kay Yan

Principal Software Engineer, DaoCloud

Kay Yan is kubespray maintainer, containerd/nerdctl maintainer. He is the Principal Software Engineer in DaoCloud, and develop the DaoCloud Enterprise Kubernetes Platform since 2016.

AI Powered Kubernetes Ops with K8sGPT, Kubectl AI pptx

Tuesday June 10, 2025 15:44 - 15:49 HKT
Level 16 | Grand Ballroom I

Content Experience Level Intermediate
Presentation Language English

15:51 HKT

⚡ Lightning Talk: Best Practices for Upgrading Service Mesh Seamlessly - Hang Yin, Alibaba Cloud & Zhencheng Lee, Huawei Technologies

Tuesday June 10, 2025 15:51 - 15:56 HKT

Service Mesh is thriving, with new versions always incorporating exciting features and significant CVE fixes that bring considerable benefits to users. However, the disruption of service traffic caused by Service Mesh upgrades or restarts, leading to system instability, remains a major obstacle to the usage of Service Mesh in production. In the most mature sidecar model, upgrading the data plane of the service mesh results in the redeployment of services; in some cases, this is nearly unacceptable, as certain business applications may face substantial cold start costs . Even for the rising sidecarless mode, it is still necessary to address the issue of interrupting existing user connections, which requires difficult choices. This topic will begin with real-world case studies, where technical experts from Huawei Cloud and Alibaba Cloud will share practical experiences on seamless service mesh upgrades in real production scenarios with the users.

Speakers

Hang Yin

Senior R&D Engineer, Alibaba Cloud

Hang Yin, senior engineer of Alibaba Cloud, focusing on Kubernetes, service mesh and other cloud native fields. Currently served in the Alibaba Cloud Service Mesh (ASM) team, responsible for core abilities of ASM such as performance improvement, ecosystem and Mesh Topology.

Zhencheng Lee

Huawei Cloud Senior R&D engineers, Huawei Technologies Co., Ltd.

Senior Engineer at Huawei Cloud, specializes in Kubernetes, service mesh, and other cloud-native technologies. I am the primary developer and maintainer of the CNCF project Kmesh and actively contribute to several other CNCF projects, with a particular emphasis on service mesh and... Read More →

Tuesday June 10, 2025 15:51 - 15:56 HKT
Level 16 | Grand Ballroom I

⚡ Lightning Talks, Connectivity

Content Experience Level Intermediate
Presentation Language Chinese

15:58 HKT

⚡ Lightning Talk: Deep Dive Into Kernel Requirements: Strengthening Cloud Native With New Kernel Features - Qifeng Guo, DaoCloud

Tuesday June 10, 2025 15:58 - 16:03 HKT

⚡ Lightning Talks, Cloud Native Experience

- Kubernetes 1.31: Moving cgroup v1 Support into Maintenance Mode: making cgroup v2 (kernel 5.8+) a key requirement.
- Linux Kernel Version Requirements shows kernel requirements of Kubernetes features
- eBPF and Modern Networking and observibility

This talk will provide a detailed look at the kernel version requirements for Kubernetes, with a focus on evolving trends in AI infrastructure, SIG-Node, and SIG-Network. We will explore how different kernel versions influence Kubernetes cluster operations, especially in the areas of network performance, resource management, and security enhancements. This session will also highlight some of the rising star projects in the cloud-native ecosystem, including Cilium, Falco, Pyroscope, Kepler and DeepFlow.

Key Topics:
- AI Infrastructure(device related)
- Kubernetes SIG-Node(cgroup)
- Kubernetes SIG-Network(nftables)
- eBPF-based Projects requirements
- Is kernel version checked enough?
- Dependencies/Ecosystem Maintenance

Speakers

Qifeng Guo

Software Engineer, Daocloud

I'm a software developer from DaoCloud, China, and a Kubernetes contributor. Outside work, I'm often active in Kubernetes Networking, including Kube-Proxy, Calico, Cilium, Metallb, and more.

Tuesday June 10, 2025 15:58 - 16:03 HKT
Level 16 | Grand Ballroom I

Content Experience Level Any
Presentation Language Chinese

16:05 HKT

⚡ Lightning Talk: Disaster Recovery - How IaCaC and Kubernetes Enables Cost Efficiency and Fast Recovery - Sandy Wang, KPMG Australia

Tuesday June 10, 2025 16:05 - 16:10 HKT

⚡ Lightning Talks, Operations + Performance

Tech startup in early stage normally aim low running cost on infrastructure spend but fast development and delivery. When there are a first few clients onboard, disaster recovery plan is a must have. When DR is required and an agreed RTO is 6 hours for example, how to not only remain low running cost but also to meet agreed RTO and SLA, our DR plan and implementation is a success to share with the audience. We onboarded container orchestration platform Kubernetes, DevOps best practices, for example Infrastructure-and-Configuration-as-Code and Pipeline-as-Code. Our DR implementation only spends a minimum cost on always-on resources. When a DR incident happens, automated pipelines will bring up on-demand resources that include a Kubernetes cluster, and geo-recover database and storage, then deploy the latest applications into kubernetes cluster, production DR can be live within 2 hours.

Speakers

Pei (Sandy) Wang

Senior DevSecOps Engineer, KPMG Australia

As a Senior DevSecOps Engineer at KPMG Australia, I have been leading the cloud operations and security for Origins, a blockchain-based SaaS solution for supply chain traceability, since May 2022. I have brought the best practices of DevSecOps into day-to-day development and delivery... Read More →

KubeCon 2025 Lightning talk by Sandy Wang pdf

Tuesday June 10, 2025 16:05 - 16:10 HKT
Level 16 | Grand Ballroom I

Content Experience Level Intermediate
Presentation Language English

16:12 HKT

⚡ Lightning Talk: Empowering Sustainable Living With ORES: A Cloud Native Approach To Software-Defined Home Energy Net - Chris Xie, Futurewei & Karl Xiofeng Yang, DEGCent

Tuesday June 10, 2025 16:12 - 16:17 HKT

Discover how the LF Energy working group is driving innovation in sustainable living with the Open Renewable Energy Systems (ORES) project. This session will explore how ORES leverages cloud-native technologies to build an open architecture, open standards, and APIs for software-defined home energy networks. By embracing Kubernetes and other cloud-native principles, ORES enables seamless integration of renewable energy sources, energy storage, and smart devices for a future-proof, scalable, and sustainable energy ecosystem. Learn how ORES promotes collaboration, interoperability, and innovation to shape the next generation of energy solutions in the cloud-native era.

Speakers

Karl Xiofeng Yang

CEO, DEGCent

20+ years' embedded software engineer background.

Chris Xie

Head of Open Source Strategy, Futurewei

Chris Xie, Head of Open Source Strategy at Futurewei, is a prominent advocate for global open source collaboration. With a background that includes roles at both Fortune 500 companies and startups, he brings a unique combination of technical and strategic business expertise. Recently... Read More →

Tuesday June 10, 2025 16:12 - 16:17 HKT
Level 16 | Grand Ballroom I

Content Experience Level Any
Presentation Language English

16:15 HKT

Introducing AIBrix: Cost-Effective and Scalable Kubernetes Control Plane for VLLM - Jiaxin Shan & Liguang Xie, ByteDance

Tuesday June 10, 2025 16:15 - 16:45 HKT

Level 19 | Crystal Court I

Managing large-scale LLM inference workloads on Kubernetes requires more than just high-performance inference engines like vLLM. It demands a comprehensive control plane that integrates deeply with engines while addressing the complexities of large-scale operations. This need inspired the creation of AIBrix, a Kubernetes-native control plane designed to scale LLM inference with modularity, flexibility, and cutting-edge algorithms.

AIBrix introduces a pluggable architecture with components for LLM specific autoscaling, high-density lora management, distributed KV cache, heterogenous serving, model loading etc. AIBrix emphasizes deep co-design with inference engines, enabling advanced features and optimizations. This talk will demonstrate AIBrix in action, showcasing its ability to improve scalability and optimize resource utilization. Additionally, we will present detailed benchmarks to evaluate the performance of these components, providing actionable insights for practitioners.

Speakers

Jiaxin

Software Engineer, Bytedance

Jiaxin works at ByteDance Infrastructure Lab, focusing on serverless and AI infrastructure. He is also a co-chair of Kubernetes WG-Serving, Jiaxin drives innovations and contributes to the future of scalable AI systems.

Liguang Xie .

Director of Engineering, ByteDance

Liguang Xie is an Engineering Lead at ByteDance’s Compute Infrastructure Team, leading next-gen serverless infrastructure design and overseeing open-source, research, and engineering efforts. He has extensive experience in large-scale distributed systems, AI/ML platforms, and LLM/GNN... Read More →

Tuesday June 10, 2025 16:15 - 16:45 HKT
Level 19 | Crystal Court I

Content Experience Level Advanced
Presentation Language English

16:15 HKT

Guardians of the Gateway: Keeping Chaos Out of Your Cloud Highway - Sayan Mondal, Harness & Jintao Zhang, Kong Inc.

Tuesday June 10, 2025 16:15 - 16:45 HKT

Imagine an API gateway standing tall as the guardian of your cloud-native applications - directing traffic, enforcing policies, and ensuring everything runs smoothly. The Kong Gateway Operator orchestrates the control and data planes in Kubernetes, ensuring this process stays on track. But what happens when things start to wobble? A misstep here, a failure there and suddenly, chaos!

In this session, we’ll dive into the twists and turns of API gateway resilience. Think of it as an adventure where the operator faces unexpected disruptions, configuration hiccups, control plane mysteries, and unexpected traffic surges. We’ll explore what happens under the hood, how the gateway responds, and what we can learn from its behavior.

By the end, you’ll walk away with a deeper understanding of how to prepare your gateways for the unexpected and turn "uh-oh" moments into "we've got this" wins.

Speakers

Jintao Zhang

CNCF Ambassador, Kubernetes Ingress-NGINX maintainer, Kong Inc.

Jintao Zhang is a Microsoft MVP, CNCF Ambassador, Apache PMC, and Kubernetes Ingress-NGINX maintainer, he is good at cloud-native technology and Azure technology stack.

Sayan Mondal

Senior Software Engineer II, Harness

Sayan Mondal is a Senior Software Engineer II at Harness, building their Chaos Engineering platform and helping them shape the customer experience market. He's the maintainer of a few open-source libraries and is also a maintainer and community manager of LitmusChaos (the Incubating... Read More →

Tuesday June 10, 2025 16:15 - 16:45 HKT
Level 19 | Crystal Court II

Connectivity

Content Experience Level Any
Presentation Language English

16:15 HKT

OpenTelemetry Project Update - Zihao Rao, Alibaba Cloud; Hui Wang, VictoriaMetrics; Jared Tan, DaoCloud

Tuesday June 10, 2025 16:15 - 16:45 HKT

Level 21 | Pearl Pavilion

OpenTelemetry, one of the most active projects within the CNCF, has become the industry standard for observability. Join us for the official project update session at KubeCon+CloudNativeCon China. In this session, contributors from the OpenTelemetry community will share some of the latest project developments and milestones, including SDK/Instrumentation, profiling, Go compile-time instrumentation injection, and the OpenTelemetry Collector. Don't miss this opportunity to stay informed and contribute to the discussion on the exciting advancements within OpenTelemetry.

Speakers

Jared Tan

Observability Engineer, DaoCloud

Jared Tan is a Sr. Software Engineer at DaoCloud responsible for the Observability Platform, which includes contributions to the OpenTelemetry project and with a passion for observability and helping users start their observability journey. He has participated in several well-known... Read More →

Hui Wang

Software Engineer, VictoriaMetrics

I'm working on monitoring at VictoriaMetrics. My passion is cloud-native technologies and opensource.

Zihao Rao

OpenTelemetry Java Instrumentation Approver, Alibaba Cloud

Zihao is a software engineer at Alibaba Cloud. Over the past few years, he has participated in several well-known open source projects, he is steering committee member of Spring Cloud Alibaba project, and is a triager for OpenTelemetry Java Instrumentation now.

Tuesday June 10, 2025 16:15 - 16:45 HKT
Level 21 | Pearl Pavilion

Maintainer Track

16:19 HKT

⚡ Lightning Talk: Dynamic GPU Fraction and Sharing With Cloud Native Principle - Tiejun Chen, Individual Contributor

Tuesday June 10, 2025 16:19 - 16:24 HKT

As we see, organizations are investing heavily in bringing AI accelerators into their data centers or using them on the public cloud but continue to struggle with the cost-effective and efficient management of these critical resources. There are some existing approaches to address them but heavy and inflexible. Here, we'd like to take this chance to review if-how we can address the challenges of expensive and limited machine learning compute resources like GPU and identifies solutions for GPU fractional optimization with our technical PoC - GPU.x by transparent backend Python hooker within ML upstream frameworks running Kubernetes. It's lightweight, easy and flexible without any code changes to your AI applications towards cloud native.

Speakers

Tiejun Chen

Sr. Technical Lead, Individual Contributor

Tiejun Chen was Sr. technical leader. He ever worked at several tech companies such as VMware, Intel, Wind River Systems and so on, involved in - cloud native, edge computing, ML/AI, WebAssembly, etc. He ever made many presentations at AI.Dev NA 2023, kubecon China 2021 & 2024, Kube... Read More →

Tuesday June 10, 2025 16:19 - 16:24 HKT
Level 16 | Grand Ballroom I

Content Experience Level Intermediate
Presentation Language English

16:26 HKT

⚡ Lightning Talk: Kubernetes Isekai (異世界）：Transforming Kubernetes Education Into a Gamified Adventure - Cyrus Wong & Hongyi Qian, Hong Kong Institute of Information Technology

Tuesday June 10, 2025 16:26 - 16:31 HKT

⚡ Lightning Talks, Cloud Native Novice

Kubernetes Isekai (異世界） is an open-source RPG designed for hands-on Kubernetes learning through gamification. Ideal for junior to Higher Diploma students at Hong Kong Institute of Information Technology (HKIIT), it transforms Kubernetes education into an engaging adventure.

Role-Playing Adventure: Students interact with NPCs who assign Kubernetes tasks.
Task-Based Learning: Tasks involve setting up and managing Kubernetes clusters.
Free Access: Uses AWS Academy Learner Lab with Minikube or Kubernetes.
Scalable Grading: AWS SAM application tests Kubernetes setups within AWS Lambda.
Progress Tracking: Students track progress and earn rewards.
This game offers practical Kubernetes experience in a fun, cost-effective way.
GenAI Chat: Integrates Generative AI to make NPC interactions more dynamic and fun, enhancing the overall learning experience.
Demo
https://www.youtube.com/watch?v=dIwNWwz681k

Speakers

Cyrus Wong

Senior Lecturer, Hong Kong Institute of Information Technology

Cyrus Wong is an accomplished senior lecturer who oversees the Higher Diploma program in Cloud and Data Centre Administration at the Hong Kong Institute of Information Technology (HKIIT) in Hong Kong. He is a passionate advocate for the adoption of cloud technology across various... Read More →

Hongyi Qian

Cloud major student, Hong Kong Institute of Information Technology at IVE(Lee Wai Lee)

I am pursuing a Higher Diploma in Cloud and Data Centre Administration at the Hong Kong Institute of Information Technology at IVE (Lee Wai Lee) and am currently interning at Cathay Pacific Airways. This project teaches Kubernetes concepts and commands in a gamified way. By turning... Read More →

Tuesday June 10, 2025 16:26 - 16:31 HKT
Level 16 | Grand Ballroom I

Content Experience Level Beginner
Presentation Language Chinese

16:33 HKT

⚡ Lightning Talk: Supercharge Agentic AI Apps: A DevEx-Driven Approach To Cloud Native Scaffolding - Daniel Oh, Red Hat

Tuesday June 10, 2025 16:33 - 16:38 HKT

Agentic AI is revolutionizing how we create intelligent agents that can interact with the real world. However, building and deploying these systems often involves significant complexity and time investment. This demo-driven session introduces a cloud-native scaffolding approach, leveraging software templates to streamline and simplify the development of agentic AI projects. This results in a more efficient and developer-friendly experience. Through live demonstrations, attendees will see firsthand how this innovative scaffolding framework accelerates the development lifecycle of agentic AI applications. It provides automated code generation and pre-configured infrastructure. Seamless integration with popular AI libraries reduces overhead and complexity. By the end of the session, participants will have a clear understanding of how to adopt cloud-native scaffolding to revolutionize their development process and gain practical skills to drive innovation in their projects.

Speakers

Daniel Oh

Senior Principal Developer Advocate, Red Hat

Daniel Oh is a Java Champion and Senior Principal Developer Advocate at Red Hat to evangelize developers for building cloud-native apps and serverless ob Kubernetes ecosystems. He's also contributing to various cloud open-source projects and ecosystems as a CNCF ambassador for accelerating... Read More →

Tuesday June 10, 2025 16:33 - 16:38 HKT
Level 16 | Grand Ballroom I

⚡ Lightning Talks, AI + ML

Content Experience Level Any
Presentation Language English

16:40 HKT

⚡ Lightning Talk: Stateful Service Federation in Large-Scale Search, Ads, and Recommendation Scenarios at Xiaohongshu - Yang Song & Vec Sun, Xiaohongshu

Tuesday June 10, 2025 16:40 - 16:45 HKT

⚡ Lightning Talks, Application Development

Search, advertising, and recommendation services are among the primary business types within Xiaohongshu. Due to the strong dependency of these services on index table, each instance replica needs to maintain its own independent state. As a result, such services are deployed using the stateful workload.
With the rapid growth of Xiaohongshu's business scale, the size limit of a single Kubernetes cluster has made it impossible to further scale stateful services. To address daily traffic and business growth, the only solution was to migrate workloads to idle clusters. However, this migration approach has caused significant inconvenience and risks for developer.
To tackle this challenge, Xiaohongshu leveraged Karmada to implement the federation of stateful services. By designing scheduling and deployment capabilities for stateful services on federated clusters, This approach has seamlessly resolved the scaling limitations caused by single-cluster capacity constraints for stateful services.

Speakers

Vec Sun

CloudNative Developer, Xiaohongshu

Sunweixiang has previously worked in the Alibaba Cloud container team as software engineer and is a contributor to the OpenKruise community's main, Karmada, and other communities. He is deeply involved in container application orchestration, multi-cluster.

Yang Song

Software Engineer, xiaohongshu

Song Yang is a Cloud Native Development Engineer at Xiaohongshu, currently working on multi-cluster and Kubernetes scheduler. He is a maintainer of the CNCF incubating project KubeVela.

Stateful Service Federation iat Xiaohongshu pptx

Tuesday June 10, 2025 16:40 - 16:45 HKT
Level 16 | Grand Ballroom I

Content Experience Level Beginner
Presentation Language Chinese

16:47 HKT

⚡ Lightning Talk: Mastering Prefill-Decode-Disaggregated Architecture: Solutions and Best Practices in Alibaba Cloud - Jing Gu & Yang Che, Alibaba Cloud

Tuesday June 10, 2025 16:47 - 17:52 HKT

Mastering Prefill Decode Disaggregated Architecture v3 pdf

Disaggregating the prefill and decoding phases in LLM inference has garnered significant attention in the industry because it can enhance performance. Several solutions have been developed, including Mooncake, TetriInfer, Splitwise, DistServe, and RTP-LLM. However, deploying a disaggregation LLM inference at scale on Kubernetes, while evaluating its performance and cost benefits presents numerous challenges.
In this talk, we will introduce a solution that uses a LeaderWorkerSet as the workload, an Ingress Controller and a node discovery service. It can deploy disaggregated PD on Kubernetes, supporting multiple LLM inference engines like Mooncake and RTP-LLM with zero intrusion. Furthermore, we will discuss improving load balancing using Envoy and ORCA, based on KVCache and metrics, and recommending optimal ratios for the PD phases. Finally, we will cover essential features for production deployment such as high availability, elastic scaling, canary releases, and observability.

Speakers

Yang Che

senior software engineer, Alibaba Cloud

Yang Che, is a senior engineer of Alibaba Cloud. He works in Alibaba cloud container service team, and focuses on Kubernetes and container related product development. Yang also works on building elastic machine learning platform on those technologies. He is an active contributor... Read More →

Jing Gu

Software Engineer, Alibaba Cloud

Jing Gu is a senior engineer at Alibaba Cloud. She works on Alibaba Cloud Container Service for Kubernetes , focusing on serving large language models (LLMs) within Kubernetes and optimizing LLM inference processes.

Tuesday June 10, 2025 16:47 - 17:52 HKT
Level 16 | Grand Ballroom I

⚡ Lightning Talks, AI + ML

Content Experience Level Intermediate
Presentation Language Chinese

16:54 HKT

⚡ Lightning Talk: Kata Confidential Containers Meet Persistent Storage: Overcoming CSI Driver Challenges - Andy Zhang & Archana Choudhary, Microsoft

Tuesday June 10, 2025 16:54 - 16:59 HKT

⚡ Lightning Talks, Data Processing + Storage

Kata Confidential Containers (CoCo) is a technology that provides hardware-based isolation for containerized workloads. It’s built on top of the Kata Containers project, which uses lightweight VMs to provide container isolation. It has the ability to disable file system sharing between host nodes and pods, which helps to reduce attack surfaces. However, such protection ability limits usage of Persistent Volumes. During this session, we will provide an introduction to Kata Confidential Containers and discuss the typical volume mount workflow of CSI drivers. We will cover the challenges that arise when supporting Kata CoCo in CSI drivers. We will explore the solutions we have developed to overcome these challenges and support Kata CoCo in our open source Azure File CSI driver. By the end of this session, you will have a comprehensive understanding of Kata confidential containers and be able to use them with persistent volumes including all the necessary details.

Speakers

Archana Choudhary

Software Engineer, Microsoft

A software engineer who has been exploring cloud-native technologies, particularly focusing on confidential containers over the past several months.

Andy Zhang (OSTC)

Principal Software Engineer, Microsoft

Andy Zhang is the storage lead in Azure Kubernetes Service team at Microsoft, maintainer of multiple Kubernetes projects, including Windows csi-proxy project, Azure CSI drivers, SMB, NFS, iSCSI CSI drivers, etc. Andy focuses on improving the experience of using storage in Kuberne... Read More →

Tuesday June 10, 2025 16:54 - 16:59 HKT
Level 16 | Grand Ballroom I

Content Experience Level Intermediate
Presentation Language English

17:00 HKT

Sponsored Demo: Build Compute Affinity Cloud Native Eco-System

Tuesday June 10, 2025 17:00 - 17:20 HKT

Level 21 | Pearl Pavilion

Computing power requirements is glowing more and more fast. Especially with the development of AI large models (text, images, and videos), the complexity and diversity of computing power are constantly increasing. The traditional computing power management and shedule way no longer meet these challenges. Therefore, computing power supply is shifting from "single and intensive" model to a more flexible and efficient "diverse collaboration" model. So how to integrate computing resources of different architectures and maximize the utilization rate and performance of cluster resources efficiently has become the core challenge faced by enterprises. In this session we will talk about how to build a cloud native system with high efficient computing cluster management capabilities. Projects will be covered like kubernetes, prometheus, volcano, karmada etc.

In order to facilitate networking and business relationships at the event, you may choose to visit a third party’s booth or access sponsored content. You are never required to visit third party booths or to access sponsored content. When visiting a booth or participating in sponsored activities, the third party will receive some of your registration data. This data includes your first name, last name, title, company, address, email, standard demographics questions (i.e. job function, industry), and details about the sponsored content or resources you interacted with. If you choose to interact with a booth or access sponsored content, you are explicitly consenting to receipt and use of such data by the third-party recipients, which will be subject to their own privacy policies.

Speakers

Xiaozhong Yao

Huawei

Tuesday June 10, 2025 17:00 - 17:20 HKT
Level 21 | Pearl Pavilion

17:00 HKT

Portrait Service: AI-Driven PB-Scale Data Mining for Cost Optimization and Stability Enhancement - Yuji Liu & Zhiheng Sun, Kuaishou

Tuesday June 10, 2025 17:00 - 17:30 HKT

Level 19 | Crystal Court I

Kuaishou's Kubernetes-based platform manages 200,000+ machines and 10M+ Pods, generating 10TB+ daily data. AI-driven intelligent portrait service enhances stability and performance:
● Stability Management: AI analyzes system and workload metrics to generate machine health scores, integrated into Kubernetes scheduling to evict/avoid unhealthy nodes. This reduced pod creation delays from 20 to 0.1 cases/day and boosted service availability from 90% to 99.99%.
● Performance Optimization:
Serving 10,000+ services with diverse resource sensitivities (compute-, cache-, and IO-intensive), we combine AI with microarchitecture data to pinpoint bottlenecks and create application profiles. Optimizing resource allocation (compute, cache, memory bandwidth) has increased average IPC by 20% and reduced LLC miss rates for cache-sensitive services from over 50% to 10%.
Future plans include integrating AI Agent technology to automate anomaly detection and reduce manual operations by 80%.

Speakers

Yuji Liu

Software Engineer, Kuaishou Technology

Container cloud engineer from Kuaishou.

Zhiheng Sun

Senior Software Engineer, Kuaishou

I am a cloud-native engineer at kwaishou, specializing in application performance improvement on Kubernetes. I also have led the open-local, a cloud-native local storage project in the open-source community.

KubeCon China 2025 Branded PowerPoint pptx

Tuesday June 10, 2025 17:00 - 17:30 HKT
Level 19 | Crystal Court I

Content Experience Level Beginner
Presentation Language Chinese

17:00 HKT

Unlocking the Power of CEL for Advanced Multi-Cluster Scheduling - Qing Hao & Jian Qiu, Red Hat

Tuesday June 10, 2025 17:00 - 17:30 HKT

KubeCon China 2025 Unlocking the Power of CEL for Advanced Multi Cluster Scheduling .pptx pdf

The Common Expression Language (CEL) is a powerful solution already used in the Kubernetes API, with the recent Kubernetes v1.32 highlighting it for mutating admission policies. It is also used in Envoy and Istio. This topic will explore the benefits and features that CEL can offer for multi-cluster scheduling.

There is a growing demand for granular and customizable requirements in scheduling. For example, users may want to filter clusters with the label "version" > v1.30.0 instead of listing all versions. Many also wish to use their CRD fields or metrics for scheduling. CEL's extensibility effectively addresses these challenges as it can handle complex expressions.

In this talk, we will showcase how Open Cluster Management (OCM) leverages CEL in multi-cluster scheduling. Using the ClusterProfile API as an example, we will demonstrate how CEL meets complex scheduling needs and illustrate its potential to improve GPU utilization for AI applications by solving bin-packing challenges.

Speakers

Jian Qiu

Senior Principal Software Engineer, RedHat

Qiu Jian is a developer at Redhat mainly focusing on multiple cluster management.

Qing Hao

Senior Software Engineer, Red Hat

Qing Hao is a Senior Software Engineer at Red Hat, where she works as the maintainer of Open Cluster Management. She is also the CNCF Ambassador, the speaker at KubeCon China 2024, and the mentor for OSPP 2022 and GSoC 2024. Qing focuses on solving complex challenges... Read More →

Tuesday June 10, 2025 17:00 - 17:30 HKT
Level 19 | Crystal Court II

Emerging + Advanced

Content Experience Level Any
Presentation Language Chinese

17:01 HKT

⚡ Lightning Talk: WASM Vs Docker: Partners, Not Rivals - Pradumna V Saraf, Independent

Tuesday June 10, 2025 17:01 - 17:06 HKT

The rise of WebAssembly (WASM) has sparked comparisons with Docker which often leads to questions and confusion: Are WASM and Docker competing technologies?

In this talk, we will see how this is far from the truth. On one side, Docker revolutionised how we bundle and deploy applications, offering unparalleled portability and simplifying workflows across environments. On the other hand, WASM brings speed, security, and efficiency, enabling the execution of code written in languages like C, C++, and Rust almost at native speed, performance, and rapid startup time even in the browser.

We will explore how these two technologies bring the best of both worlds and help developers achieve portability, efficiency, security, and flexibility. We will also look at how Docker is actively working to make WASM mainstream by allowing WASM container images to be hosted on DockerHub and run WASM containers alongside traditional Linux and Windows containers.

Speakers

Pradumna Saraf

Open Source Developer, Independent

Pradumna is a Developer Advocate, Docker Captain, and a DevOps and Go Developer. He is passionate about Open Source and has mentored hundreds of people to break into the ecosystem. He also creates content on X (formerly Twitter) and LinkedIn, educating others about Open Source and... Read More →

WASM vs Docker Partners, Not Rivals pdf

Tuesday June 10, 2025 17:01 - 17:06 HKT
Level 16 | Grand Ballroom I

Content Experience Level Any
Presentation Language English

17:08 HKT

⚡ Lightning Talk: Scaling AI With Wasm and Edge Computing - Miley Fu, WasmEdge

Tuesday June 10, 2025 17:08 - 17:12 HKT

What does the future of AI look like when we push the boundaries of cloud-based models and take it to the edge? In this talk, we’ll explore how Wasm and edge computing power AI deployment by providing developers with a fast, lightweight, and secure framework for running machine learning models across devices.

We’ll focus on how Wasm enables AI models to run efficiently on edge devices like NVIDIA GPUs, Mac, etc, driving LLM agents that require low latency and high throughput. This session will demonstrate the scalability of Wasm when integrated into distributed systems for AI processing, showing how the combination of edge computing and Wasm allows for faster, responsive AI applications that don’t rely on centralized cloud resources.

We’ll showcase real life use cases such as AI streamers commenting in real time, video translation agents deployment. Developers will walk away with an understanding of how to combine Wasm with edge infra to build and deploy AI apps that scale seamlessly

Speakers

Miley Fu

Founding Member, Second State

Miley is the co-chair and keynote speaker for KubeCon+Open Source Summit and AI Dev 2024. With over 6 years of experience working on WasmEdge runtime in CNCF sandbox as a founding member, she talks at KubeCon, KCD Shenzhen, CloudDay Italy, DevRelCon, Open Source Summit Japan, AWS... Read More →

Tuesday June 10, 2025 17:08 - 17:12 HKT
Level 16 | Grand Ballroom I

Content Experience Level Beginner
Presentation Language English

17:30 HKT

Welcome Reception 🎉

Tuesday June 10, 2025 17:30 - 19:00 HKT

Join us onsite for drinks, appetizers, and conversations with old and new friends in the Solutions Showcase. Explore the exhibit booths to learn more about the latest technologies, meet experts and project maintainers, browse special offers, and much more.

In order to facilitate networking and business relationships at the event, you may choose to visit a third party’s booth or to access sponsored content. You are never required to visit third-party booths or to access sponsored content. When visiting a booth or participating in sponsored activities, the third party will receive some of your registration data. This data includes your first name, last name, title, company, address, email, standard demographics questions (i.e. job function, industry), and details about the sponsored content or resources you interacted with. If you choose to interact with a booth or access sponsored content, you are explicitly consenting to receipt and use of such data by the third-party recipients, which will be subject to their own privacy policies.

Tuesday June 10, 2025 17:30 - 19:00 HKT
Level 16 | Grand Ballroom II

Experiences

07:30 HKT

Badge Pick-Up

Wednesday June 11, 2025 07:30 - 16:30 HKT

Level 16 | Grand Ballroom Pre-Function Area

Wednesday June 11, 2025 07:30 - 16:30 HKT
Level 16 | Grand Ballroom Pre-Function Area

Registration

07:30 HKT

Cloakroom

Wednesday June 11, 2025 07:30 - 17:15 HKT

Level 19 | Crystal Court Foyer

Wednesday June 11, 2025 07:30 - 17:15 HKT
Level 19 | Crystal Court Foyer

Registration, Cloakroom

09:00 HKT

Keynote: Welcome Back + Opening Remarks - Keith Chan, Director of Strategic Planning, The Linux Foundation APAC

Wednesday June 11, 2025 09:00 - 09:10 HKT

Speakers

Keith Chan

Director of Strategic Planning, The Linux Foundation APAC

Wednesday June 11, 2025 09:00 - 09:10 HKT
Level 16 | Grand Ballroom I

Content Experience Level Intermediate
Presentation Language English

09:12 HKT

Keynote: Optimizing AI Workload Scheduling: Bilibili's Journey To an Efficient Cloud Native AI Platform - Long Xu, Bilibili & Kevin Wang, Huawei

Wednesday June 11, 2025 09:12 - 09:22 HKT

As China's leading video platform, Bilibili faces 4 key challenges in multi-cluster AI workloads management:
1. Workload Diversity: Training/inference/video processing workloads have different scheduling requirements.
2. Cross-Cluster Complexity: Managing workloads across multiple Kubernetes clusters in expanding IDCs with SLAs.
3. Performance Demands: Minimal startup latency and best scheduling efficiency for short-running tasks e.g. video processing.
4. Efficiency-QoS Balance: maximizing resource utilization while ensuring priority workload stability.

This talk will share experiences and delve specific optimization techniques:
1. Leveraging and optimizing CNCF projects such as Karmada and Volcano to build a unified, high-performance AI workload scheduling platform.
2. Integrating technologies such as KubeRay to schedule various AI online and offline workloads.
3. Maximizing resource efficiency through online and offline hybrid scheduling, tidal scheduling and other technologies.

Speakers

Kevin Wang

Technical Expert, Lead of Cloud Native Open Source, Huawei

Kevin Wang has been an outstanding contributor in the CNCF community since its beginning and is the leader of the cloud native open source team at Huawei. Kevin has contributed critical enhancements to Kubernetes, led the incubation of the KubeEdge, Volcano, Karmada projects in CNCF... Read More →

Long Xu

Senior Software Engineer, Bilibili

Long Xu is a Senior Software Engineer in the Infrastructure Department at Bilibili. He has rich experiences in the Kubernetes field, including scheduling, autoscaling and system stability.

Wednesday June 11, 2025 09:12 - 09:22 HKT
Level 16 | Grand Ballroom I

Keynote Sessions, AI + ML

Content Experience Level Any
Presentation Language Chinese

09:24 HKT

Keynote: Key Cloud Native Technologies in its Next Decade - Lin Sun, Head of Open Source, Solo.io

Wednesday June 11, 2025 09:24 - 09:34 HKT

When we started CNCF in 2015 to help advance container technology, Kubernetes was the seeding technology to provide a de facto container orchestration platform for all cloud native applications. Almost a decade later, the community has exploded with 200+ open source projects building on top of cloud native technologies. Looking ahead, what challenges will we have in the next decade? What gaps remain for users and contributors? And how do we evolve to meet the demands of an increasingly complex and connected world?

Let us review some of the key CNCF projects today and lay out some possible avenues for where cloud native is going for the next decade, AI, agentic network, sustainability and beyond.

Speakers

Lin Sun

Head of Open Source & CNCF TOC, Solo.io

Lin is the Head of Open Source at Solo.io, and a CNCF TOC member and ambassador. She has worked on the Istio service mesh since the beginning of the project in 2017 and serves on the Istio Steering Committee and Technical Oversight Committee. Previously, she was a Senior Technical... Read More →

Wednesday June 11, 2025 09:24 - 09:34 HKT
Level 16 | Grand Ballroom I

Content Experience Level Any
Presentation Language English

09:36 HKT

Keynote: Who Owns Your Pod? Observing and Blocking Unwanted Behavior at eBay With eBPF - Jianlin Lv, eBay & Liyi Huang, Isovalent at Cisco

Wednesday June 11, 2025 09:36 - 09:46 HKT

Kubernetes admins often struggle to understand pod activities, both for regular pods and those with various privileges. This session explores two use cases that highlight why Tetragon, an eBPF-based observability and enforcement tool, for pod security:
1.Replacing Auditbeat with Tetragon: Learn how Auditbeat rules mapped to Tetragon tracing policies, identifying functionality gaps, and how eBay contributed back to the community
2.Auditing Container Process Permissions: See how Tetragon helped analyze pod behavior and determine if applications could migrate to more restrictive pod security policies, ensuring adherence to the principle of least privilege
We also cover deployment challenges, such as integrating with SIEM platforms, resource utilization, and implementing runtime enforcement for unwanted pod behavior. This talk provides practical insights into using Tetragon for observability, policy refinement, and improving overall pod security posture in Kubernetes environments.

Speakers

Jianlin Lv

Senior Linux Kernel Development Engineer, eBay

https://www.linkedin.com/in/jianlin-lv-25650141/

Liyi Huang

customer success architect, Isovalent at Cisco

senior solution architect @isovalent.com

Wednesday June 11, 2025 09:36 - 09:46 HKT
Level 16 | Grand Ballroom I

Keynote Sessions, Observability

Content Experience Level Intermediate
Presentation Language Chinese

09:48 HKT

Keynote: How We Save $900 per Day with Self-Hosted AI: Building Scalable Local LLM Infrastructure - Vivian Hu, Product Manager, Second State & Lv Yi, CTO, 5miles

Wednesday June 11, 2025 09:48 - 09:58 HKT

While SaaS AI providers like OpenAI offer convenient LLM services, they come with significant drawbacks: high costs, lack of customization, lack of privacy, and usage limitations that can throttle high-volume applications.

This presentation shows how a leading e-commerce web site deployed a highly customized suite of LLM applications on private cloud infra, reducing costs by 90% while maintaining complete control over scalability and quality of service. We'll discuss the technology stack for orchestrating inference workloads on cloud GPUs, and explore practical strategies for building stable, scalable, high-performance AI apps on your own private cloud infra.

Speakers

Lv Yi

CTO, 5miles

Lv Yi is the CTO of 5miles, a leading e-commerce platform in the United States. With 19 years in IT, he is a cloud native enthusiast who previously served as a mobile business expert at AsiaInfo. In 2012, he led Zhangyue's systems evolution toward microservices architecture. At 5miles... Read More →

Vivian Hu

Product Manager, Second State

Vivian Hu is a Product Manager at Second State and a columnist at InfoQ. She is a founding member of the WasmEdge project. She organizes Rust and WebAssembly community events in Asia.

Wednesday June 11, 2025 09:48 - 09:58 HKT
Level 16 | Grand Ballroom I

Presentation Language Chinese

10:00 HKT

Keynote: Building a Large Model Inference Platform for Heterogeneous Chinese Chips Based on VLLM - Haiwen Zhang, China Mobile & Kante Yin, DaoCloud

Wednesday June 11, 2025 10:00 - 10:10 HKT

With the growing demand for heterogeneous computing power, Chinese users are gradually adopting domestic GPUs, especially for inference. vLLM, the most popular open-source inference project, has drawn widespread attention but does not support domestic chips.Chinese inference engines are still developing in functionality, performance, and ecosystem. In this session, we’ll introduce how to adapt vLLM to support domestic GPUs,enabling acceleration features like PageAttention, Continuous Batching, and Chunked Prefill. We’ll also cover performance bottleneck analysis and chip operator development to maximize hardware potential.
Additionally, Kubernetes has become the standard for container orchestration and is the preferred platform for inference services. We’ll show how to deploy the adapted vLLM engine on Kubernetes using the open-source llmaz project with a few lines of code, and explore how llmaz handles heterogeneous GPU scheduling and our practices for monitoring and elastic scaling.

Speakers

Haiwen Zhang

Senior Software Engineer, China Mobile (Suzhou) Software Technology Co., Ltd.

The author has rich experience in cloud-native and AI inference development, currently works at China Mobile, focusing on the research and development of cloud-native and AI inference related products. He shared experiences of service mesh at some technical conferences such as the... Read More →

Kante Yin

Software Engineer, DaoCloud

Wednesday June 11, 2025 10:00 - 10:10 HKT
Level 16 | Grand Ballroom I

Keynote Sessions, AI + ML

Content Experience Level Any
Presentation Language Chinese

10:10 HKT

Keynote: Closing Remarks

Wednesday June 11, 2025 10:10 - 10:15 HKT

Keynote Sessions, Platform Engineering

Wednesday June 11, 2025 10:10 - 10:15 HKT
Level 16 | Grand Ballroom I

Content Experience Level Intermediate
Presentation Language English

10:15 HKT

Gold Sponsor In-Booth Demos

Wednesday June 11, 2025 10:15 - 10:45 HKT

Sponsor: Akamai
Demo: Unleash AI apps with edge-native speed on Akamai Cloud
Booth Number: G5

In order to facilitate networking and business relationships at the event, you may choose to visit a third party’s booth or access sponsored content. You are never required to visit third party booths or to access sponsored content. When visiting a booth or participating in sponsored activities, the third party will receive some of your registration data. This data includes your first name, last name, title, company, address, email, standard demographics questions (i.e. job function, industry), and details about the sponsored content or resources you interacted with. If you choose to interact with a booth or access sponsored content, you are explicitly consenting to receipt and use of such data by the third-party recipients, which will be subject to their own privacy policies.

Wednesday June 11, 2025 10:15 - 10:45 HKT
Level 16 | Grand Ballroom II

Sponsored Demos, Gold Sponsor In-Booth Demos

10:15 HKT

Coffee Break ☕

Wednesday June 11, 2025 10:15 - 11:00 HKT

Wednesday June 11, 2025 10:15 - 11:00 HKT
Level 16 | Grand Ballroom II

Breaks

10:15 HKT

Project Pavilion Tables | Wednesday Morning

Wednesday June 11, 2025 10:15 - 12:30 HKT

Cilium P-2
Hami P-6
Karpenter P-1
Kmesh P-7
Kubespray P-3
Kyverno P-4
Litmus P-5
Open Cluster Management P-8

Wednesday June 11, 2025 10:15 - 12:30 HKT
Level 16 | Grand Ballroom II

Project Opportunities

10:15 HKT

Solutions Showcase

Wednesday June 11, 2025 10:15 - 15:30 HKT

Wednesday June 11, 2025 10:15 - 15:30 HKT
Level 16 | Grand Ballroom II

Solutions Showcase

11:00 HKT

Destroy Your System To Make It More Reliable: An Easy Way To Start Chaos Engineering With LitmusChao - Sayan Mondal, Harness

Wednesday June 11, 2025 11:00 - 11:30 HKT

Level 21 | Pearl Pavilion

Imagine your cloud-native applications as a bustling city. To ensure everything runs smoothly, you need to test its resilience by introducing controlled chaos, like planned roadblocks, to spot and fix weaknesses before they cause real trouble.

Join the LitmusChaos team, the folks behind this CNCF Incubating project, as they share the latest and greatest in chaos engineering. They'll walk you through new features from recent updates, like better resilience testing, improved observability, and scalability tools, all designed to tackle the real-world problems developers and SREs face daily.

You'll also get the inside scoop on the project's growth, how the community is shaping its future, and a sneak peek at what's coming next to make chaos engineering easier and more effective.

Speakers

Sayan Mondal

Senior Software Engineer II, Harness

Wednesday June 11, 2025 11:00 - 11:30 HKT
Level 21 | Pearl Pavilion

Maintainer Track

11:00 HKT

Unified Observability in GRPC: Metrics and Tracing Using OpenTelemetry Plugin - Purnesh Dixit, Google

Wednesday June 11, 2025 11:00 - 11:30 HKT

Unified Observability in GRPC Metrics and Tracing using OpenTelemetry Plugin pdf

gRPC’s performance advantages hinge on minimizing latency, but its binary protocol and streaming capabilities make debugging and monitoring inherently opaque. While distributed tracing identifies bottlenecks, metrics like error rates and throughput are critical for holistic insights. Yet, manual instrumentation for these signals in gRPC is complex, error-prone, and lacks standardization.

In this talk, Purnesh Dixit from the gRPC team unveils the new OpenTelemetry plugin for gRPC, developed by the gRPC team at Google, which provides unified metrics and tracing out-of-the-box to monitor retries, diagnose streaming bottlenecks, and optimize performance without invasive code changes.
1) Client-per-call: Track overall RPC lifecycle (e.g., grpc.client.call.duration).

2) Client-per-call-attempt: Analyze individual retries/hedges (e.g., grpc.client.attempt.duration).

3) Server-instruments: Measure concurrency, request queuing, and stream lifetimes (e.g., grpc.server.call.started).

Speakers

Purnesh Dixit

Purnesh Dixit (gRPC Team, Google), Google

Purnesh is a software engineer on the gRPC team at Google. He is a contributor to the OpenTelemetry and xDS support in gRPC-go.

Wednesday June 11, 2025 11:00 - 11:30 HKT
Level 16 | Grand Ballroom I

Content Experience Level Intermediate
Presentation Language English

11:00 HKT

Resilient Multiregion Global Control Planes With Crossplane and K8gb - Yury Tsarev & Steven Borrelli, Upbound

Wednesday June 11, 2025 11:00 - 11:30 HKT

Level 19 | Crystal Court I

Ensuring resilience in control planes is critical for organizations managing infrastructure and applications across multiple regions with Kubernetes. This talk presents a reference architecture for creating a Crossplane-based Global Control Plane, enhanced with k8gb for DNS-based failover and leveraging an Active/Passive setup.
We’ll explore how Crossplane’s declarative infrastructure provisioning integrates with k8gb to build robust, scalable, and resilient multicluster environments. Key takeaways include:

- Architecting resilient multiregion control planes with Active/Passive roles
- Demonstrating failover mechanisms where the Passive control plane transitions to Active during failures
- Strategies for optimizing failover times while maintaining availability

This session will guide attendees through proven methods and real-world challenges of building resilient Global Control Planes, empowering them to manage critical workloads across geographically distributed regions confidently.

Speakers

Steven Borrelli

Principal Soutions Architect, Upbound

Steven is a Principal Solutions Architect for Upbound, where he helps customers adopt Crossplane.

Yury Tsarev

Principal Solutions Architect, Upbound

Yury is an experienced software engineer who strongly focuses on open-source, software quality and distributed systems. As the creator of k8gb (https://www.k8gb.io) and active contributor to the Crossplane ecosystem, he frequently speaks at conferences covering topics such as Control... Read More →

Wednesday June 11, 2025 11:00 - 11:30 HKT
Level 19 | Crystal Court I

Content Experience Level Intermediate
Presentation Language English

11:00 HKT

Peer Group Mentoring

Wednesday June 11, 2025 11:00 - 12:00 HKT

Level 20 | Salon 4

Peer Group Mentoring allows participants to meet with experienced open source veterans across many CNCF projects. Mentees are paired with 2 – 10 other people in a pod-like setting to explore technical, community, and career questions together.

Sign-up to be a Mentee

Sign-up to be a Mentor

Wednesday June 11, 2025 11:00 - 12:00 HKT
Level 20 | Salon 4

Inclusion + Accessibility

Content Experience Level Any
Presentation Language English

11:45 HKT

How Bloomberg Creates a Resilient Data Analytics Platform Using Karmada - Michas Szacillo & Ilan Filonenko, Bloomberg

Wednesday June 11, 2025 11:45 - 12:15 HKT

Bloomberg’s Data Analytics Platform Engineering team supports a wide-range of real-time streaming, large batch ETL, and data exploration use-cases by using Apache Flink, Apache Spark, and Trino across multi-cluster Kubernetes. However, deploying and managing these workflows at scale efficiently can be challenging due to varying resource requirements and uptime needs. For stateful applications like Apache Flink, ensuring recovery and state conservation after downtime is especially important.

This session will discuss how Bloomberg uses Karmada, a multi-cluster management system, to deploy and manage Apache Flink. We’ll also explore how Karmada’s capabilities can be expanded to handle additional data analytics workloads, including Apache Spark and Trino. The session will cover the unique requirements and real-life use-cases for each, including:

- Resource-aware workload scheduling
- Custom resource requirements and health interpretation
- State conservation during application failover

Speakers

Ilan Filonenko

Engineering Group Lead, Bloomberg

Ilan Filonenko is an Engineering Group Lead focusing on Cloud Native Data Analytics Infrastructure at Bloomberg - where he has designed and implemented distributed systems at both the application and infrastructure level. Previously, Ilan was an engineering consultant and technical... Read More →

Michas Szacillo

Tech Lead, Bloomberg L.P.

Michas is a senior software engineer and tech lead on Bloomberg’s Streaming Analytics engineering team. The platform, which is running on Kubernetes, serves as the foundation for many of Bloomberg's data streaming use cases. Michas is also a frequent collaborator to the CNCF community... Read More →

How Bloomberg Creates a Resilient Data Platform pdf

Wednesday June 11, 2025 11:45 - 12:15 HKT
Level 19 | Crystal Court II

Data Processing + Storage

Content Experience Level Intermediate
Presentation Language English

11:45 HKT

KubeEdge DeepDive: Architecture, Use Cases, and Project Graduation Updates - Yue Bao, Huawei Cloud Computing Technology & Hongbing Zhang, DaoCloud

Wednesday June 11, 2025 11:45 - 12:15 HKT

Level 21 | Pearl Pavilion

In this session, KubeEdge project maintainers will provide an overview of KubeEdge's architecture and its industry-specific use cases. The session will begin with a brief introduction to edge computing and its growing importance in IoT and distributed systems. The maintainers will then delve into the core components and architecture of KubeEdge, demonstrating how it extends Kubernetes' capabilities to manage edge computing workloads efficiently. They will share success stories and insights from organizations that have deployed KubeEdge in various edge environments, such as smart cities, industrial IoT, edge AI, robotics, and retail, highlighting the tangible benefits and transformational possibilities. Additionally, the session will introduce the certified KubeEdge conformance test, hardware test, KubeEdge course and certification, discuss advancements in technology and community governance within the KubeEdge project, and share the latest updates on the project's graduation status.

Speakers

Yue Bao

Senior Software Engineer, Huawei Cloud Computing Technology Co., Ltd.

Hongbing Zhang

KubeEdge TSC Member, Chief Operating Officer, DaoCloud

Hongbing Zhang is Chief Operating Officer of DaoCloud. He is a veteran in open source areas, he founded IBM China Linux team in 2011 and organized team to make significant contributions in Linux Kernel/openstack/hadoop projects. Now he is focusing on cloud native domain and leading... Read More →

Wednesday June 11, 2025 11:45 - 12:15 HKT
Level 21 | Pearl Pavilion

Maintainer Track

11:45 HKT

China Mobile's Panji Platform: Observability Practices and Implementations for LLM Applications Base - Jing Shang, China Mobile & Casey Li, Yunshan Networks, Inc.

Wednesday June 11, 2025 11:45 - 12:15 HKT

中国移动磐基平台 LLM 应用的可观测性实践 pdf

As large language model (LLM) applications are widely deployed, their complex architectures challenge business observability. APM probes, which rely on instrumentation or proxy operation, consume system resources and impact traffic and performance, restricting their use in complex scenarios. Also, multiple teams handling different LLM instances make it hard to coordinate unified observability construction.
To solve this, China Mobile‘'s Panji platform collaborates with DeepFlow to achieve zero-intrusion (Zero Code) and full-stack (Full Stack) observability instantly, using eBPF and Wasm technologies. eBPF collects real-time data at the kernel level, while Wasm plugins parse streaming requests. By integrating existing data, the platform provides service universal map, distributed tracing, and multi-dimensional metric analysis, ensuring the stability and performance optimization of LLM applications.

Speakers

Jing Shang

Chief Expert of China Mobile Group, China Mobile

Dr. Shang Jing, Chief Expert at China Mobile Group, has over 20 years of experience in IT system development, construction, and operation. Specializing in big data and cloud technologies, she led the development of China Mobile's Wutong Big Data Platform. Under her leadership, the... Read More →

Casey Li

Product Manager, Yunshan Networks, Inc.

Starting from graduate school at Huazhong University of Science and Technology in 2013, I joined Tencent Cloud virtual network team in 2016, which provided me with in-depth theoretical knowledge and practical experience in cloud networks. In 2018, I joined YUNSHAN Networks as PM... Read More →

Wednesday June 11, 2025 11:45 - 12:15 HKT
Level 16 | Grand Ballroom I

Content Experience Level Advanced
Presentation Language Chinese

11:45 HKT

Kube Intelligence - A Metric Based Insightful Remediation Recommender - Yash Bhatnagar, Google

Wednesday June 11, 2025 11:45 - 12:15 HKT

Level 19 | Crystal Court I

Not everything can be thought about while designing or developing the applications, and as such lot of the design decisions are based on estimates and potential usage patterns.

More often that not, these estimates differ from reality and introduce inefficiencies in the system across several fronts - and if at all visible, it always much later in the lifecycle when you already have several customers & high footprint.

And hence, unless there is a clear sign of performance degradation or unjustified costs, there is often no incentive to invest time & effort for some unknown gains.

In this session Yash will outline a real world case study about how they went about building an internal platform for handling several aspects of post deployment challenges like

1. rightsizing opportunities,
2. architecture migrations like moving to serverless,
3. finding right maintenance windows, etc

by using a wide range of metrics, and how impactful these minor optimizations turned out to be.

Speakers

Yash Bhatnagar

Software Engineer, Google

Yash is working with Google as Software Engineer, and has 9 years of industrial experience with cloud architectures and micro-service development across Google and VMware. He has been a speaker at several international conferences such as KubeCon + CloudNativeCon and Open Source... Read More →

Wednesday June 11, 2025 11:45 - 12:15 HKT
Level 19 | Crystal Court I

Content Experience Level Any
Presentation Language English

12:15 HKT

Lunch 🍲

Wednesday June 11, 2025 12:15 - 13:45 HKT

Wednesday June 11, 2025 12:15 - 13:45 HKT
Level 16 | Grand Ballroom II

Breaks

12:45 HKT

Project Pavilion Tables | Wednesday Afternoon

Wednesday June 11, 2025 12:45 - 15:30 HKT

Argo P-3
Cilium P-2
Cloud Native Buildpacks P-8
Dragonfly P-7
Fluid P-5
Karpenter P-1
OpenGemini P-6
OpenTelemetry P-4

Wednesday June 11, 2025 12:45 - 15:30 HKT
Level 16 | Grand Ballroom II

Project Opportunities

13:45 HKT

OPEA: The Key Open Platform for Enabling Enterprise AI Deployment - Kenny Chen, Intel Corporation, COIA

Wednesday June 11, 2025 13:45 - 14:15 HKT

In today's tech landscape, AI drives industry transformation, but enterprises face challenges in AI adoption—diverse hardware, complex workflows, data privacy. OPEA, an open-source enterprise AI platform with modular microservices, offers unified solutions for rapid deployment. Through DeepSeek inference appliance case, see how OPEA integrates with IT infrastructure, optimizes performance, and enhances reliability. Discover the new "Powered by OPEA" certification for confident AI deployment.

Speakers

Kenny Chen

Head of China Open Source Software Ecosystem, Deputy Secretary-General of COIA, Intel Corporation, COIA

Wednesday June 11, 2025 13:45 - 14:15 HKT
Level 21 | Emerald Pavilion

Building the Future of Industrial AI with Open Source (By COIA)

13:45 HKT

Solidigm CSAL Solution Brings Advanced IO Shaping, Caching and Data Placement Into NVIDIA DPU DOCA S - Wayne Gao, Solidigm & Long Chen, NVIDIA

Wednesday June 11, 2025 13:45 - 14:15 HKT

Kubecon China 25 Progressive Delivery Made Easy with Argo Rollouts pdf

CSAL is Cloud Storage Acceleration Layer for BigData and AI. it is open-source user mode FTL, cache and io trace component inside SPDK(upstreamed). It commercially helps Alibaba cloud storage system.
refer https://www.solidigm.com/products/technology/cloud-storage-acceleration-layer-write-shaping-csal.html. Alibaba and Solidigm joint top computer conference paper Eurosys2024 https://dl.acm.org/doi/pdf/10.1145/3627703.3629566
Session Topics:
This session is joint development with NVIDIA DPU team and BeeGFS
1. CSAL leverage DPU DRAM as CSAL write buffer who achieve best storage latency ever also promise the data consistency.
2. QLC high density storage is favorable by AI industry since it save power and space for AI Data Center. DPU storage solution can achieve same thing, it is great combine two things together.
3. CSAL bring advanced storage IO shaping, caching and data placement SW into NVIDIA DPU DOCA storage SW service,
4. DPU and CSAL and BeeGFS experiment data sharing and report

Speakers

Long Chen

Director, NVIDIA

Take charge of promoting NVIDIA networking for high speed storage and new application market in China

Wayne Gao

Princinple storage solution architect, Solidigm

Wayne Gao is a Principal Engineer as Storage solution architect and worked on CSAL from PF to Alibaba commercial release. Wayne also takes main developer effort to finish CSAL pmem/DSA and cxl.mem PF from intel to Solidigm. Before joining Intel, Wayne has over 20 years of storage... Read More →

Wednesday June 11, 2025 13:45 - 14:15 HKT
Level 19 | Crystal Court II

Data Processing + Storage

Content Experience Level Intermediate
Presentation Language Chinese

13:45 HKT

The Next Steps for Ingress-NGINX and the Ingate Project - Jintao Zhang, Kong Inc.

Wednesday June 11, 2025 13:45 - 14:15 HKT

Level 21 | Pearl Pavilion

I will share the progress of the Ingress-NGINX project in this topic, as well as our newly incubated project, Ingate. Ingate is a project we created to actively adopt the Gateway API, and we will explore the next steps in the Ingate project based on the successes and failures we've experienced in the Ingress-NGINX project, along with user demands for frequently used features.

Speakers

Jintao Zhang

CNCF Ambassador, Kubernetes Ingress-NGINX maintainer, Kong Inc.

Jintao Zhang is a Microsoft MVP, CNCF Ambassador, Apache PMC, and Kubernetes Ingress-NGINX maintainer, he is good at cloud-native technology and Azure technology stack.

Wednesday June 11, 2025 13:45 - 14:15 HKT
Level 21 | Pearl Pavilion

Maintainer Track

13:45 HKT

Progressive Delivery Made Easy With Argo Rollouts - Kevin Dubois, Red Hat

Wednesday June 11, 2025 13:45 - 14:15 HKT

Level 19 | Crystal Court I

ou might already be using a CI/CD solution, but are you 100% sure things will roll out without a glitch once you go to production? Unfortunately differences between testing/staging and production environments are virtually unavoidable. There’s always a risk for unforeseen issues related to your production environment and/or actual load which can lead to potential disruptions to your users.

Progressive delivery is the next step after Continuous Delivery to roll out your application in a controlled and automated way so you can verify and test your application *in production* before it becomes fully available to all your user bases.

Embrace GitOps and Progressive Delivery with techniques like blue-green, canary release, shadowing traffic, dark launches and automatic metrics-based rollouts to validate the application in production using Kubernetes and tools like Istio, Prometheus, ArgoCD, and Argo Rollouts.

Come to this session to learn about Progressive Delivery in action using Kubernetes.

Speakers

Kevin Dubois

Senior Principal Developer Advocate, Red Hat

Kevin is a Java Champion, software engineer, author and international speaker with a passion for Open Source, Java, and Cloud Native Development & Deployment practices. He currently works as developer advocate at Red Hat where he gets to enjoy working with Open Source projects and... Read More →

Wednesday June 11, 2025 13:45 - 14:15 HKT
Level 19 | Crystal Court I

Content Experience Level Intermediate
Presentation Language English

13:45 HKT

Connecting Dots: Unified Hybrid Multi-Cluster Auth Experience With SPIFFE and Cluster Inventory API - Chen Yu, Microsoft & Jian Zhu, Red Hat

Wednesday June 11, 2025 13:45 - 14:15 HKT

Kubecon China 2025 Unified Hybrid Multi Cluster Auth Experience with SPIFFE and Cluster Inventory API pdf

As the multi-cluster pattern continues to evolve, managing K8s identities, credentials, and permissions for teams and multi-cluster apps, such as Argo and Kueue, has become a hassle, typically involving managing individual service accounts on each cluster and passing credentials around. Such setup is often scattered, repetitive, difficult to track/audit, and may impose security and ops complications. This is especially true with hybrid environments, where different solutions could be in play across platforms.

This demo presents a solution based on OpenID, SPIFFE/SPIRE, and Cluster Inventory API from the Multi-Cluster SIG that provides a unified, seamless, and secure auth experience. Facilitated by CNCF multi-cluster projects, OCM and KubeFleet, attendees could be inspired to leverage open source solutions to eliminate credential sprawl, reduce operational complexity, and enhance security in hybrid cloud environments, when setting up teams/applications to access a multi-cluster setup.

Speakers

Chen Yu

Senior Software Engineer, Microsoft

Chen Yu is a senior software engineer at Microsoft with a keen interest in cloud-native computing. He is currently working on Multi-Cluster Kubernetes and contributing to the Fleet project open-sourced by Azure Kubernetes Service.

Jian Zhu

Senior Software Engineer, RedHat

Zhu Jian is a senior software engineer at RedHat, a speaker at Kubecon China 2024, and a core contributor to the open cluster management project. Jian enjoys solving multi-cluster workload distribution problems and extending OCM with add-ons.

Wednesday June 11, 2025 13:45 - 14:15 HKT
Level 16 | Grand Ballroom I

Security

Content Experience Level Intermediate
Presentation Language Chinese

13:45 HKT

Women's Community Gathering

Wednesday June 11, 2025 13:45 - 14:45 HKT

Level 20 | Salon 5

Strong communities foster a feeling of belonging by providing opportunities for interaction, collaboration, and shared experiences. We hope to do just that with a gathering of attendees who identify as women and non-binary individuals at KubeCon + CloudNativeCon China! Join fellow women community members for networking and connection.

Wednesday June 11, 2025 13:45 - 14:45 HKT
Level 20 | Salon 5

Inclusion + Accessibility

Content Experience Level Any
Presentation Language English

14:30 HKT

Leveraging Multi-Agent Dynamic Programming and Hierarchical Reflection for Next-Generation AI Decision-Making (Co-sight) - ShiQing Jiang, ZTE Corporation

Wednesday June 11, 2025 14:30 - 15:00 HKT

As AI tackles increasingly complex tasks, traditional LLMs show limitations in action decision-making and multi-step reasoning, making autonomous planning and dynamic correction key challenges. ZTE's Co-Sight agent system addresses this with a multi-agent (Plan-Actor) collaborative architecture. Its dual-level design separates planning (task decomposition, path generation) from execution, significantly reducing LLM search space. Dynamic task adjustment is achieved via DAG parallel thinking, dynamic context, guardrails, and hierarchical reflection. Co-Sight has demonstrated excellent performance on the GAIA benchmark, particularly showcasing superior stability in complex Level 2 multi-step tasks.

Speakers

ShiQing Jiang

Senior Software Engineer, ZTE Corporation

Wednesday June 11, 2025 14:30 - 15:00 HKT
Level 21 | Emerald Pavilion

Building the Future of Industrial AI with Open Source (By COIA)

14:30 HKT

Exploring KubeEdge Graduation: Build a Diverse and Collaborative Open Source Community From Scratch - Yue Bao & Fei Xu, Huawei; Hongbing Zhang, DaoCloud; Huan Wei, Hangzhou HarmonyCloud; Benamin Huo, QingCloud

Wednesday June 11, 2025 14:30 - 15:00 HKT

Recently, the health of open-source projects, particularly, vendor diversity and neutrality, has become a key topic of discussion. Many projects have faced challenges due to a lack of vendor diversity, threatening their sustainability. It is increasingly clear that setting up the right governance structure and project team during a project’s growth is critical.
KubeEdge, the industry's first cloud-native open-source edge computing project, has grown from its initial launch in 2018 to achieving CNCF graduation this year. Over the past few years, KubeEdge has evolved from a small project into a diverse, collaborative and multi-vendor open-source community
In this panel, we will discuss the lessons learned from KubeEdge community graduation journey, focusing on key strategies in technical planning, community governance, developer growth, and project maintenance. Join us to explore how to build a multi-vendor and diverse community, and how to expand into different industries.

Speakers

Huan Wei

Senior Technical Director, Hangzhou HarmonyCloud Technologies Co., Ltd

Huan is an open source enthusiast and cloud native technology advocate. He is currently the CNCF ambassador, and TSC member of KubeEdge project. He is serving as experienced technical director for HarmonyCloud.

Fei Xu

Senior software Engineer, Huawei

KubeEdge TSC Member, Senior Software Engineer at Huawei Cloud. Focusing on Cloud Native,Kubernetes, Service Mesh, EdgeComputing, EdgeAI and other fields. Currently maintaining the KubeEdge project which is a CNCF graduated project. And has rich experience in Cloud Native and EdgeComputing... Read More →

Benjamin Huo

KubeSphere founding member, KubeEdge TSC member, Director of Cloud Platform, QingCloud Technologies

Benjamin Huo leads QingCloud Technologies' Architect team and Observability Team. He is the founding member of KubeSphere and the co-author of Fluent Operator, Kube-Events, Notification Manager, OpenFunction, and most recently eBPFConductor. He loves cloud-native technologies especially... Read More →

Yue Bao

Senior Software Engineer, Huawei Cloud Computing Technology Co., Ltd.

Hongbing Zhang

KubeEdge TSC Member, Chief Operating Officer, DaoCloud

Wednesday June 11, 2025 14:30 - 15:00 HKT
Level 19 | Crystal Court II

Cloud Native Experience

Content Experience Level Any
Presentation Language Chinese

14:30 HKT

Building Custom GPU Clusters at Scale: Using Kubespray To Create High-Performance AI Infrastructure - Kay Yan, DaoCloud & Rong Zhang, vivo

Wednesday June 11, 2025 14:30 - 15:00 HKT

Level 21 | Pearl Pavilion

Kubespray, recognized by Kubernetes' SIG Cluster Lifecycle, deploys production-ready Kubernetes clusters on bare metal, enhancing performance for AI applications with robust GPU support. This session covers Kubespray's fundamentals, key features, and updates.

As AI workloads like LLMs grow, scalable GPU clusters are essential. Engineers will share insights from deploying custom GPU clusters at scale with Kubespray, discussing challenges and best practices. Attendees will learn to integrate Kubernetes technologies like LWS, Kueue, Gateway API Inference Extension, DRA, and tensor parallelism to enhance AI workloads like RAG and LoRA, improving resource utilization and performance.

We'll share Kubespray's inventory source code to customize AI clusters and use Kubernetes operators to define infrastructure in private clouds, enabling efficient cluster scaling.

Speakers

Rong Zhang

Senior software Engineer, vivo

Rong is a software engineer at vivo developing platform services on top of Kubernetes, providing containerized infrastructure. Focus on the closed loop system of scheduling、gpu technology、network and cluster management.

Kay Yan

Principal Software Engineer, DaoCloud

Kay Yan is kubespray maintainer, containerd/nerdctl maintainer. He is the Principal Software Engineer in DaoCloud, and develop the DaoCloud Enterprise Kubernetes Platform since 2016.

Building Custom GPU clusters at Scale v2 pptx

Wednesday June 11, 2025 14:30 - 15:00 HKT
Level 21 | Pearl Pavilion

Maintainer Track

14:30 HKT

The Past, the Present, and the Future of Platform Engineering - Mauricio "Salaboy" Salatino, Diagrid & Viktor Farcic, Upbound

Wednesday June 11, 2025 14:30 - 15:00 HKT

Level 19 | Crystal Court I

Do you think platform engineering is too hard? Or is it just a buzzword? Is the CNCF landscape too tricky to visualize? If you’ve been in this industry long enough, you should know that platform engineering has been around for a long time.

Most of us have been trying to build developer platforms for decades, and most of us have failed at that. That begs the questions: “What is different now?” “Why will this time be different?” and “Do we have a chance to succeed?”

We’ll take a look at the past, the present, and the future of platform engineering. We’ll see what we were doing in the past, what we did wrong, and why we failed. Further on, we’ll see what we (the industry as a whole) are doing now and, more importantly, where we might go from here.

Get ready for the hard truths and challenges you will face when trying to build a platform based on Kubernetes. Join us for a pain-infused journey filled with challenges teams will face when building platforms to enable other teams.

Speakers

Viktor Farcic

Viktor Farcic, Upbound

Viktor Farcic is a lead rapscallion at Upbound, a member of the CNCF Ambassadors, Google Developer Experts, CDF Ambassadors, and GitHub Stars groups, and a published author. He is a host of the YouTube channel DevOps Toolkit and a co-host of DevOps Paradox.

Mauricio Salatino

Software Engineer, Diagrid

Mauricio works as an Open Source Software Engineer at @Diagrid, contributing to and driving initiatives for the Dapr OSS project. Mauricio also serves as a Steering Committee member for the Knative Project and Co-Leading the Knative Functions initiative. He published a book titled... Read More →

Wednesday June 11, 2025 14:30 - 15:00 HKT
Level 19 | Crystal Court I

Content Experience Level Intermediate
Presentation Language English

14:30 HKT

Guardians of Multi-Tenancy: Enhanced Authorization To Prevent Lateral Node Escape - Dahu Kuang & Cheng Gao, Alibaba Cloud

Wednesday June 11, 2025 14:30 - 15:00 HKT

Guardians of Multi Tenancy Enhanced Authotization to Prevent Lateral Node Escape pdf

Maximizing security in multi-tenant clusters while maintaining cost-effectiveness is crucial for enterprise OPS. Most enterprise clusters deploy multiple daemonsets, which are attractive targets for attackers seeking to escape and move laterally, ultimately taking over the entire cluster.

The SIG community has introduced several advanced security features recently, such as CRD Field Selectors, Field and Label Selector Authorization, validating admission policy (VAP), and Structured Authorization Config. These allow users to define more flexible authorization configurations, addressing filtering and authorization needs for CRDs, kubelet, and other resources in multi-tenant environments.

We will share the lessons learned from the node escape incidents and demonstrate how to implement these new features and show how to use the Common Expression Language (CEL) to configure customized policies in Authorization Webhook and VAP, resulting more node-specific restrictions within clusters.

Speakers

Dahu Kuang

Senior Engineer, Alibaba Cloud

Dahu Kuang is a Security Tech Lead on the Alibaba Cloud Container Service for Kubernetes (ACK) team, focusing on the design and implementation of container security-related work, especially within the context of secure supply chain.

Cheng Gao

Senior Security Engineer, Alibaba Cloud

Cheng Gao, Senior Security Engineer at Alibaba Cloud, focuses on the Security Development Lifecycle (SDL) for cloud-native applications. With expertise in container services, observability, and Serverless architectures, Cheng has led security assurance for several internal container... Read More →

Wednesday June 11, 2025 14:30 - 15:00 HKT
Level 16 | Grand Ballroom I

Security

Content Experience Level Any
Presentation Language English

15:00 HKT

Coffee Break ☕

Wednesday June 11, 2025 15:00 - 15:30 HKT

Wednesday June 11, 2025 15:00 - 15:30 HKT
Level 16 | Grand Ballroom II

Breaks

15:30 HKT

Heterogeneous Hybrid Distributed Training for Large-Scale Language Models - Yanjun Chen, China Mobile, COIA

Wednesday June 11, 2025 15:30 - 16:00 HKT

With the development of AI technology, the demand for computing power for large model training has accelerated the deployment of AI infrastructure. Data centers often have a "resource wall" problem between AI acceleration hardware of different generations and manufacturers, which caused the incompatibility issue of software and hardware stack. Thus, it’s a big challenge for AI infra operators to maximize resource utilization. This topic focuses on technical solutions for collaborative training using chips of different architectures, sharing the practices on solving key problems such as heterogeneous training task splitting, heterogeneous training performance prediction, and heterogeneous hybrid communication and etc.. The project has been open sourced and will be further improved with better maturity through the community.

Speakers

Yanjun Chen

TOC member at COIA, China Mobile, COIA

Wednesday June 11, 2025 15:30 - 16:00 HKT
Level 21 | Emerald Pavilion

Building the Future of Industrial AI with Open Source (By COIA)

15:30 HKT

Ask the Experts: CNCF CTO and TOC Members Open Q&A - Hosted by Chris Aniszczyk, Lin Sun & Kevin Wang

Wednesday June 11, 2025 15:30 - 16:00 HKT

Level 21 | Pearl Pavilion

Join this interactive session for a brief overview of the Cloud Native Computing Foundation (CNCF) Technical Oversight Committee (TOC), including recent initiatives and opportunities to get involved. Learn how the TOC is helping shape the next decade of cloud native technologies, and how you can get involved. Following the overview, we’ll open the floor to your questions—whether they’re technical, or about building leadership within CNCF.
Initial seeding questions include:

What are some of the latest Cloud Native AI initiatives?
How can we encourage more CNCF and TAG contributions from Asian countries?
What are the possible paths to becoming a CNCF TOC member?

Speakers

Kevin Wang

Technical Expert, Lead of Cloud Native Open Source, Huawei

Lin Sun

Head of Open Source & CNCF TOC, Solo.io

Chris Aniszczyk

CTO, CNCF

Wednesday June 11, 2025 15:30 - 16:00 HKT
Level 21 | Pearl Pavilion

Cloud Native Experience

15:30 HKT

Stability in Large Model Training: Practices in Software and Hardware Fault Self-Healing - Yang Cao, Ant Group

Wednesday June 11, 2025 15:30 - 16:00 HKT

stability in large model training practices in software and hardware fault self healing pdf

Training trillion-parameter AI models requires significant GPU resources, where any idle time leads to increased costs. Maintaining full-speed GPU utilization is crucial, yet hardware and software failures (such as firmware, kernel, or hardware issues) often disrupt large-scale training. For example, LLaMA3 experienced 419 interruptions over 54 days, with 78% due to hardware issues, underscoring the necessity for automated anomaly recovery.
At Ant Group, we will share:
GPU Monitoring: Comprehensive monitoring from hardware to applications to ensure optimal performance.
Self-Healing for Large GPU Clusters: Automated fault isolation, recovery from kernel panics, and node reprovisioning for clusters with 10,000+ GPUs.
Core Service Level Objectives (SLOs): Achieving over 98% GPU availability and more than 90% automatic fault isolation.
Predictive Maintenance: Using failure pattern analysis to reduce downtime and improve reliability.

Speakers

Yang Cao

senior engineer, Ant Group

Yang Cao Senior Engineer, Ant Group Yang Cao is a senior engineer at Ant Group, currently focusing on ensuring the stability of large-scale distributed training on Kubernetes.

Wednesday June 11, 2025 15:30 - 16:00 HKT
Level 19 | Crystal Court II

Cloud Native Experience

Content Experience Level Intermediate
Presentation Language Chinese

15:30 HKT

Policy as Code: Past, Present and Future for Novice - Hoon Jo, Megazone

Wednesday June 11, 2025 15:30 - 16:00 HKT

Policy as Code Past, Present and Future for Novice v1.0.0 pdf

When you're new to Kubernetes, Policy as Code (PaC) can be a very unfamiliar topic. But as you get more familiar with Kubernetes, you'll probably be interested in how you can use it securely, especially since Kubernetes is essentially a declarative system via YAML, so having security also be done in code will help with usability and reducing human error.

In order to make PaC easier to understand, I'll demonstrate the Admission Control part directly in Kubernetes. Until recently, this part was based on webhooks, but since v1.23, the decision to actively embrace the Common Expression Language (CEL) has made it possible to apply it as code directly inside Kubernetes. Validating Admission Policy became GA in v1.30, and Mutating Admission Policy is in Alpha in v1.32.

Based on this outline, I'll talk about how PaC has been applied to Kubernetes in the past, how it works today, and finally, how we can expect it to be integrated into Kubernetes in the future.

See you at the session! 🙂

Speakers

Hoon Jo

Cloud Solutions Architect, Cloud Native Engineer, Megazone

Hoon Jo is Cloud Solutions Architect as well as Cloud Native engineer at Megazone. He has many times of speaker experience for cloud native technologies. And spread out Cloud Native Ubiquitous in the world. He has written several books and latest books is 『CONTAINER INFRASTRUCTURE... Read More →

Wednesday June 11, 2025 15:30 - 16:00 HKT
Level 16 | Grand Ballroom I

Cloud Native Novice

Content Experience Level Beginner
Presentation Language English

15:30 HKT

Composable Platforms: Modular Platform Engineering With Kratix and Backstage - Hossein Salahi, Liquid Reply

Wednesday June 11, 2025 15:30 - 16:00 HKT

Level 19 | Crystal Court I

Constructing and managing platforms for diverse teams and workloads presents a significant challenge in today's cloud-native environment. This session introduces the concept of composable platforms, using modular, reusable components as the foundation for platform engineering. This talk will demonstrate how using Kratix, a workload-centric framework, and Backstage an extensible developer portal enables the creation of self-service platforms that balance standardization with adaptability.

The session will detail platform design for scalability and governance, streamlining developer workflows through Backstage, and using Kratix Promises for varied workload requirements. Attendees will gain practical insights into building scalable and maintainable platforms through real-world examples, architectural patterns, and a live demonstration of a fully integrated Kratix-Backstage deployment.

Speakers

Hossein Salahi

Tech Lead, Liquid Reply

Hossein is an experienced cloud computing professional with nearly a decade of expertise in distributed systems and cloud technologies. He began as a student specializing in cloud automation and progressed to a full-time role focusing on on-premises cloud infrastructure and containers... Read More →

Wednesday June 11, 2025 15:30 - 16:00 HKT
Level 19 | Crystal Court I

KubeCon HK 2025.06 Taming Dependency Chaos for LLM in K8S pdf

Content Experience Level Intermediate
Presentation Language English

16:15 HKT

Taming Dependency Chaos for LLM in K8s - Peter Pan, Neko Ayaka & Kebe Liu, DaoCloud

Wednesday June 11, 2025 16:15 - 16:45 HKT

Level 19 | Crystal Court I

AI developer in K8S: either in Jupyter notebook or LLM serving: Python Dependency is always a headache :
- Prepare a set of base Images? The maintenance amounts & efforts will be a nightmare: Since (1) packages in AI world are rapidly version bumping, (2) diff llm codes require diff packages permutation/combination.
- Leave users to `pip install` by themselves ? The resigned waiting blocks productivity and efficiency. You may agree if you did it.
- If on a GPU Cloud, the pkg preparation time may even cost a lot: you rent a GPU but wasted in waiting pip downloading...
- you may choose to D.I.Y: docker-commit your own base-images, but you have to worry about the Dockerfile, registry and additional cloud cost if you don't have local docker env.

----
So we introduce https://github.com/BaizeAI/dataset.

The solution:
1. A CRD to describe the dependency and env.
2. K8S Job to pre-load the packages.
3. PVC to store and mount
4. `conda` to switch from envs
5. share between namespaces

Speakers

Peter Pan

R&D Engineering VP, Daocloud

Kebe Liu

DaoCloud, Senior software engineer, DaoCloud

AI Infra and Service Mesh Team Lead at DaoCloud. Member of Istio Steering Committee. Creator of open source projects such as Merbridge and kcover.

Neko Ayaka

Senior Software Engineer, DaoCloud

Cloud native developer, AI researcher, Gopher with 5 years of experience in loads of development fields across AI, data science, backend, frontend. Co-founder of https://github.com/nolebase

Wednesday June 11, 2025 16:15 - 16:45 HKT
Level 19 | Crystal Court I

Application Development

Content Experience Level Any
Presentation Language English

16:15 HKT

Apache Gravitno: unified metadata lake for Data and AI - Shaofeng Shi, Datastrato

Wednesday June 11, 2025 16:15 - 16:45 HKT

In the AIera, enterprises need to collect more data to build high-quality AI applications, including structured data (databases, data warehouses, etc.) and unstructured data (data lakes, document libraries, real-time data, etc.). Data integrity and compliance play a key role in building AI applications, which is the value of metadata. Providing AI users with a unified data view so that they can better discover and use multi-source heterogeneous data, including data discovery, data semantics, data lineage, data permissions, etc., and managing the data life cycle in combination with enterprise governance needs to avoid resource waste and security issues, has become a strong need for every enterprise.

Apache Gravitino provides a unified API to access multiple data sources and multiple data storages, supports multiple data engines and machine learning frameworks to access data, and implements unified naming, unified permissions, unified lineage, unified auditing and other functions based on unified metadata, thereby greatly simplifying the data operation and breaking the data silos. At present, it has been adopted by companies such as Xiaomi, Bilibili, Pinterest, and Uber, and has achieved good results. This session will introduce the background, architecture, core functions and use cases of Gravitino.

Speakers

Shaofeng Shi

VP of Engineering, Apache Incubator PMC, Datastrato

Wednesday June 11, 2025 16:15 - 16:45 HKT
Level 21 | Emerald Pavilion

Building the Future of Industrial AI with Open Source (By COIA)

16:15 HKT

High-Performance Cloud Native Traffic Authentication Solutions - Muyang Tian & Zengzeng Yao, Huawei

Wednesday June 11, 2025 16:15 - 16:45 HKT