KubeCon + CloudNativeCon China 2025: Full Schedule

10-11 June

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for KubeCon + CloudNativeCon China 2025 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

Please note: All times in this schedule are shown in Hong Kong Standard Time (UTC+8:00). The schedule is subject to change and session seating is available on a first-come, first-served basis.

11:00 HKT

An Alternative Metadata System for Large Kubernetes Clusters - Yingcai Xue & Yixiang Chen, ByteDance

Tuesday June 10, 2025 11:00 - 11:30 HKT

Level 19 | Crystal Court II

For an event-driven distributed system like Kubernetes, where components communicate by synchronizing incremental data through the KubeAPIServer, the metadata system is the most critical component. etcd is the only officially supported metadata system. Some projects, such as kine, have explored alternative metadata storage, but they are either not open source or have performance issues.
This talk covers ByteDance's work on high-performance Kubernetes metadata systems. It summarizes etcd's production issues, analyzes Kubernetes' metadata storage requirements, and introduces how we address them with KubeBrain.
Results from large-scale production environments (over 20K nodes and 1M pods, running for years) show that KubeBrain enhances cluster performance and stability.
This talk helps attendees understand the challenges of metadata systems in large-scale clusters and provides insights into an open-source solution that has been proven in ByteDance's production environment.

Speakers

Yixiang Chen

Software Engineer, ByteDance

Yixiang is a seasoned cloud-native technologist with over 9 years of hands-on experience at ByteDance, where he has been at the forefront of large-scale Kubernetes ecosystem innovations. As a core contributor in cloud-native infrastructure, his expertise spans multiple domains including...

Yingcai Xue

Software Engineer, ByteDance

Graduated from Zhejiang University with a master's degree.

Operations + Performance

Content Experience Level Advanced
Presentation Language Chinese

11:45 HKT

Building Ultra-Large-Scale Cloud Native Edge Systems Using Chaos Engineering - Yue Bao, Huawei Cloud Computing Technology & Yue Li, DaoCloud

Tuesday June 10, 2025 11:45 - 12:15 HKT

Level 19 | Crystal Court II

Fast-growing technologies such as 5G networks, the industrial Internet, and AI are giving edge computing an important role in driving digital transformation. Each new technology brings benefits, but it also brings challenges. First, there are massive numbers of heterogeneous edge devices, encompassing a broad range of device types. Second, edge devices are often located in unstable and complex physical and network environments, with limited bandwidth, high latency, and similar constraints. How to overcome these challenges and build a stable, large-scale edge computing platform is a problem that needs to be resolved.
KubeEdge is an open source edge computing framework that extends the power of Kubernetes from the central cloud to the edge. Kubernetes clusters powered by KubeEdge can now stably support 100,000 edge nodes and manage more than one million pods.
In this session, we will share the key challenges of managing massive numbers of heterogeneous edge nodes and explain how to use Chaos Mesh to make KubeEdge more reliable across large-scale edge nodes.

Speakers

Yue Bao

Senior Software Engineer, Huawei Cloud Computing Technology Co., Ltd.

Yue Bao is a software engineer at Huawei Cloud. She now works 100% on open source, focusing on lightweight edge for KubeEdge. She is a maintainer of KubeEdge and also the tech lead of KubeEdge SIG Release and Node. Before that, Yue worked on Huawei Cloud Intelligent...

Yue Li

Software Quality Engineer, DaoCloud

Works at DaoCloud as Quality Director, with more than 20 years of IT industry experience at China Mobile, Siemens, HP, EMC, and a startup company. A newcomer to cloud native and an open source fan who would like to adopt open source projects to improve enterprise software quality with fast releases.

Operations + Performance

Content Experience Level Any
Presentation Language Chinese

13:45 HKT

From Bottleneck To Breakthrough: Conquering Applications Startup Peaks in Kubernetes - Hexi Guo, Alibaba Cloud & Rentian Zhou & Zhuoqi Liu, CloudPilot AI

Tuesday June 10, 2025 13:45 - 14:15 HKT

Level 19 | Crystal Court II

A variety of applications in Kubernetes typically require higher memory or compute resources during startup—such as Java, .NET, and Node.js applications, as well as those utilizing large data processing frameworks or machine learning models—due to the need to load substantial dependencies and perform complex initialization tasks. To prevent startup failures from resource contention, these applications typically have their resource requests set based on peak startup demands. However, this often leads to resource waste after startup is complete.
To address this challenge, this session presents a queue-based approach using Karpenter. This method allows applications to set resource requests based on typical usage instead of peak startup needs. It temporarily spreads applications across multiple smaller nodes during startup, preventing single-node overload. After startup, it smoothly consolidates them onto fewer but larger nodes to optimize resource usage while maintaining service stability.

Speakers

Zhuoqi Liu

Senior Software Engineer, CloudPilot AI Inc

Hexi Guo

Software Engineer, Alibaba Cloud

Alibaba Cloud technical expert, maintainer of Kubernetes elastic scaling component cluster-autoscaler, initiator of open source elastic component kubernetes-cronhpa-controller, responsible for the design and implementation of elastic solutions for Alibaba Cloud industry customers...

Rentian Zhou

Software Engineer, CloudPilot AI

Rentian, a Software Engineer at CloudPilot AI, focuses on the Karpenter open-source project, contributing to karpenter-provider-alibabacloud and -aws. He has also contributed to various other projects and serves as a Karmada reviewer and a member of the Volcano and HwameiStor communities...

Operations + Performance

Content Experience Level Any
Presentation Language English

14:30 HKT

Unlocking Kyverno: Mastering Policy Management in Large-Scale Kubernetes Clusters - Di Xu, Xiaohongshu & Xu Liu, RedNote

Tuesday June 10, 2025 14:30 - 15:00 HKT

Level 19 | Crystal Court II

With the growing adoption of Kubernetes, managing configurations and ensuring compliance across extensive clusters becomes increasingly complex. Kyverno, a native Kubernetes policy engine, offers a streamlined solution to these challenges. In this session, we'll explore how adopting Kyverno can enhance efficiency, simplify operations, centralize control, and reduce maintenance in Kubernetes environments. We'll demonstrate how Kyverno empowers organizations to effectively manage policies and tackle the unique challenges of large-scale Kubernetes deployments. Drawing from real-world experiences, we will share valuable lessons and best practices that facilitate seamless policy integration and management. Attendees will gain practical insights and tools to optimize their Kubernetes environments using Kyverno.

Speakers

Di Xu

CNCF Ambassador | Principal Software Engineer, Xiaohongshu

Currently, he works at Xiaohongshu, leading a team focused on building a highly reliable and scalable container platform. He is the founder of the CNCF Sandbox project Clusternet and a top 50 code contributor in the Kubernetes community. He has spoken many times at open source conferences...

Xu Liu

Senior Software Engineer, Xiaohongshu

Focused on the cloud native field, with extensive experience in managing large-scale Kubernetes clusters, container networking, and service mesh.

Operations + Performance

Content Experience Level Any
Presentation Language Chinese

15:30 HKT

Revolutionizing Sidecarless Service Mesh With eBPF - Zhonghu Xu & Muyang Tian, Huawei

Tuesday June 10, 2025 15:30 - 16:00 HKT

Level 19 | Crystal Court II

It is widely recognized that service mesh sidecars introduce significant resource overhead, adversely affecting memory and CPU utilization. Furthermore, the tight coupling of sidecars with workloads complicates lifecycle management.

In this session, we will compare the pros and cons of the mainstream implementations: Istio, Ambient, and Cilium. All of them, however, use a userspace proxy per node, introducing a single point of failure and increasing the number of connections per hop. We aim to demonstrate how eBPF and programmable kernel modules can significantly mitigate these issues.

Lastly, we will introduce several use cases for adopting this approach to improve microservice performance while minimizing interruption to applications during infrastructure upgrades.

Speakers

Muyang Tian

Operating System Engineer, Huawei

Operating system engineer at Huawei Technologies Co., Ltd., core member of Kmesh, and contributor to libxdp. Enthusiastic about cloud native technology and eBPF-based high-performance networking.

Zhonghu Xu

Principal Software Engineer, Huawei

Zhonghu is an Istio Steering Committee member, has been a core maintainer of Istio since 2018, and is one of Istio's top 3 contributors. He is also the CNCF TAG-Network Tech Lead and a maintainer of many CNCF projects, including Istio, Kmesh, and Volcano. He is also among the top 100 Kubernetes contributors...

Connectivity

Content Experience Level Any
Presentation Language Chinese

16:15 HKT

Guardians of the Gateway: Keeping Chaos Out of Your Cloud Highway - Sayan Mondal, Harness & Jintao Zhang, Kong Inc.

Tuesday June 10, 2025 16:15 - 16:45 HKT

Level 19 | Crystal Court II

Imagine an API gateway standing tall as the guardian of your cloud-native applications - directing traffic, enforcing policies, and ensuring everything runs smoothly. The Kong Gateway Operator orchestrates the control and data planes in Kubernetes, ensuring this process stays on track. But what happens when things start to wobble? A misstep here, a failure there and suddenly, chaos!

In this session, we’ll dive into the twists and turns of API gateway resilience. Think of it as an adventure where the operator faces unexpected disruptions, configuration hiccups, control plane mysteries, and unexpected traffic surges. We’ll explore what happens under the hood, how the gateway responds, and what we can learn from its behavior.

By the end, you’ll walk away with a deeper understanding of how to prepare your gateways for the unexpected and turn "uh-oh" moments into "we've got this" wins.

Speakers

Jintao Zhang

CNCF Ambassador, Kubernetes Ingress-NGINX maintainer, Kong Inc.

Jintao Zhang is a Microsoft MVP, CNCF Ambassador, Apache PMC member, and Kubernetes Ingress-NGINX maintainer. He specializes in cloud-native technology and the Azure technology stack.

Sayan Mondal

Senior Software Engineer II, Harness

Sayan Mondal is a Senior Software Engineer II at Harness, building their Chaos Engineering platform and helping them shape the customer experience market. He's the maintainer of a few open-source libraries and is also a maintainer and community manager of LitmusChaos (the Incubating...

Connectivity

Content Experience Level Any
Presentation Language English

17:00 HKT

Unlocking the Power of CEL for Advanced Multi-Cluster Scheduling - Qing Hao & Jian Qiu, Red Hat

Tuesday June 10, 2025 17:00 - 17:30 HKT

Level 19 | Crystal Court II

The Common Expression Language (CEL) is a powerful solution already used in the Kubernetes API, with the recent Kubernetes v1.32 highlighting it for mutating admission policies. It is also used in Envoy and Istio. This topic will explore the benefits and features that CEL can offer for multi-cluster scheduling.

There is a growing demand for granular and customizable requirements in scheduling. For example, users may want to filter clusters with the label "version" > v1.30.0 instead of listing all versions. Many also wish to use their CRD fields or metrics for scheduling. CEL's extensibility effectively addresses these challenges as it can handle complex expressions.

In this talk, we will showcase how Open Cluster Management (OCM) leverages CEL in multi-cluster scheduling. Using the ClusterProfile API as an example, we will demonstrate how CEL meets complex scheduling needs and illustrate its potential to improve GPU utilization for AI applications by solving bin-packing challenges.

Speakers

Jian Qiu

Senior Principal Software Engineer, Red Hat

Jian Qiu is a developer at Red Hat, mainly focusing on multi-cluster management.

Qing Hao

Senior Software Engineer, Red Hat

Qing Hao is a Senior Software Engineer at Red Hat, where she works as a maintainer of Open Cluster Management. She is also a CNCF Ambassador, a speaker at KubeCon China 2024, and a mentor for OSPP 2022 and GSoC 2024. Qing focuses on solving complex challenges...

Emerging + Advanced

Content Experience Level Any
Presentation Language Chinese

11:45 HKT

How Bloomberg Creates a Resilient Data Analytics Platform Using Karmada - Michas Szacillo & Ilan Filonenko, Bloomberg

Wednesday June 11, 2025 11:45 - 12:15 HKT

Level 19 | Crystal Court II

Bloomberg’s Data Analytics Platform Engineering team supports a wide range of real-time streaming, large batch ETL, and data exploration use cases by using Apache Flink, Apache Spark, and Trino across multi-cluster Kubernetes. However, deploying and managing these workflows efficiently at scale can be challenging due to varying resource requirements and uptime needs. For stateful applications like Apache Flink, ensuring recovery and state conservation after downtime is especially important.

This session will discuss how Bloomberg uses Karmada, a multi-cluster management system, to deploy and manage Apache Flink. We’ll also explore how Karmada’s capabilities can be expanded to handle additional data analytics workloads, including Apache Spark and Trino. The session will cover the unique requirements and real-life use cases for each, including:

- Resource-aware workload scheduling
- Custom resource requirements and health interpretation
- State conservation during application failover

Speakers

Ilan Filonenko

Engineering Group Lead, Bloomberg

Ilan Filonenko is an Engineering Group Lead focusing on Cloud Native Data Analytics Infrastructure at Bloomberg - where he has designed and implemented distributed systems at both the application and infrastructure level. Previously, Ilan was an engineering consultant and technical...

Michas Szacillo

Tech Lead, Bloomberg L.P.

Michas is a senior software engineer and tech lead on Bloomberg’s Streaming Analytics engineering team. The platform, which is running on Kubernetes, serves as the foundation for many of Bloomberg's data streaming use cases. Michas is also a frequent collaborator to the CNCF community...

Data Processing + Storage

Content Experience Level Intermediate
Presentation Language English

13:45 HKT

Solidigm CSAL Solution Brings Advanced IO Shaping, Caching and Data Placement Into NVIDIA DPU DOCA S - Wayne Gao, Solidigm & Long Chen, NVIDIA

Wednesday June 11, 2025 13:45 - 14:15 HKT

Level 19 | Crystal Court II

CSAL is the Cloud Storage Acceleration Layer for big data and AI. It is an open-source, user-mode FTL, cache, and I/O trace component inside SPDK (upstreamed). It is used commercially in Alibaba Cloud's storage system.
For details, see https://www.solidigm.com/products/technology/cloud-storage-acceleration-layer-write-shaping-csal.html. Alibaba and Solidigm also jointly published a paper at the EuroSys 2024 conference: https://dl.acm.org/doi/pdf/10.1145/3627703.3629566
This session presents joint development work with the NVIDIA DPU team and BeeGFS.
Session Topics:
1. CSAL uses DPU DRAM as its write buffer, achieving the best storage latency while still guaranteeing data consistency.
2. High-density QLC storage is favored by the AI industry because it saves power and space in AI data centers. A DPU storage solution offers the same benefits, so combining the two is a natural fit.
3. CSAL brings advanced storage I/O shaping, caching, and data placement software into the NVIDIA DPU DOCA storage software service.
4. Experimental data and results from combining DPU, CSAL, and BeeGFS.

Speakers

Long Chen

Director, NVIDIA

Responsible for promoting NVIDIA networking for high-speed storage and new application markets in China.

Wayne Gao

Principal Storage Solution Architect, Solidigm

Wayne Gao is a Principal Engineer and storage solution architect who worked on CSAL from PF through its commercial release at Alibaba. Wayne was also the main developer completing the CSAL pmem/DSA and cxl.mem PF, from Intel to Solidigm. Before joining Intel, Wayne had over 20 years of storage...

Data Processing + Storage

Content Experience Level Intermediate
Presentation Language Chinese

14:30 HKT

Exploring KubeEdge Graduation: Build a Diverse and Collaborative Open Source Community From Scratch - Yue Bao & Fei Xu, Huawei; Hongbing Zhang, DaoCloud; Huan Wei, Hangzhou HarmonyCloud; Benjamin Huo, QingCloud

Wednesday June 11, 2025 14:30 - 15:00 HKT

Level 19 | Crystal Court II

Recently, the health of open-source projects, particularly vendor diversity and neutrality, has become a key topic of discussion. Many projects have faced challenges due to a lack of vendor diversity, threatening their sustainability. It is increasingly clear that setting up the right governance structure and project team during a project’s growth is critical.
KubeEdge, the industry's first cloud-native open-source edge computing project, has grown from its initial launch in 2018 to achieving CNCF graduation this year. Over the past few years, KubeEdge has evolved from a small project into a diverse, collaborative, and multi-vendor open-source community.
In this panel, we will discuss the lessons learned from the KubeEdge community's graduation journey, focusing on key strategies in technical planning, community governance, developer growth, and project maintenance. Join us to explore how to build a multi-vendor and diverse community, and how to expand into different industries.

Speakers

Huan Wei

Senior Technical Director, Hangzhou HarmonyCloud Technologies Co., Ltd

Huan is an open source enthusiast and cloud native technology advocate. He is currently a CNCF Ambassador and a TSC member of the KubeEdge project. He serves as a technical director at HarmonyCloud.

Fei Xu

Senior Software Engineer, Huawei

KubeEdge TSC Member and Senior Software Engineer at Huawei Cloud, focusing on cloud native, Kubernetes, service mesh, edge computing, edge AI, and other fields. Currently maintains the KubeEdge project, a CNCF graduated project, and has rich experience in cloud native and edge computing...

Benjamin Huo

KubeSphere founding member, KubeEdge TSC member, Director of Cloud Platform, QingCloud Technologies

Benjamin Huo leads QingCloud Technologies' Architect team and Observability Team. He is the founding member of KubeSphere and the co-author of Fluent Operator, Kube-Events, Notification Manager, OpenFunction, and most recently eBPFConductor. He loves cloud-native technologies especially...

Yue Bao

Senior Software Engineer, Huawei Cloud Computing Technology Co., Ltd.

Hongbing Zhang

KubeEdge TSC Member, Chief Operating Officer, DaoCloud

Hongbing Zhang is Chief Operating Officer of DaoCloud. A veteran of the open source field, he founded the IBM China Linux team in 2011 and organized the team to make significant contributions to the Linux kernel, OpenStack, and Hadoop projects. He is now focusing on the cloud native domain and leading...

Cloud Native Experience

Content Experience Level Any
Presentation Language Chinese

15:30 HKT

Stability in Large Model Training: Practices in Software and Hardware Fault Self-Healing - Yang Cao, Ant Group

Wednesday June 11, 2025 15:30 - 16:00 HKT

Level 19 | Crystal Court II

Training trillion-parameter AI models requires significant GPU resources, where any idle time leads to increased costs. Maintaining full-speed GPU utilization is crucial, yet hardware and software failures (such as firmware, kernel, or hardware issues) often disrupt large-scale training. For example, LLaMA3 experienced 419 interruptions over 54 days, with 78% due to hardware issues, underscoring the necessity for automated anomaly recovery.
We will share Ant Group's practices in:
- GPU Monitoring: Comprehensive monitoring from hardware to applications to ensure optimal performance.
- Self-Healing for Large GPU Clusters: Automated fault isolation, recovery from kernel panics, and node reprovisioning for clusters with 10,000+ GPUs.
- Core Service Level Objectives (SLOs): Achieving over 98% GPU availability and more than 90% automatic fault isolation.
- Predictive Maintenance: Using failure pattern analysis to reduce downtime and improve reliability.

Speakers

Yang Cao

Senior Engineer, Ant Group

Yang Cao is a senior engineer at Ant Group, currently focusing on ensuring the stability of large-scale distributed training on Kubernetes.

Cloud Native Experience

Content Experience Level Intermediate
Presentation Language Chinese

16:15 HKT

High-Performance Cloud Native Traffic Authentication Solutions - Muyang Tian & Zhonghu Xu, Huawei

Wednesday June 11, 2025 16:15 - 16:45 HKT

Level 19 | Crystal Court II

In the rapidly evolving landscape of cloud computing and microservices architecture, efficiently and securely managing communication between services has become a critical challenge. Traditional methods of network traffic authentication often become a performance bottleneck, especially when handling large-scale data flows. This session introduces an innovative solution: leveraging the Linux kernel technology XDP (eXpress Data Path) to achieve efficient traffic authentication for service-to-service communications.

We will delve into how to use XDP for rapid filtering and processing of packets before they enter the system's protocol stack, significantly reducing latency and enhancing overall system throughput. Additionally, we will share practical application experiences from projects such as Kmesh, including but not limited to performance tuning, security considerations, and integration with other network security strategies.

Speakers

Muyang Tian

Operating System Engineer, Huawei

Operating system engineer at Huawei Technologies Co., Ltd., core member of Kmesh, and contributor to libxdp. Enthusiastic about cloud native technology and eBPF-based high-performance networking.

Zhonghu Xu

Principal Software Engineer, Huawei

Security

Content Experience Level Any
Presentation Language Chinese