KubeCon + CloudNativeCon China 2025: Full Schedule

10-11 June
Learn More and Register to Attend

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for KubeCon + CloudNativeCon China 2025 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

Please note: This schedule is automatically displayed in Hong Kong Standard Time (UTC+8:00). To see the schedule in your preferred timezone, please select from the drop-down menu to the right, above "Filter by Date." The schedule is subject to change and session seating is available on a first-come, first-served basis.

09:12 HKT

Keynote: Optimizing AI Workload Scheduling: Bilibili's Journey To an Efficient Cloud Native AI Platform - Long Xu, Bilibili & Kevin Wang, Huawei

Wednesday June 11, 2025 09:12 - 09:22 HKT

Level 16 | Grand Ballroom I

As China's leading video platform, Bilibili faces four key challenges in managing multi-cluster AI workloads:
1. Workload Diversity: Training, inference, and video processing workloads have different scheduling requirements.
2. Cross-Cluster Complexity: Managing workloads across multiple Kubernetes clusters in expanding IDCs with SLAs.
3. Performance Demands: Minimal startup latency and high scheduling efficiency for short-running tasks such as video processing.
4. Efficiency-QoS Balance: Maximizing resource utilization while ensuring priority workload stability.

This talk will share experiences and delve into specific optimization techniques:
1. Leveraging and optimizing CNCF projects such as Karmada and Volcano to build a unified, high-performance AI workload scheduling platform.
2. Integrating technologies such as KubeRay to schedule various AI online and offline workloads.
3. Maximizing resource efficiency through online and offline hybrid scheduling, tidal scheduling and other technologies.

Speakers

Kevin Wang

Technical Expert, Lead of Cloud Native Open Source, Huawei

Kevin Wang has been an outstanding contributor in the CNCF community since its beginning and is the leader of the cloud native open source team at Huawei. Kevin has contributed critical enhancements to Kubernetes, led the incubation of the KubeEdge, Volcano, Karmada projects in CNCF...

Long Xu

Senior Software Engineer, Bilibili

Long Xu is a Senior Software Engineer in the Infrastructure Department at Bilibili. He has extensive experience with Kubernetes, including scheduling, autoscaling, and system stability.

Keynote Sessions, AI + ML

Content Experience Level Any
Presentation Language Chinese

09:36 HKT

Keynote: Who Owns Your Pod? Observing and Blocking Unwanted Behavior at eBay With eBPF - Jianlin Lv, eBay & Liyi Huang, Isovalent at Cisco

Wednesday June 11, 2025 09:36 - 09:46 HKT

Level 16 | Grand Ballroom I

Kubernetes admins often struggle to understand pod activities, both for regular pods and those with various privileges. This session explores two use cases that highlight why Tetragon, an eBPF-based observability and enforcement tool, is valuable for pod security:
1. Replacing Auditbeat with Tetragon: Learn how Auditbeat rules were mapped to Tetragon tracing policies, which functionality gaps were identified, and how eBay contributed back to the community.
2. Auditing Container Process Permissions: See how Tetragon helped analyze pod behavior and determine whether applications could migrate to more restrictive pod security policies, ensuring adherence to the principle of least privilege.
We also cover deployment challenges, such as integrating with SIEM platforms, resource utilization, and implementing runtime enforcement for unwanted pod behavior. This talk provides practical insights into using Tetragon for observability, policy refinement, and improving overall pod security posture in Kubernetes environments.

Speakers

Jianlin Lv

Senior Linux Kernel Development Engineer, eBay

https://www.linkedin.com/in/jianlin-lv-25650141/

Liyi Huang

Customer Success Architect, Isovalent at Cisco

senior solution architect @isovalent.com

Keynote Sessions, Observability

Content Experience Level Intermediate
Presentation Language Chinese

09:48 HKT

Keynote: How We Save $900 per Day with Self-Hosted AI: Building Scalable Local LLM Infrastructure - Vivian Hu, Product Manager, Second State & Lv Yi, CTO, 5miles

Wednesday June 11, 2025 09:48 - 09:58 HKT

Level 16 | Grand Ballroom I

While SaaS AI providers like OpenAI offer convenient LLM services, they come with significant drawbacks: high costs, lack of customization, lack of privacy, and usage limitations that can throttle high-volume applications.

This presentation shows how a leading e-commerce website deployed a highly customized suite of LLM applications on private cloud infrastructure, reducing costs by 90% while maintaining complete control over scalability and quality of service. We'll discuss the technology stack for orchestrating inference workloads on cloud GPUs, and explore practical strategies for building stable, scalable, high-performance AI apps on your own private cloud infrastructure.

Speakers

Lv Yi

CTO, 5miles

Lv Yi is the CTO of 5miles, a leading e-commerce platform in the United States. With 19 years in IT, he is a cloud native enthusiast who previously served as a mobile business expert at AsiaInfo. In 2012, he led Zhangyue's systems evolution toward microservices architecture. At 5miles...

Vivian Hu

Product Manager, Second State

Vivian Hu is a Product Manager at Second State and a columnist at InfoQ. She is a founding member of the WasmEdge project. She organizes Rust and WebAssembly community events in Asia.

Keynote Sessions

Presentation Language Chinese

10:00 HKT

Keynote: Building a Large Model Inference Platform for Heterogeneous Chinese Chips Based on VLLM - Kante Yin, DaoCloud

Wednesday June 11, 2025 10:00 - 10:10 HKT

Level 16 | Grand Ballroom I

With the growing demand for heterogeneous computing power, Chinese users are gradually adopting domestic GPUs, especially for inference. vLLM, the most popular open-source inference project, has drawn widespread attention but does not support domestic chips. Chinese inference engines are still developing in functionality, performance, and ecosystem. In this session, we’ll introduce how to adapt vLLM to support domestic GPUs, enabling acceleration features like PagedAttention, Continuous Batching, and Chunked Prefill. We’ll also cover performance bottleneck analysis and chip operator development to maximize hardware potential.
Additionally, Kubernetes has become the standard for container orchestration and is the preferred platform for inference services. We’ll show how to deploy the adapted vLLM engine on Kubernetes using the open-source llmaz project with a few lines of code, and explore how llmaz handles heterogeneous GPU scheduling and our practices for monitoring and elastic scaling.

Speakers

Kante Yin

Software Engineer, DaoCloud

Kante is a senior software engineer and open source enthusiast at DaoCloud whose work mostly centers on scheduling, resource management, and LLM inference. He actively contributes to upstream Kubernetes as a SIG-Scheduling maintainer and helps in incubating several projects like Kueue...

Keynote Sessions, AI + ML

Content Experience Level Any
Presentation Language Chinese

11:45 HKT

China Mobile's Panji Platform: Observability Practices and Implementations for LLM Applications Base - Jing Shang, China Mobile & Casey Li, Yunshan Networks, Inc.

Wednesday June 11, 2025 11:45 - 12:15 HKT

Level 16 | Grand Ballroom I

As large language model (LLM) applications are widely deployed, their complex architectures challenge business observability. APM probes, which rely on instrumentation or proxy operation, consume system resources and impact traffic and performance, restricting their use in complex scenarios. In addition, when multiple teams handle different LLM instances, it is hard to coordinate a unified observability effort.
To solve this, China Mobile's Panji platform collaborates with DeepFlow to achieve zero-intrusion (Zero Code) and full-stack (Full Stack) observability instantly, using eBPF and Wasm technologies. eBPF collects real-time data at the kernel level, while Wasm plugins parse streaming requests. By integrating existing data, the platform provides a universal service map, distributed tracing, and multi-dimensional metric analysis, ensuring the stability and performance optimization of LLM applications.

Speakers

Jing Shang

Chief Expert of China Mobile Group, China Mobile

Dr. Shang Jing, Chief Expert at China Mobile Group, has over 20 years of experience in IT system development, construction, and operation. Specializing in big data and cloud technologies, she led the development of China Mobile's Wutong Big Data Platform. Under her leadership, the...

Casey Li

Product Manager, Yunshan Networks, Inc.

Starting from graduate school at Huazhong University of Science and Technology in 2013, I joined Tencent Cloud virtual network team in 2016, which provided me with in-depth theoretical knowledge and practical experience in cloud networks. In 2018, I joined YUNSHAN Networks as PM...

Observability Practices for LLM Applications on China Mobile's Panji Platform pdf

Observability

Content Experience Level Advanced
Presentation Language Chinese

13:45 HKT

Solidigm CSAL Solution Brings Advanced IO Shaping, Caching and Data Placement Into NVIDIA DPU DOCA S - Wayne Gao, Solidigm & Long Chen, NVIDIA

Wednesday June 11, 2025 13:45 - 14:15 HKT

Level 19 | Crystal Court II

CSAL (Cloud Storage Acceleration Layer) is a storage acceleration layer for big data and AI. It is an open-source, user-mode FTL, cache, and I/O trace component that has been upstreamed into SPDK, and it is used commercially in Alibaba's cloud storage system.
See https://www.solidigm.com/products/technology/cloud-storage-acceleration-layer-write-shaping-csal.html. Alibaba and Solidigm also jointly published a paper at EuroSys 2024, a top systems conference: https://dl.acm.org/doi/pdf/10.1145/3627703.3629566
This session presents joint development work with the NVIDIA DPU team and BeeGFS.
Session Topics:
1. CSAL uses DPU DRAM as its write buffer, achieving its best storage latency yet while still guaranteeing data consistency.
2. High-density QLC storage is favored by the AI industry because it saves power and space in AI data centers. DPU storage solutions offer the same benefits, so combining the two is a natural fit.
3. CSAL brings advanced storage I/O shaping, caching, and data placement software into the NVIDIA DPU DOCA storage software service.
4. Experimental data and a report on the DPU, CSAL, and BeeGFS.

Speakers

Long Chen

Director, NVIDIA

Responsible for promoting NVIDIA networking for high-speed storage and new application markets in China.

Wayne Gao

Principal Storage Solution Architect, Solidigm

Wayne Gao is a Principal Engineer and storage solution architect who worked on CSAL from PF to its commercial release at Alibaba. Wayne was also the main developer who completed the CSAL pmem/DSA and cxl.mem PF, from Intel to Solidigm. Before joining Intel, Wayne had over 20 years of storage...

Data Processing + Storage

Content Experience Level Intermediate
Presentation Language Chinese

13:45 HKT

Connecting Dots: Unified Hybrid Multi-Cluster Auth Experience With SPIFFE and Cluster Inventory API - Chen Yu, Microsoft & Jian Zhu, Red Hat

Wednesday June 11, 2025 13:45 - 14:15 HKT

Level 16 | Grand Ballroom I

As the multi-cluster pattern continues to evolve, managing K8s identities, credentials, and permissions for teams and multi-cluster apps, such as Argo and Kueue, has become a hassle, typically involving managing individual service accounts on each cluster and passing credentials around. Such a setup is often scattered, repetitive, and difficult to track or audit, and may introduce security and ops complications. This is especially true with hybrid environments, where different solutions could be in play across platforms.

This demo presents a solution based on OpenID, SPIFFE/SPIRE, and the Cluster Inventory API from the Multi-Cluster SIG that provides a unified, seamless, and secure auth experience. Using the CNCF multi-cluster projects OCM and KubeFleet, the demo aims to inspire attendees to leverage open source solutions to eliminate credential sprawl, reduce operational complexity, and enhance security in hybrid cloud environments when setting up teams and applications to access a multi-cluster setup.

Speakers

Chen Yu

Senior Software Engineer, Microsoft

Chen Yu is a senior software engineer at Microsoft with a keen interest in cloud-native computing. He is currently working on Multi-Cluster Kubernetes and contributing to the Fleet project open-sourced by Azure Kubernetes Service.

Jian Zhu

Senior Software Engineer, Red Hat

Zhu Jian is a senior software engineer at Red Hat, a speaker at KubeCon China 2024, and a core contributor to the Open Cluster Management (OCM) project. Jian enjoys solving multi-cluster workload distribution problems and extending OCM with add-ons.

Kubecon China 2025 Unified Hybrid Multi Cluster Auth Experience with SPIFFE and Cluster Inventory API pdf

Security

Content Experience Level Intermediate
Presentation Language Chinese

14:30 HKT

Exploring KubeEdge Graduation: Build a Diverse and Collaborative Open Source Community From Scratch - Yue Bao & Fei Xu, Huawei; Hongbing Zhang, DaoCloud; Huan Wei, Hangzhou HarmonyCloud; Benjamin Huo, QingCloud

Wednesday June 11, 2025 14:30 - 15:00 HKT

Level 19 | Crystal Court II

Recently, the health of open-source projects, particularly vendor diversity and neutrality, has become a key topic of discussion. Many projects have faced challenges due to a lack of vendor diversity, threatening their sustainability. It is increasingly clear that setting up the right governance structure and project team during a project’s growth is critical.
KubeEdge, the industry's first cloud-native open-source edge computing project, has grown from its initial launch in 2018 to achieving CNCF graduation this year. Over the past few years, KubeEdge has evolved from a small project into a diverse, collaborative, and multi-vendor open-source community.
In this panel, we will discuss the lessons learned from the KubeEdge community's graduation journey, focusing on key strategies in technical planning, community governance, developer growth, and project maintenance. Join us to explore how to build a multi-vendor and diverse community, and how to expand into different industries.

Speakers

Huan Wei

Senior Technical Director, Hangzhou HarmonyCloud Technologies Co., Ltd

Huan is an open source enthusiast and cloud native technology advocate. He is currently a CNCF Ambassador and a TSC member of the KubeEdge project, and serves as a technical director at HarmonyCloud.

Fei Xu

Senior Software Engineer, Huawei

KubeEdge TSC member and Senior Software Engineer at Huawei Cloud, focusing on cloud native, Kubernetes, service mesh, edge computing, edge AI, and other fields. Currently maintains KubeEdge, a CNCF graduated project, and has rich experience in cloud native and edge computing...

Benjamin Huo

KubeSphere founding member, KubeEdge TSC member, Director of Cloud Platform, QingCloud Technologies

Benjamin Huo leads QingCloud Technologies' Architect Team and Observability Team. He is a founding member of KubeSphere and the co-author of Fluent Operator, Kube-Events, Notification Manager, OpenFunction, and most recently eBPFConductor. He loves cloud-native technologies especially...

Yue Bao

Senior Software Engineer, Huawei Cloud Computing Technology Co., Ltd.

Yue Bao is a software engineer at Huawei Cloud. She now works 100% on open source, focusing on lightweight edge for KubeEdge. She is a maintainer of KubeEdge and the tech lead of KubeEdge SIG Release and SIG Node. Before that, Yue worked on Huawei Cloud Intelligent...

Hongbing Zhang

KubeEdge TSC Member, Chief Operating Officer, DaoCloud

Hongbing Zhang is Chief Operating Officer of DaoCloud. A veteran of open source, he founded the IBM China Linux team in 2011 and organized the team to make significant contributions to the Linux kernel, OpenStack, and Hadoop projects. Now he is focusing on the cloud native domain and leading...

Cloud Native Experience

Content Experience Level Any
Presentation Language Chinese

15:30 HKT

Stability in Large Model Training: Practices in Software and Hardware Fault Self-Healing - Yang Cao, Ant Group

Wednesday June 11, 2025 15:30 - 16:00 HKT

Level 19 | Crystal Court II

Training trillion-parameter AI models requires significant GPU resources, where any idle time leads to increased costs. Maintaining full-speed GPU utilization is crucial, yet hardware and software failures (such as firmware, kernel, or hardware issues) often disrupt large-scale training. For example, LLaMA3 experienced 419 interruptions over 54 days, with 78% due to hardware issues, underscoring the necessity for automated anomaly recovery.
We will share Ant Group's practices in:
GPU Monitoring: Comprehensive monitoring from hardware to applications to ensure optimal performance.
Self-Healing for Large GPU Clusters: Automated fault isolation, recovery from kernel panics, and node reprovisioning for clusters with 10,000+ GPUs.
Core Service Level Objectives (SLOs): Achieving over 98% GPU availability and more than 90% automatic fault isolation.
Predictive Maintenance: Using failure pattern analysis to reduce downtime and improve reliability.

Speakers

Yang Cao

Senior Engineer, Ant Group

Yang Cao is a senior engineer at Ant Group, currently focusing on ensuring the stability of large-scale distributed training on Kubernetes.

stability in large model training practices in software and hardware fault self healing pdf

Cloud Native Experience

Content Experience Level Intermediate
Presentation Language Chinese

16:15 HKT

High-Performance Cloud Native Traffic Authentication Solutions - Muyang Tian & Zengzeng Yao, Huawei

Wednesday June 11, 2025 16:15 - 16:45 HKT

Level 19 | Crystal Court II

In the rapidly evolving landscape of cloud computing and microservices architecture, efficiently and securely managing communication between services has become a critical challenge. Traditional methods of network traffic authentication often become a performance bottleneck, especially when handling large-scale data flows. This session introduces an innovative solution: leveraging the Linux kernel technology XDP (eXpress Data Path) to achieve efficient traffic authentication for service-to-service communications.

We will delve into how to use XDP for rapid filtering and processing of packets before they enter the system's protocol stack, significantly reducing latency and enhancing overall system throughput. Additionally, we will share practical application experiences from projects such as Kmesh, including but not limited to performance tuning, security considerations, and integration with other network security strategies.

Speakers

Zengzeng Yao

Senior Software Engineer, Huawei

Zengzeng is a senior software engineer at Huawei and a Kmesh maintainer with rich experience in service mesh.

Muyang Tian

Operating System Engineer, Huawei

Operating system engineer at Huawei Technologies Co., Ltd., core member of Kmesh, and contributor to libxdp. Enthusiastic about cloud native technology and eBPF-based high-performance networking.

Security

Content Experience Level Any
Presentation Language Chinese