10-11 June

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for KubeCon + CloudNativeCon China 2025 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

Please note: This schedule is automatically displayed in Hong Kong Standard Time (UTC+8:00). To see the schedule in your preferred timezone, please select from the drop-down menu to the right, above "Filter by Date." The schedule is subject to change and session seating is available on a first-come, first-served basis.
Venue: Level 19 | Crystal Court II
Wednesday, June 11
 

11:45 HKT

How Bloomberg Creates a Resilient Data Analytics Platform Using Karmada - Michas Szacillo & Ilan Filonenko, Bloomberg
Wednesday June 11, 2025 11:45 - 12:15 HKT
Bloomberg’s Data Analytics Platform Engineering team supports a wide range of real-time streaming, large batch ETL, and data exploration use cases by using Apache Flink, Apache Spark, and Trino across multi-cluster Kubernetes. However, deploying and managing these workflows efficiently at scale can be challenging due to varying resource requirements and uptime needs. For stateful applications like Apache Flink, ensuring recovery and state conservation after downtime is especially important.

This session will discuss how Bloomberg uses Karmada, a multi-cluster management system, to deploy and manage Apache Flink. We’ll also explore how Karmada’s capabilities can be expanded to handle additional data analytics workloads, including Apache Spark and Trino. The session will cover the unique requirements and real-life use cases for each, including:

- Resource-aware workload scheduling
- Custom resource requirements and health interpretation
- State conservation during application failover
Speakers

Ilan Filonenko

Engineering Group Lead, Bloomberg
Ilan Filonenko is an Engineering Group Lead focusing on Cloud Native Data Analytics Infrastructure at Bloomberg, where he has designed and implemented distributed systems at both the application and infrastructure level. Previously, Ilan was an engineering consultant and technical...

Michas Szacillo

Tech Lead, Bloomberg L.P.
Michas is a senior software engineer and tech lead on Bloomberg’s Streaming Analytics engineering team. The platform, which is running on Kubernetes, serves as the foundation for many of Bloomberg's data streaming use cases. Michas is also a frequent collaborator to the CNCF community...
Level 19 | Crystal Court II
  Data Processing + Storage

13:45 HKT

Solidigm CSAL Solution Brings Advanced IO Shaping, Caching and Data Placement Into NVIDIA DPU DOCA S - Wayne Gao, Solidigm & Long Chen, NVIDIA
Wednesday June 11, 2025 13:45 - 14:15 HKT
CSAL is the Cloud Storage Acceleration Layer for big data and AI. It is an open-source user-mode FTL, cache, and I/O trace component inside SPDK (upstreamed). It is used commercially in Alibaba's cloud storage system.
For details, see https://www.solidigm.com/products/technology/cloud-storage-acceleration-layer-write-shaping-csal.html. Alibaba and Solidigm also jointly published a paper at EuroSys 2024, a top computer systems conference: https://dl.acm.org/doi/pdf/10.1145/3627703.3629566
Session Topics:
This session presents joint development work with the NVIDIA DPU team and BeeGFS.
1. CSAL leverages DPU DRAM as its write buffer, which achieves the best storage latency so far while still ensuring data consistency.
2. High-density QLC storage is favored by the AI industry because it saves power and space in AI data centers. A DPU storage solution can achieve the same savings, so combining the two is a natural fit.
3. CSAL brings advanced storage I/O shaping, caching, and data placement software into the NVIDIA DPU DOCA storage software service.
4. Sharing of experimental data and a report on DPU, CSAL, and BeeGFS.
Speakers

Long Chen

Director, NVIDIA
Responsible for promoting NVIDIA networking for high-speed storage and new application markets in China.

Wayne Gao

Principal Storage Solution Architect, Solidigm
Wayne Gao is a Principal Engineer and storage solution architect who worked on CSAL from PF to the Alibaba commercial release. Wayne was also the main developer completing the CSAL pmem/DSA and cxl.mem PF, from Intel to Solidigm. Before joining Intel, Wayne had over 20 years of storage...
Level 19 | Crystal Court II
  Data Processing + Storage

14:30 HKT

Exploring KubeEdge Graduation: Build a Diverse and Collaborative Open Source Community From Scratch - Yue Bao & Fei Xu, Huawei; Hongbing Zhang, DaoCloud; Huan Wei, Hangzhou HarmonyCloud; Benjamin Huo, QingCloud
Wednesday June 11, 2025 14:30 - 15:00 HKT
Recently, the health of open-source projects, particularly vendor diversity and neutrality, has become a key topic of discussion. Many projects have faced challenges due to a lack of vendor diversity, threatening their sustainability. It is increasingly clear that setting up the right governance structure and project team during a project’s growth is critical.
KubeEdge, the industry's first cloud-native open-source edge computing project, has grown from its initial launch in 2018 to achieving CNCF graduation this year. Over the past few years, KubeEdge has evolved from a small project into a diverse, collaborative, and multi-vendor open-source community.
In this panel, we will discuss the lessons learned from the KubeEdge community's graduation journey, focusing on key strategies in technical planning, community governance, developer growth, and project maintenance. Join us to explore how to build a multi-vendor and diverse community, and how to expand into different industries.
Speakers

Huan Wei

Senior Technical Director, Hangzhou HarmonyCloud Technologies Co., Ltd
Huan is an open source enthusiast and cloud native technology advocate. He is currently a CNCF Ambassador and a TSC member of the KubeEdge project. He serves as a senior technical director at HarmonyCloud.

Fei Xu

Senior Software Engineer, Huawei
KubeEdge TSC member and Senior Software Engineer at Huawei Cloud, focusing on cloud native, Kubernetes, service mesh, edge computing, edge AI, and other fields. Currently maintains the KubeEdge project, a CNCF graduated project, and has rich experience in cloud native and edge computing...

Benjamin Huo

KubeSphere founding member, KubeEdge TSC member, Director of Cloud Platform, QingCloud Technologies
Benjamin Huo leads QingCloud Technologies' Architect Team and Observability Team. He is a founding member of KubeSphere and the co-author of Fluent Operator, Kube-Events, Notification Manager, OpenFunction, and most recently eBPFConductor. He loves cloud-native technologies especially...

Yue Bao

Senior Software Engineer, Huawei Cloud Computing Technology Co., Ltd.
Yue Bao is a software engineer at Huawei Cloud. She now works 100% on open source, focusing on lightweight edge for KubeEdge. She is a maintainer of KubeEdge and the tech lead of KubeEdge SIG Release and Node. Before that, Yue worked on Huawei Cloud Intelligent...

Hongbing Zhang

KubeEdge TSC Member, Chief Operating Officer, DaoCloud
Hongbing Zhang is Chief Operating Officer of DaoCloud. He is an open source veteran: he founded the IBM China Linux team in 2011 and organized the team to make significant contributions to the Linux kernel, OpenStack, and Hadoop projects. He now focuses on the cloud native domain and is leading...
Level 19 | Crystal Court II
  Cloud Native Experience
  • Content Experience Level Any
  • Presentation Language Chinese

15:30 HKT

Stability in Large Model Training: Practices in Software and Hardware Fault Self-Healing - Yang Cao, Ant Group
Wednesday June 11, 2025 15:30 - 16:00 HKT
Training trillion-parameter AI models requires significant GPU resources, where any idle time leads to increased costs. Maintaining full-speed GPU utilization is crucial, yet hardware and software failures (such as firmware, kernel, or hardware issues) often disrupt large-scale training. For example, LLaMA3 experienced 419 interruptions over 54 days, with 78% due to hardware issues, underscoring the necessity for automated anomaly recovery.
At Ant Group, we will share:
- GPU Monitoring: Comprehensive monitoring from hardware to applications to ensure optimal performance.
- Self-Healing for Large GPU Clusters: Automated fault isolation, recovery from kernel panics, and node reprovisioning for clusters with 10,000+ GPUs.
- Core Service Level Objectives (SLOs): Achieving over 98% GPU availability and more than 90% automatic fault isolation.
- Predictive Maintenance: Using failure pattern analysis to reduce downtime and improve reliability.
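
To put the figures quoted in this abstract in perspective, the short Python sketch below works through the arithmetic. It is only an illustration: the function names are hypothetical, the cluster size of exactly 10,000 GPUs is an assumed example (the abstract says "10,000+"), and treating availability as the fraction of GPUs usable at a given moment is a simplifying assumption, not Ant Group's stated definition.

```python
# Figures quoted in the abstract (LLaMA3 run and the SLO targets).
INTERRUPTIONS = 419        # interruptions reported for LLaMA3
TRAINING_DAYS = 54         # length of the reported period, in days
HARDWARE_SHARE = 0.78      # share of interruptions due to hardware issues
GPU_AVAILABILITY_SLO = 0.98

# Assumed example fleet size; the abstract only says "10,000+ GPUs".
EXAMPLE_FLEET_SIZE = 10_000


def mean_hours_between_interruptions(interruptions: int, days: int) -> float:
    """Average wall-clock hours between two consecutive interruptions."""
    return days * 24 / interruptions


def hardware_interruptions(interruptions: int, share: float) -> int:
    """Approximate count of interruptions attributed to hardware."""
    return round(interruptions * share)


def max_unavailable_gpus(fleet_size: int, availability: float) -> int:
    """GPUs that may be out of service while still meeting the SLO
    (assumes availability means the fraction of usable GPUs)."""
    return fleet_size - round(fleet_size * availability)


if __name__ == "__main__":
    hours = mean_hours_between_interruptions(INTERRUPTIONS, TRAINING_DAYS)
    print(f"Mean time between interruptions: {hours:.2f} hours")
    print(f"Hardware-related interruptions: "
          f"{hardware_interruptions(INTERRUPTIONS, HARDWARE_SHARE)}")
    print(f"GPUs allowed out of service at the SLO: "
          f"{max_unavailable_gpus(EXAMPLE_FLEET_SIZE, GPU_AVAILABILITY_SLO)}")
```

Under these assumptions, an interruption roughly every three hours leaves little room for manual recovery, which is the motivation the abstract gives for automated fault isolation.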
Speakers

Yang Cao

Senior Engineer, Ant Group
Yang Cao is a senior engineer at Ant Group, currently focusing on ensuring the stability of large-scale distributed training on Kubernetes.
Level 19 | Crystal Court II
  Cloud Native Experience

16:15 HKT

High-Performance Cloud Native Traffic Authentication Solutions - Muyang Tian & Zengzeng Yao, Huawei
Wednesday June 11, 2025 16:15 - 16:45 HKT
In the rapidly evolving landscape of cloud computing and microservices architecture, efficiently and securely managing communication between services has become a critical challenge. Traditional methods of network traffic authentication often become a performance bottleneck, especially when handling large-scale data flows. This session introduces an innovative solution: using XDP (eXpress Data Path), a Linux kernel technology, to achieve efficient traffic authentication for service-to-service communications.

We will delve into how to use XDP for rapid filtering and processing of packets before they enter the system's protocol stack, significantly reducing latency and enhancing overall system throughput. Additionally, we will share practical application experiences from projects such as Kmesh, including but not limited to performance tuning, security considerations, and integration with other network security strategies.
Speakers

Zengzeng Yao

Senior Software Engineer, Huawei
Zengzeng is a senior software engineer at Huawei and a Kmesh maintainer with rich experience in service mesh.

Muyang Tian

Operating System Engineer, Huawei
Operating system engineer at Huawei Technologies Co., Ltd., core member of Kmesh, and contributor to libxdp. Enthusiastic about cloud native technology and eBPF-based high-performance networking.
Level 19 | Crystal Court II
  Security
  • Content Experience Level Any
  • Presentation Language Chinese
 