10-11 June
Learn More and Register to Attend

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for KubeCon + CloudNativeCon China 2025 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

Please note: This schedule is automatically displayed in Hong Kong Standard Time (UTC+8:00). To see the schedule in your preferred timezone, please select from the drop-down menu to the right, above "Filter by Date." The schedule is subject to change and session seating is available on a first-come, first-served basis.
Tuesday, June 10
 

11:00 HKT

An Alternative Metadata System for Large Kubernetes Clusters - Yingcai Xue & Yixiang Chen, ByteDance
Tuesday June 10, 2025 11:00 - 11:30 HKT
For an event-driven distributed system like Kubernetes, where components communicate by synchronizing incremental data through the KubeAPIServer, the metadata system is the most critical component. etcd is the only officially supported metadata system. Some projects, such as kine, have explored alternative metadata storage, but the alternatives are either not open source or have performance issues.
This talk covers ByteDance's work on high-performance Kubernetes metadata systems. It summarizes etcd's production issues, analyzes Kubernetes' metadata storage requirements, and introduces how we address them with KubeBrain.
Results from large-scale production environments (more than 20K nodes and 1M pods, over a period of years) show that KubeBrain enhances cluster performance and stability.
This talk helps attendees understand the challenges of metadata systems in large-scale clusters and provides insights into an open-source solution that has been used in ByteDance's production environment.
Speakers
Yixiang Chen

Software Engineer, ByteDance
Yixiang is a seasoned cloud-native technologist with over 9 years of hands-on experience at ByteDance, where he has been at the forefront of large-scale Kubernetes ecosystem innovations. As a core contributor in cloud-native infrastructure, his expertise spans multiple domains including...
Yingcai Xue

Software Engineer, ByteDance
Graduated from Zhejiang University with a master's degree.
Level 19 | Crystal Court II
  Operations + Performance

14:30 HKT

Advancing Observability With Compile-Time Auto-Instrumentation in Golang - Liu Ziming, Alibaba Cloud & Przemek Delewski, Quesma
Tuesday June 10, 2025 14:30 - 15:00 HKT
Observability for cloud-native software applications requires efficient and reliable methods to gain insights into distributed systems. This talk will explore various instrumentation approaches for Golang, focusing on the concept of compile-time auto-instrumentation with OpenTelemetry. We will unveil implementation details of compile-time auto-instrumentation, highlighting features including flexible custom plugin capabilities, enhanced context propagation, trace-log correlation, and more. The talk will cover examples of using compile-time auto-instrumentation, lessons learned in practice, and scenarios that benefit from such an implementation. The audience will take away a solid understanding of how compile-time auto-instrumentation works and why it presents an efficient and more performant solution for achieving observability.
Speakers
Przemek Delewski

Principal Architect, Quesma
Przemek is a founding engineer at Quesma, working in the data transformation space and responsible for architectural direction. He is an observability veteran with over 15 years of experience at Dynatrace and Sumo Logic, and an OpenTelemetry maintainer. He designs programming languages for fun.
Liu Ziming

Engineer, Alibaba Cloud
Alibaba R&D Engineer
Level 16 | Grand Ballroom I
  Observability

16:15 HKT

Introducing AIBrix: Cost-Effective and Scalable Kubernetes Control Plane for vLLM - Jiaxin Shan & Liguang Xie, ByteDance
Tuesday June 10, 2025 16:15 - 16:45 HKT
Managing large-scale LLM inference workloads on Kubernetes requires more than just high-performance inference engines like vLLM. It demands a comprehensive control plane that integrates deeply with engines while addressing the complexities of large-scale operations. This need inspired the creation of AIBrix, a Kubernetes-native control plane designed to scale LLM inference with modularity, flexibility, and cutting-edge algorithms.

AIBrix introduces a pluggable architecture with components for LLM-specific autoscaling, high-density LoRA management, distributed KV cache, heterogeneous serving, model loading, and more. AIBrix emphasizes deep co-design with inference engines, enabling advanced features and optimizations. This talk will demonstrate AIBrix in action, showcasing its ability to improve scalability and optimize resource utilization. Additionally, we will present detailed benchmarks to evaluate the performance of these components, providing actionable insights for practitioners.
Speakers
Jiaxin

Software Engineer, ByteDance
Jiaxin works at ByteDance Infrastructure Lab, focusing on serverless and AI infrastructure. As a co-chair of Kubernetes WG-Serving, he drives innovation and contributes to the future of scalable AI systems.
Liguang Xie

Director of Engineering, ByteDance
Liguang Xie is an Engineering Lead at ByteDance’s Compute Infrastructure Team, leading next-gen serverless infrastructure design and overseeing open-source, research, and engineering efforts. He has extensive experience in large-scale distributed systems, AI/ML platforms, and LLM/GNN...
Level 19 | Crystal Court I
  AI + ML
 
Wednesday, June 11
 

11:45 HKT

China Mobile's Panji Platform: Observability Practices and Implementations for LLM Applications Base - Jing Shang, China Mobile & Casey Li, Yunshan Networks, Inc.
Wednesday June 11, 2025 11:45 - 12:15 HKT
As large language model (LLM) applications are widely deployed, their complex architectures challenge business observability. APM probes, which rely on instrumentation or proxy operation, consume system resources and impact traffic and performance, restricting their use in complex scenarios. Also, multiple teams handling different LLM instances make it hard to coordinate unified observability construction.
To solve this, China Mobile's Panji platform collaborates with DeepFlow to achieve zero-intrusion (Zero Code), full-stack (Full Stack) observability instantly, using eBPF and Wasm technologies. eBPF collects real-time data at the kernel level, while Wasm plugins parse streaming requests. By integrating existing data, the platform provides a universal service map, distributed tracing, and multi-dimensional metric analysis, ensuring the stability and performance optimization of LLM applications.
Speakers
Jing Shang

Chief Expert of China Mobile Group, China Mobile
Dr. Shang Jing, Chief Expert at China Mobile Group, has over 20 years of experience in IT system development, construction, and operation. Specializing in big data and cloud technologies, she led the development of China Mobile's Wutong Big Data Platform. Under her leadership, the...
Casey Li

Product Manager, Yunshan Networks, Inc.
After starting graduate school at Huazhong University of Science and Technology in 2013, I joined the Tencent Cloud virtual network team in 2016, which provided me with in-depth theoretical knowledge and practical experience in cloud networks. In 2018, I joined YUNSHAN Networks as PM...
Level 16 | Grand Ballroom I
  Observability
 