Loading…
10-11 June
Learn More and Register to Attend

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for KubeCon + CloudNativeCon China 2025 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

Please note: This schedule is automatically displayed in Hong Kong Standard Time (UTC+8:00)To see the schedule in your preferred timezone, please select from the drop-down menu to the right, above "Filter by Date." The schedule is subject to change and session seating is available on a first-come, first-served basis. 
Tuesday June 10, 2025 15:37 - 15:42 HKT
As AI/ML workloads continue to scale in complexity, developers and platform engineers are pushing Kubernetes beyond typical MLOps boundaries.

This talk dives into strategies for orchestrating GPU-accelerated training and inference across large-scale clusters -integrating HPC principles, operator-based scheduling, and novel debugging workflows.

Attendees will learn how to implement fine-grained GPU partitioning, harness ephemeral containers to probe and adjust multi-node training in real time, and adopt eBPF-driven instrumentation for low-overhead kernel-level performance insights. We’ll explore cutting-edge scheduling optimizations—like reinforcement-learning approaches and HPC-inspired batch-queuing orchestration on Kubernetes that dynamically respond to heterogeneous job demands.

Real-world case studies will highlight HPC integration scenarios (RDMA, GPU Direct) for data-parallel workloads and complex training frameworks such as Horovod, Ray, and Spark on Kubernetes.
Speakers
avatar for Brandon Kang

Brandon Kang

Principal Technical Solutions Architect, Akamai Technologies
Brandon Kang is a Principal Technical Solutions Architect at Akamai Technologies, specializing in cloud-native projects across Asia as a compute specialist.Before joining Akamai, he served as a Lead Software Engineer at Samsung, a Senior Program Manager at Microsoft, and a Service... Read More →
Tuesday June 10, 2025 15:37 - 15:42 HKT
Level 16 | Grand Ballroom I
  ⚡ Lightning Talks, Application Development

Sign up or log in to save this to your schedule, view media, leave feedback and see who's attending!

Share Modal

Share this link via

Or copy link