Loading…
10-11 June
Learn More and Register to Attend

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for KubeCon + CloudNativeCon China 2025 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

Please note: This schedule is automatically displayed in Hong Kong Standard Time (UTC+8:00)To see the schedule in your preferred timezone, please select from the drop-down menu to the right, above "Filter by Date." The schedule is subject to change and session seating is available on a first-come, first-served basis. 
Company: Intermediate clear filter
arrow_back View All Dates
Wednesday, June 11
 

09:00 HKT

Keynote: Welcome Back + Opening Remarks - Keith Chan, Director of Strategic Planning, The Linux Foundation APAC
Wednesday June 11, 2025 09:00 - 09:10 HKT
Speakers
avatar for Keith Chan

Keith Chan

Director of Strategic Planning, The Linux Foundation APAC
Wednesday June 11, 2025 09:00 - 09:10 HKT
Level 16 | Grand Ballroom I
  Keynote Sessions

09:36 HKT

Keynote: Who Owns Your Pod? Observing and Blocking Unwanted Behavior at eBay With eBPF - Jianlin Lv, eBay & Liyi Huang, Isovalent at Cisco
Wednesday June 11, 2025 09:36 - 09:46 HKT
Kubernetes admins often struggle to understand pod activities, both for regular pods and those with various privileges. This session explores two use cases that highlight why Tetragon, an eBPF-based observability and enforcement tool, for pod security:
1.Replacing Auditbeat with Tetragon: Learn how Auditbeat rules mapped to Tetragon tracing policies, identifying functionality gaps, and how eBay contributed back to the community
2.Auditing Container Process Permissions: See how Tetragon helped analyze pod behavior and determine if applications could migrate to more restrictive pod security policies, ensuring adherence to the principle of least privilege
We also cover deployment challenges, such as integrating with SIEM platforms, resource utilization, and implementing runtime enforcement for unwanted pod behavior. This talk provides practical insights into using Tetragon for observability, policy refinement, and improving overall pod security posture in Kubernetes environments.
Speakers
avatar for Jianlin Lv

Jianlin Lv

Senior Linux Kernel Development Engineer, eBay
https://www.linkedin.com/in/jianlin-lv-25650141/
avatar for Liyi Huang

Liyi Huang

customer success architect, Isovalent at Cisco
senior solution architect @isovalent.com
Wednesday June 11, 2025 09:36 - 09:46 HKT
Level 16 | Grand Ballroom I
  Keynote Sessions, Observability

10:10 HKT

Keynote: Closing Remarks
Wednesday June 11, 2025 10:10 - 10:15 HKT
Wednesday June 11, 2025 10:10 - 10:15 HKT
Level 16 | Grand Ballroom I
  Keynote Sessions, Platform Engineering

11:00 HKT

Unified Observability in GRPC: Metrics and Tracing Using OpenTelemetry Plugin - Purnesh Dixit, Google
Wednesday June 11, 2025 11:00 - 11:30 HKT
gRPC’s performance advantages hinge on minimizing latency, but its binary protocol and streaming capabilities make debugging and monitoring inherently opaque. While distributed tracing identifies bottlenecks, metrics like error rates and throughput are critical for holistic insights. Yet, manual instrumentation for these signals in gRPC is complex, error-prone, and lacks standardization.

In this talk, Purnesh Dixit from the gRPC team unveils the new OpenTelemetry plugin for gRPC, developed by the gRPC team at Google, which provides unified metrics and tracing out-of-the-box to monitor retries, diagnose streaming bottlenecks, and optimize performance without invasive code changes.
1) Client-per-call: Track overall RPC lifecycle (e.g., grpc.client.call.duration).

2) Client-per-call-attempt: Analyze individual retries/hedges (e.g., grpc.client.attempt.duration).

3) Server-instruments: Measure concurrency, request queuing, and stream lifetimes (e.g., grpc.server.call.started).
Speakers
avatar for Purnesh Dixit

Purnesh Dixit

Purnesh Dixit (gRPC Team, Google), Google
Purnesh is a software engineer on the gRPC team at Google. He is a contributor to the OpenTelemetry and xDS support in gRPC-go.
Wednesday June 11, 2025 11:00 - 11:30 HKT
Level 16 | Grand Ballroom I
  Observability

11:00 HKT

Resilient Multiregion Global Control Planes With Crossplane and K8gb - Yury Tsarev & Steven Borrelli, Upbound
Wednesday June 11, 2025 11:00 - 11:30 HKT
Ensuring resilience in control planes is critical for organizations managing infrastructure and applications across multiple regions with Kubernetes. This talk presents a reference architecture for creating a Crossplane-based Global Control Plane, enhanced with k8gb for DNS-based failover and leveraging an Active/Passive setup.
We’ll explore how Crossplane’s declarative infrastructure provisioning integrates with k8gb to build robust, scalable, and resilient multicluster environments. Key takeaways include:

- Architecting resilient multiregion control planes with Active/Passive roles
- Demonstrating failover mechanisms where the Passive control plane transitions to Active during failures
- Strategies for optimizing failover times while maintaining availability

This session will guide attendees through proven methods and real-world challenges of building resilient Global Control Planes, empowering them to manage critical workloads across geographically distributed regions confidently.
Speakers
avatar for Steven Borrelli

Steven Borrelli

Principal Soutions Architect, Upbound
Steven is a Principal Solutions Architect for Upbound, where he helps customers adopt Crossplane.
avatar for Yury Tsarev

Yury Tsarev

Principal Solutions Architect, Upbound
Yury is an experienced software engineer who strongly focuses on open-source, software quality and distributed systems. As the creator of k8gb (https://www.k8gb.io) and active contributor to the Crossplane ecosystem, he frequently speaks at conferences covering topics such as Control... Read More →
Wednesday June 11, 2025 11:00 - 11:30 HKT
Level 19 | Crystal Court I
  Operations + Performance

11:45 HKT

How Bloomberg Creates a Resilient Data Analytics Platform Using Karmada - Michas Szacillo & Ilan Filonenko, Bloomberg
Wednesday June 11, 2025 11:45 - 12:15 HKT
Bloomberg’s Data Analytics Platform Engineering team supports a wide-range of real-time streaming, large batch ETL, and data exploration use-cases by using Apache Flink, Apache Spark, and Trino across multi-cluster Kubernetes. However, deploying and managing these workflows at scale efficiently can be challenging due to varying resource requirements and uptime needs. For stateful applications like Apache Flink, ensuring recovery and state conservation after downtime is especially important.

This session will discuss how Bloomberg uses Karmada, a multi-cluster management system, to deploy and manage Apache Flink. We’ll also explore how Karmada’s capabilities can be expanded to handle additional data analytics workloads, including Apache Spark and Trino. The session will cover the unique requirements and real-life use-cases for each, including:

- Resource-aware workload scheduling
- Custom resource requirements and health interpretation
- State conservation during application failover
Speakers
avatar for Ilan Filonenko

Ilan Filonenko

Engineering Group Lead, Bloomberg
Ilan Filonenko is an Engineering Group Lead focusing on Cloud Native Data Analytics Infrastructure at Bloomberg - where he has designed and implemented distributed systems at both the application and infrastructure level. Previously, Ilan was an engineering consultant and technical... Read More →
avatar for Michas Szacillo

Michas Szacillo

Tech Lead, Bloomberg L.P.
Michas is a senior software engineer and tech lead on Bloomberg’s Streaming Analytics engineering team. The platform, which is running on Kubernetes, serves as the foundation for many of Bloomberg's data streaming use cases. Michas is also a frequent collaborator to the CNCF community... Read More →
Wednesday June 11, 2025 11:45 - 12:15 HKT
Level 19 | Crystal Court II
  Data Processing + Storage

13:45 HKT

Solidigm CSAL Solution Brings Advanced IO Shaping, Caching and Data Placement Into NVIDIA DPU DOCA S - Wayne Gao, Solidigm & Long Chen, NVIDIA
Wednesday June 11, 2025 13:45 - 14:15 HKT
CSAL is Cloud Storage Acceleration Layer for BigData and AI. it is open-source user mode FTL, cache and io trace component inside SPDK(upstreamed). It commercially helps Alibaba cloud storage system.
refer https://www.solidigm.com/products/technology/cloud-storage-acceleration-layer-write-shaping-csal.html. Alibaba and Solidigm joint top computer conference paper Eurosys2024 https://dl.acm.org/doi/pdf/10.1145/3627703.3629566
Session Topics:
This session is joint development with NVIDIA DPU team and BeeGFS
1. CSAL leverage DPU DRAM as CSAL write buffer who achieve best storage latency ever also promise the data consistency.
2. QLC high density storage is favorable by AI industry since it save power and space for AI Data Center. DPU storage solution can achieve same thing, it is great combine two things together.
3. CSAL bring advanced storage IO shaping, caching and data placement SW into NVIDIA DPU DOCA storage SW service,
4. DPU and CSAL and BeeGFS experiment data sharing and report
Speakers
avatar for Long Chen

Long Chen

Director, NVIDIA
Take charge of promoting NVIDIA networking for high speed storage and new application market in China
avatar for Wayne Gao

Wayne Gao

Princinple storage solution architect, Solidigm
Wayne Gao is a Principal Engineer as Storage solution architect and worked on CSAL from PF to Alibaba commercial release. Wayne also takes main developer effort to finish CSAL pmem/DSA and cxl.mem PF from intel to Solidigm. Before joining Intel, Wayne has over 20 years of storage... Read More →
Wednesday June 11, 2025 13:45 - 14:15 HKT
Level 19 | Crystal Court II
  Data Processing + Storage

13:45 HKT

Progressive Delivery Made Easy With Argo Rollouts - Kevin Dubois, Red Hat
Wednesday June 11, 2025 13:45 - 14:15 HKT
ou might already be using a CI/CD solution, but are you 100% sure things will roll out without a glitch once you go to production?  Unfortunately differences between testing/staging and production environments are virtually unavoidable. There’s always a risk for unforeseen issues related to your production environment and/or actual load which can lead to potential disruptions to your users.

Progressive delivery is the next step after Continuous Delivery to roll out your application in a controlled and automated way so you can verify and test your application *in production* before it becomes fully available to all your user bases.

Embrace GitOps and Progressive Delivery with techniques like blue-green, canary release, shadowing traffic, dark launches and automatic metrics-based rollouts to validate the application in production using Kubernetes and tools like Istio, Prometheus, ArgoCD, and Argo Rollouts.

Come to this session to learn about Progressive Delivery in action using Kubernetes.
Speakers
avatar for Kevin Dubois

Kevin Dubois

Senior Principal Developer Advocate, Red Hat
Kevin is a Java Champion, software engineer, author and international speaker with a passion for Open Source, Java, and Cloud Native Development & Deployment practices. He currently works as developer advocate at Red Hat where he gets to enjoy working with Open Source projects and... Read More →
Wednesday June 11, 2025 13:45 - 14:15 HKT
Level 19 | Crystal Court I
  Platform Engineering

13:45 HKT

Connecting Dots: Unified Hybrid Multi-Cluster Auth Experience With SPIFFE and Cluster Inventory API - Chen Yu, Microsoft & Jian Zhu, Red Hat
Wednesday June 11, 2025 13:45 - 14:15 HKT
As the multi-cluster pattern continues to evolve, managing K8s identities, credentials, and permissions for teams and multi-cluster apps, such as Argo and Kueue, has become a hassle, typically involving managing individual service accounts on each cluster and passing credentials around. Such setup is often scattered, repetitive, difficult to track/audit, and may impose security and ops complications. This is especially true with hybrid environments, where different solutions could be in play across platforms.

This demo presents a solution based on OpenID, SPIFFE/SPIRE, and Cluster Inventory API from the Multi-Cluster SIG that provides a unified, seamless, and secure auth experience. Facilitated by CNCF multi-cluster projects, OCM and KubeFleet, attendees could be inspired to leverage open source solutions to eliminate credential sprawl, reduce operational complexity, and enhance security in hybrid cloud environments, when setting up teams/applications to access a multi-cluster setup.
Speakers
avatar for Chen Yu

Chen Yu

Senior Software Engineer, Microsoft
Chen Yu is a senior software engineer at Microsoft with a keen interest in cloud-native computing. He is currently working on Multi-Cluster Kubernetes and contributing to the Fleet project open-sourced by Azure Kubernetes Service.
avatar for Jian Zhu

Jian Zhu

Senior Software Engineer, RedHat
Zhu Jian is a senior software engineer at RedHat, a speaker at Kubecon China 2024, and a core contributor to the open cluster management project. Jian enjoys solving multi-cluster workload distribution problems and extending OCM with add-ons.
Wednesday June 11, 2025 13:45 - 14:15 HKT
Level 16 | Grand Ballroom I
  Security

14:30 HKT

The Past, the Present, and the Future of Platform Engineering - Mauricio "Salaboy" Salatino, Diagrid & Viktor Farcic, Upbound
Wednesday June 11, 2025 14:30 - 15:00 HKT
Do you think platform engineering is too hard? Or is it just a buzzword? Is the CNCF landscape too tricky to visualize? If you’ve been in this industry long enough, you should know that platform engineering has been around for a long time.

Most of us have been trying to build developer platforms for decades, and most of us have failed at that. That begs the questions: “What is different now?” “Why will this time be different?” and “Do we have a chance to succeed?”

We’ll take a look at the past, the present, and the future of platform engineering. We’ll see what we were doing in the past, what we did wrong, and why we failed. Further on, we’ll see what we (the industry as a whole) are doing now and, more importantly, where we might go from here.

Get ready for the hard truths and challenges you will face when trying to build a platform based on Kubernetes. Join us for a pain-infused journey filled with challenges teams will face when building platforms to enable other teams.
Speakers
avatar for Viktor Farcic

Viktor Farcic

Viktor Farcic, Upbound
Viktor Farcic is a lead rapscallion at Upbound, a member of the CNCF Ambassadors, Google Developer Experts, CDF Ambassadors, and GitHub Stars groups, and a published author. He is a host of the YouTube channel DevOps Toolkit and a co-host of DevOps Paradox.
avatar for Mauricio Salatino

Mauricio Salatino

Software Engineer, Diagrid
Mauricio works as an Open Source Software Engineer at @Diagrid, contributing to and driving initiatives for the Dapr OSS project. Mauricio also serves as a Steering Committee member for the Knative Project and Co-Leading the Knative Functions initiative. He published a book titled... Read More →
Wednesday June 11, 2025 14:30 - 15:00 HKT
Level 19 | Crystal Court I
  Platform Engineering

15:30 HKT

Stability in Large Model Training: Practices in Software and Hardware Fault Self-Healing - Yang Cao, Ant Group
Wednesday June 11, 2025 15:30 - 16:00 HKT
Training trillion-parameter AI models requires significant GPU resources, where any idle time leads to increased costs. Maintaining full-speed GPU utilization is crucial, yet hardware and software failures (such as firmware, kernel, or hardware issues) often disrupt large-scale training. For example, LLaMA3 experienced 419 interruptions over 54 days, with 78% due to hardware issues, underscoring the necessity for automated anomaly recovery.
At Ant Group, we will share:
GPU Monitoring: Comprehensive monitoring from hardware to applications to ensure optimal performance.
Self-Healing for Large GPU Clusters: Automated fault isolation, recovery from kernel panics, and node reprovisioning for clusters with 10,000+ GPUs.
Core Service Level Objectives (SLOs): Achieving over 98% GPU availability and more than 90% automatic fault isolation.
Predictive Maintenance: Using failure pattern analysis to reduce downtime and improve reliability.
Speakers
avatar for Yang Cao

Yang Cao

senior engineer, Ant Group
Yang Cao Senior Engineer, Ant Group Yang Cao is a senior engineer at Ant Group, currently focusing on ensuring the stability of large-scale distributed training on Kubernetes.
Wednesday June 11, 2025 15:30 - 16:00 HKT
Level 19 | Crystal Court II
  Cloud Native Experience

15:30 HKT

Composable Platforms: Modular Platform Engineering With Kratix and Backstage - Hossein Salahi, Liquid Reply
Wednesday June 11, 2025 15:30 - 16:00 HKT
Constructing and managing platforms for diverse teams and workloads presents a significant challenge in today's cloud-native environment. This session introduces the concept of composable platforms, using modular, reusable components as the foundation for platform engineering. This talk will demonstrate how using Kratix, a workload-centric framework, and Backstage an extensible developer portal enables the creation of self-service platforms that balance standardization with adaptability.

The session will detail platform design for scalability and governance, streamlining developer workflows through Backstage, and using Kratix Promises for varied workload requirements. Attendees will gain practical insights into building scalable and maintainable platforms through real-world examples, architectural patterns, and a live demonstration of a fully integrated Kratix-Backstage deployment.
Speakers
avatar for Hossein Salahi

Hossein Salahi

Tech Lead, Liquid Reply
Hossein is an experienced cloud computing professional with nearly a decade of expertise in distributed systems and cloud technologies. He began as a student specializing in cloud automation and progressed to a full-time role focusing on on-premises cloud infrastructure and containers... Read More →
Wednesday June 11, 2025 15:30 - 16:00 HKT
Level 19 | Crystal Court I
  Platform Engineering
 
Share Modal

Share this link via

Or copy link

Filter sessions
Apply filters to sessions.
Filtered by Date -