The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for KubeCon + CloudNativeCon China 2025 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.
Please note: This schedule is displayed in Hong Kong Standard Time (UTC+8:00). The schedule is subject to change and session seating is available on a first-come, first-served basis.
When we started CNCF in 2015 to help advance container technology, Kubernetes was the seed technology, providing a de facto container orchestration platform for all cloud native applications. Almost a decade later, the community has exploded to more than 200 open source projects building on top of cloud native technologies. Looking ahead, what challenges will we face in the next decade? What gaps remain for users and contributors? And how do we evolve to meet the demands of an increasingly complex and connected world?
Let us review some of the key CNCF projects today and lay out possible directions for cloud native over the next decade: AI, agentic networks, sustainability, and beyond.
Lin is the Head of Open Source at Solo.io, and a CNCF TOC member and ambassador. She has worked on the Istio service mesh since the beginning of the project in 2017 and serves on the Istio Steering Committee and Technical Oversight Committee. Previously, she was a Senior Technical...
gRPC’s performance advantages hinge on minimizing latency, but its binary protocol and streaming capabilities make debugging and monitoring inherently opaque. While distributed tracing identifies bottlenecks, metrics like error rates and throughput are critical for holistic insights. Yet, manual instrumentation for these signals in gRPC is complex, error-prone, and lacks standardization.
In this talk, Purnesh Dixit from the gRPC team at Google unveils the new OpenTelemetry plugin for gRPC, which provides unified metrics and tracing out of the box to monitor retries, diagnose streaming bottlenecks, and optimize performance without invasive code changes.
1) Client per-call: Track the overall RPC lifecycle (e.g., grpc.client.call.duration).
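The difference between a per-call and a per-attempt measurement matters most when retries are involved. The following stdlib-only Python sketch does not use the gRPC or OpenTelemetry libraries. It times a whole call across its retries, roughly the way a per-call duration metric would, and records each attempt separately alongside it. The bucket boundaries, the status strings, and the method name `helloworld.Greeter/SayHello` are illustrative assumptions. The real plugin records its metrics for you without any hand-written wrapper.

```python
import time
from collections import defaultdict

# Hypothetical bucket boundaries in seconds; not the plugin's real defaults.
BUCKETS = (0.005, 0.01, 0.05, 0.1, 0.5, 1.0, 5.0)


class DurationHistogram:
    """Counts durations per (method, status) pair into fixed buckets."""

    def __init__(self):
        # One extra slot holds durations above the largest boundary.
        self.counts = defaultdict(lambda: [0] * (len(BUCKETS) + 1))

    def record(self, method, status, seconds):
        idx = next((i for i, b in enumerate(BUCKETS) if seconds <= b), len(BUCKETS))
        self.counts[(method, status)][idx] += 1

    def total(self, method, status):
        return sum(self.counts.get((method, status), []))


def call_with_retries(per_call, per_attempt, method, fn, max_attempts=3):
    """Times each attempt separately and the whole call, retries included."""
    call_start = time.perf_counter()
    status = "UNAVAILABLE"
    try:
        for _ in range(max_attempts):
            attempt_start = time.perf_counter()
            try:
                result = fn()
            except ConnectionError:
                per_attempt.record(method, "UNAVAILABLE", time.perf_counter() - attempt_start)
                continue
            per_attempt.record(method, "OK", time.perf_counter() - attempt_start)
            status = "OK"
            return result
        raise ConnectionError(f"{method}: all {max_attempts} attempts failed")
    finally:
        # The per-call measurement spans the full RPC lifecycle, retries included.
        per_call.record(method, status, time.perf_counter() - call_start)


# Usage: a hypothetical RPC that fails once with a transient error, then succeeds.
attempts = {"n": 0}


def flaky_say_hello():
    attempts["n"] += 1
    if attempts["n"] == 1:
        raise ConnectionError("transient")
    return "hello"


per_call, per_attempt = DurationHistogram(), DurationHistogram()
reply = call_with_retries(per_call, per_attempt, "helloworld.Greeter/SayHello", flaky_say_hello)
```

In this example the call histogram records a single successful call, while the attempt histogram records one failed attempt and one successful attempt. That split is what lets you tell retry overhead apart from slow individual attempts.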
Ensuring resilience in control planes is critical for organizations managing infrastructure and applications across multiple regions with Kubernetes. This talk presents a reference architecture for creating a Crossplane-based Global Control Plane, enhanced with k8gb for DNS-based failover and leveraging an Active/Passive setup. We’ll explore how Crossplane’s declarative infrastructure provisioning integrates with k8gb to build robust, scalable, and resilient multicluster environments. Key takeaways include:
- Architecting resilient multiregion control planes with Active/Passive roles
- Demonstrating failover mechanisms where the Passive control plane transitions to Active during failures
- Strategies for optimizing failover times while maintaining availability
This session will guide attendees through proven methods and real-world challenges of building resilient Global Control Planes, empowering them to confidently manage critical workloads across geographically distributed regions.
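The Active/Passive promotion logic can be illustrated with a small Python sketch. It is not k8gb or Crossplane code. A hypothetical reconciler promotes the passive region after a configurable number of consecutive failed health probes against the active one. A DNS-based load balancer then hands out only the active region's addresses. The region names, the addresses, and `failure_threshold` are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Region:
    name: str
    role: str  # "active" or "passive"
    consecutive_failures: int = 0


def reconcile(regions, healthy, failure_threshold=3):
    """Promote the passive region after `failure_threshold` consecutive failed probes of the active one."""
    active = next(r for r in regions if r.role == "active")
    passive = next(r for r in regions if r.role == "passive")
    if healthy[active.name]:
        active.consecutive_failures = 0
    else:
        active.consecutive_failures += 1
    # Only fail over when the standby is itself healthy.
    if active.consecutive_failures >= failure_threshold and healthy[passive.name]:
        active.role, passive.role = "passive", "active"
        active.consecutive_failures = 0
    return next(r.name for r in regions if r.role == "active")


def dns_answer(regions, endpoints):
    """Return the addresses a DNS-based load balancer would hand out: the active region's only."""
    active = next(r for r in regions if r.role == "active")
    return endpoints[active.name]


# Usage: hypothetical regions and addresses, with eu-west failing every probe.
regions = [Region("eu-west", "active"), Region("us-east", "passive")]
endpoints = {"eu-west": ["10.0.0.10"], "us-east": ["10.1.0.10"]}
outage = {"eu-west": False, "us-east": True}
history = [reconcile(regions, outage) for _ in range(3)]
```

The threshold is where the failover-time trade-off shows up. A lower value fails over sooner but risks flapping on a single lost probe. Worst-case detection time is roughly the threshold multiplied by the probe interval, plus whatever DNS TTL clients honor.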
Yury is an experienced software engineer with a strong focus on open source, software quality, and distributed systems. As the creator of k8gb (https://www.k8gb.io) and an active contributor to the Crossplane ecosystem, he frequently speaks at conferences on topics such as Control...
Peer Group Mentoring allows participants to meet with experienced open source veterans across many CNCF projects. Mentees are paired with 2 – 10 other people in a pod-like setting to explore technical, community, and career questions together.
Bloomberg’s Data Analytics Platform Engineering team supports a wide range of real-time streaming, large-batch ETL, and data exploration use cases by using Apache Flink, Apache Spark, and Trino across multi-cluster Kubernetes. However, deploying and managing these workflows efficiently at scale can be challenging due to varying resource requirements and uptime needs. For stateful applications like Apache Flink, ensuring recovery and state conservation after downtime is especially important.
This session will discuss how Bloomberg uses Karmada, a multi-cluster management system, to deploy and manage Apache Flink. We’ll also explore how Karmada’s capabilities can be expanded to handle additional data analytics workloads, including Apache Spark and Trino. The session will cover the unique requirements and real-life use-cases for each, including:
- Resource-aware workload scheduling
- Custom resource requirements and health interpretation
- State conservation during application failover
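One way to think about resource-aware scheduling is to divide replicas across member clusters in proportion to how many each can currently fit. The Python sketch below takes that approach. It is similar in spirit to Karmada's weighted replica division, but it is not Karmada's implementation. The cluster names and capacities are hypothetical, and the largest-remainder rounding is an assumption chosen so the shares always add up exactly.

```python
def divide_replicas(replicas, available):
    """Split `replicas` across clusters in proportion to their available capacity.

    Uses the largest-remainder method so the shares always sum to `replicas`.
    """
    total = sum(available.values())
    if replicas > total:
        raise ValueError(f"need {replicas} replicas but only {total} fit")
    if replicas == 0:
        return {cluster: 0 for cluster in available}
    shares = {c: replicas * cap // total for c, cap in available.items()}
    leftover = replicas - sum(shares.values())
    # Hand leftover replicas to the largest fractional remainders; break ties by name.
    order = sorted(available, key=lambda c: (-(replicas * available[c] % total), c))
    for cluster in order[:leftover]:
        shares[cluster] += 1
    return shares
```

For example, 7 replicas over clusters that can fit 6, 3, and 1 replicas are split as 4, 2, and 1. No cluster is ever assigned more than it reported as available. A stateful Flink job might instead pin its whole JobManager/TaskManager set to one cluster and use this kind of split only to pick a failover target.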
Ilan Filonenko is an Engineering Group Lead focusing on Cloud Native Data Analytics Infrastructure at Bloomberg, where he has designed and implemented distributed systems at both the application and infrastructure level. Previously, Ilan was an engineering consultant and technical...
Michas is a senior software engineer and tech lead on Bloomberg’s Streaming Analytics engineering team. The platform, which runs on Kubernetes, serves as the foundation for many of Bloomberg's data streaming use cases. Michas is also a frequent collaborator with the CNCF community...
Not everything can be anticipated while designing or developing applications, so many design decisions are based on estimates and expected usage patterns.
More often than not, these estimates differ from reality and introduce inefficiencies across several fronts. When these inefficiencies become visible at all, it is usually much later in the lifecycle, when you already have many customers and a large footprint.
Hence, unless there is a clear sign of performance degradation or unjustified cost, there is often little incentive to invest time and effort for uncertain gains.
In this session, Yash will walk through a real-world case study of how they built an internal platform that uses a wide range of metrics to handle several post-deployment challenges, such as:
1. rightsizing opportunities,
2. architecture migrations like moving to serverless,
3. finding the right maintenance windows, etc.
He will also show how impactful these minor optimizations turned out to be.
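Rightsizing from metrics can be sketched with a common heuristic: recommend a CPU request at the 95th percentile of observed usage plus some headroom. This is an assumption and not necessarily the rule used in the talk. The Python below uses integer arithmetic throughout to avoid float rounding surprises. The workload name, the samples, the 20% headroom, and the 50-millicore step are all hypothetical.

```python
def percentile(samples, q):
    """Nearest-rank percentile using integer arithmetic (q is an int in 1..100)."""
    ordered = sorted(samples)
    rank = max(1, -(-q * len(ordered) // 100))  # ceil(q * n / 100)
    return ordered[rank - 1]


def recommend_cpu_m(samples_m, headroom_pct=20, step_m=50):
    """p95 usage plus headroom, rounded up to a multiple of step_m millicores."""
    p95 = percentile(samples_m, 95)
    scaled = p95 * (100 + headroom_pct)           # p95 * (1 + headroom), in hundredths
    return -(-scaled // (step_m * 100)) * step_m  # ceil to the next step


def rightsizing_report(workload, current_request_m, samples_m):
    recommended = recommend_cpu_m(samples_m)
    return {
        "workload": workload,
        "current_m": current_request_m,
        "recommended_m": recommended,
        "savings_m": max(0, current_request_m - recommended),
    }


# Usage: hypothetical CPU usage samples (millicores) for an over-provisioned service.
samples = [120, 150, 180, 200, 210, 220, 230, 240, 250, 260,
           270, 280, 290, 300, 310, 320, 330, 340, 380, 900]
report = rightsizing_report("checkout-api", 2000, samples)
```

With these samples, p95 is 380m, so the recommendation comes out at 500m against a 2000m request. The single 900m spike is deliberately ignored. Memory is usually treated more conservatively, for example by sizing to the observed maximum, because exceeding a memory limit gets the container OOM-killed instead of merely throttled.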
Yash is a Software Engineer at Google with 9 years of industry experience in cloud architectures and microservice development at Google and VMware. He has been a speaker at several international conferences such as KubeCon + CloudNativeCon and Open Source...
You might already be using a CI/CD solution, but are you 100% sure things will roll out without a glitch once you go to production? Unfortunately, differences between testing/staging and production environments are virtually unavoidable. There's always a risk of unforeseen issues related to your production environment and/or actual load, which can disrupt your users.
Progressive Delivery is the next step after Continuous Delivery: it rolls out your application in a controlled and automated way so you can verify and test it *in production* before it becomes fully available to your entire user base.
Embrace GitOps and Progressive Delivery with techniques like blue-green deployments, canary releases, traffic shadowing, dark launches, and automatic metrics-based rollouts to validate the application in production using Kubernetes and tools like Istio, Prometheus, ArgoCD, and Argo Rollouts.
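The control flow behind an automated, metrics-based canary can be sketched in a few lines of Python. Traffic shifts to the canary in steps, and at each step an error-rate check decides whether to continue or abort. In Argo Rollouts this logic is declared as canary steps with analysis against a metrics provider such as Prometheus. This sketch does not use those APIs. The step weights, the 1% threshold, and the `error_rate_at` callback are hypothetical stand-ins for a real metrics query.

```python
from dataclasses import dataclass, field


@dataclass
class RolloutResult:
    promoted: bool
    history: list = field(default_factory=list)  # (canary_weight_percent, error_rate) per step


def run_canary(steps, error_rate_at, max_error_rate=0.01):
    """Shift traffic to the canary step by step, aborting if the error rate exceeds the threshold."""
    history = []
    for weight in steps:
        rate = error_rate_at(weight)  # stand-in for a metrics query scoped to the canary
        history.append((weight, rate))
        if rate > max_error_rate:
            # Abort: send all traffic back to the stable version.
            history.append((0, rate))
            return RolloutResult(promoted=False, history=history)
    return RolloutResult(promoted=True, history=history)


STEPS = [10, 25, 50, 100]  # hypothetical canary weights in percent

healthy = run_canary(STEPS, lambda weight: 0.002)
# A hypothetical regression that only shows up once the canary takes half the traffic.
regressed = run_canary(STEPS, lambda weight: 0.002 if weight < 50 else 0.05)
```

The regressed rollout reaches 50% and then drops the canary back to 0%. This shows why a load-dependent problem can slip past staging yet still be caught before it reaches every user.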
Come to this session to learn about Progressive Delivery in action using Kubernetes.
Kevin is a Java Champion, software engineer, author, and international speaker with a passion for Open Source, Java, and Cloud Native Development & Deployment practices. He currently works as a developer advocate at Red Hat, where he gets to enjoy working with Open Source projects and...
Strong communities foster a feeling of belonging by providing opportunities for interaction, collaboration, and shared experiences. We hope to do just that with a gathering of attendees who identify as women and non-binary individuals at KubeCon + CloudNativeCon China! Join fellow women community members for networking and connection.
Do you think platform engineering is too hard? Or is it just a buzzword? Is the CNCF landscape too tricky to visualize? If you’ve been in this industry long enough, you should know that platform engineering has been around for a long time.
Most of us have been trying to build developer platforms for decades, and most of us have failed at that. That raises the questions: “What is different now?” “Why will this time be different?” and “Do we have a chance to succeed?”
We’ll take a look at the past, the present, and the future of platform engineering. We’ll see what we were doing in the past, what we did wrong, and why we failed. Further on, we’ll see what we (the industry as a whole) are doing now and, more importantly, where we might go from here.
Get ready for the hard truths and challenges you will face when trying to build a platform based on Kubernetes. Join us for a pain-infused journey filled with challenges teams will face when building platforms to enable other teams.
Viktor Farcic is a lead rapscallion at Upbound, a member of the CNCF Ambassadors, Google Developer Experts, CDF Ambassadors, and GitHub Stars groups, and a published author. He is a host of the YouTube channel DevOps Toolkit and a co-host of DevOps Paradox.
Mauricio works as an Open Source Software Engineer at Diagrid, contributing to and driving initiatives for the Dapr OSS project. Mauricio also serves as a Steering Committee member for the Knative Project and co-leads the Knative Functions initiative. He published a book titled...
Maximizing security in multi-tenant clusters while maintaining cost-effectiveness is crucial for enterprise operations. Most enterprise clusters deploy multiple DaemonSets, which are attractive targets for attackers seeking to escape their containers and move laterally, ultimately taking over the entire cluster.
The Kubernetes SIG community has recently introduced several advanced security features, such as CRD Field Selectors, Field and Label Selector Authorization, ValidatingAdmissionPolicy (VAP), and Structured Authorization Configuration. These allow users to define more flexible authorization configurations that address filtering and authorization needs for CRDs, the kubelet, and other resources in multi-tenant environments.
We will share lessons learned from node escape incidents, demonstrate how to implement these new features, and show how to use the Common Expression Language (CEL) to configure customized policies in an Authorization Webhook and VAP, resulting in finer-grained, node-specific restrictions within clusters.
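A node-specific restriction of this kind can be illustrated with a Python analogue of the decision an authorization webhook might make. In the talk this logic is written in CEL, and this sketch does not evaluate CEL. A DaemonSet's identity may only list or watch pods when its field selector pins `spec.nodeName` to the node its credential is bound to. The request is a simplified dict, not the exact SubjectAccessReview schema. The service account name is hypothetical. The `authentication.kubernetes.io/node-name` extra key reflects an assumption that bound service account tokens carry the node name.

```python
NODE_NAME_KEY = "authentication.kubernetes.io/node-name"  # assumed extra-info key
NODE_AGENTS = {"system:serviceaccount:monitoring:node-agent"}  # hypothetical DaemonSet identity


def authorize(request):
    """Return (decision, reason); decision is "allow", "deny", or "no_opinion"."""
    if request["user"] not in NODE_AGENTS:
        # Leave everyone else to the other authorizers in the chain.
        return "no_opinion", "not a node agent"
    if request["resource"] != "pods" or request["verb"] not in ("list", "watch"):
        return "deny", "node agents may only list/watch pods"
    node = (request.get("extra", {}).get(NODE_NAME_KEY) or [None])[0]
    if node is None:
        return "deny", "credential is not bound to a node"
    if request.get("field_selector", {}).get("spec.nodeName") != node:
        return "deny", f"must select spec.nodeName={node}"
    return "allow", "scoped to the caller's own node"


def pod_watch(node_claim, selected_node):
    """Build a simplified request for a node agent watching pods."""
    request = {"user": "system:serviceaccount:monitoring:node-agent",
               "verb": "watch", "resource": "pods", "extra": {}, "field_selector": {}}
    if node_claim is not None:
        request["extra"][NODE_NAME_KEY] = [node_claim]
    if selected_node is not None:
        request["field_selector"]["spec.nodeName"] = selected_node
    return request
```

Under this rule, a stolen DaemonSet credential from node-a cannot watch pods scheduled on node-b, which limits lateral movement after a node escape. Returning "no_opinion" for other users keeps the rule from interfering with the rest of the authorizer chain.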
Dahu Kuang is a Security Tech Lead on the Alibaba Cloud Container Service for Kubernetes (ACK) team, focusing on the design and implementation of container security, especially in the context of a secure supply chain.
Cheng Gao, Senior Security Engineer at Alibaba Cloud, focuses on the Security Development Lifecycle (SDL) for cloud-native applications. With expertise in container services, observability, and Serverless architectures, Cheng has led security assurance for several internal container...
When you're new to Kubernetes, Policy as Code (PaC) can be a very unfamiliar topic. But as you become more familiar with Kubernetes, you'll probably want to know how to use it securely. Because Kubernetes is essentially a declarative system driven by YAML, doing security in code as well helps with usability and reduces human error.
To make PaC easier to understand, I'll demonstrate the Admission Control part directly in Kubernetes. Until recently, admission control relied on webhooks, but since v1.23 Kubernetes has actively embraced the Common Expression Language (CEL), which makes it possible to apply policy as code directly inside Kubernetes. Validating Admission Policy became GA in v1.30, and Mutating Admission Policy is in Alpha in v1.32.
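The core idea of a Validating Admission Policy is a list of boolean expressions evaluated against the incoming object, where any false result rejects the request with a message. The Python sketch below mirrors that shape. It does not evaluate CEL: each CEL string is kept only for reference, next to a hand-written Python stand-in. The first expression, `object.spec.replicas <= 5`, follows the well-known example from the Kubernetes documentation. The label rule and the sample objects are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Validation:
    expression: str                # CEL source, kept for reference only
    check: Callable[[dict], bool]  # hand-written Python stand-in for that CEL expression
    message: str


POLICY = [
    Validation("object.spec.replicas <= 5",
               lambda obj: obj["spec"]["replicas"] <= 5,
               "replicas must be 5 or fewer"),
    Validation("has(object.metadata.labels) && 'team' in object.metadata.labels",
               lambda obj: "team" in obj["metadata"].get("labels", {}),
               "a 'team' label is required"),
]


def admit(obj, policy=POLICY):
    """Return (allowed, messages), rejecting when any validation evaluates to false."""
    failures = [v.message for v in policy if not v.check(obj)]
    return not failures, failures


ok_deploy = {"metadata": {"name": "web", "labels": {"team": "payments"}}, "spec": {"replicas": 3}}
bad_deploy = {"metadata": {"name": "batch"}, "spec": {"replicas": 10}}
```

The practical difference from the webhook era is where this runs. The real policy is evaluated inside the API server, so there is no separate webhook service to deploy, secure, and keep available.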
Based on this outline, I'll talk about how PaC has been applied to Kubernetes in the past, how it works today, and finally, how we can expect it to be integrated into Kubernetes in the future.
Hoon Jo is a Cloud Solutions Architect and Cloud Native engineer at Megazone. He has spoken many times about cloud native technologies and works to make cloud native ubiquitous around the world. He has written several books; his latest is 『CONTAINER INFRASTRUCTURE...
Constructing and managing platforms for diverse teams and workloads presents a significant challenge in today's cloud-native environment. This session introduces the concept of composable platforms, using modular, reusable components as the foundation for platform engineering. This talk will demonstrate how Kratix, a workload-centric framework, and Backstage, an extensible developer portal, enable the creation of self-service platforms that balance standardization with adaptability.
The session will detail platform design for scalability and governance, streamlining developer workflows through Backstage, and using Kratix Promises for varied workload requirements. Attendees will gain practical insights into building scalable and maintainable platforms through real-world examples, architectural patterns, and a live demonstration of a fully integrated Kratix-Backstage deployment.
Hossein is an experienced cloud computing professional with nearly a decade of expertise in distributed systems and cloud technologies. He began as a student specializing in cloud automation and progressed to a full-time role focusing on on-premises cloud infrastructure and containers...
For AI developers on Kubernetes, whether working in a Jupyter notebook or serving LLMs, Python dependencies are always a headache:
- Prepare a set of base images? The maintenance effort becomes a nightmare, because (1) packages in the AI world bump versions rapidly, and (2) different LLM codebases require different permutations and combinations of packages.
- Leave users to `pip install` by themselves? The waiting blocks productivity and efficiency, as you may agree if you have done it.
- On a GPU cloud, package preparation time can even cost a lot: you rent a GPU but waste it waiting for pip downloads.
- You may choose to DIY and `docker commit` your own base images, but then you have to worry about the Dockerfile, the registry, and additional cloud cost if you don't have a local Docker environment.
So we introduce https://github.com/BaizeAI/dataset.
The solution:
1. A CRD to describe the dependencies and environment.
2. A Kubernetes Job to preload the packages.
3. A PVC to store and mount them.
4. `conda` to switch between environments.
5. Sharing across namespaces.
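To make the preload step concrete, here is a Python sketch that turns an environment description into the shell command a preload Job might run against a PVC mount. The abstract does not describe the project's actual CRD fields or Job commands. `EnvSpec`, its fields, and the `/opt/envs` mount path are therefore hypothetical. Only the `conda create --prefix` and `conda run --prefix` invocations are standard conda CLI usage.

```python
import shlex
from dataclasses import dataclass, field


@dataclass
class EnvSpec:
    """Hypothetical environment description; not the BaizeAI/dataset CRD schema."""
    name: str
    python: str
    packages: list = field(default_factory=list)


def preload_script(spec, mount="/opt/envs"):
    """Shell command a preload Job could run to build the env on a shared PVC mount."""
    prefix = f"{mount}/{spec.name}"  # the PVC is assumed to be mounted at `mount`
    commands = [["conda", "create", "--yes", "--prefix", prefix, f"python={spec.python}"]]
    if spec.packages:
        commands.append(["conda", "run", "--prefix", prefix, "pip", "install", *spec.packages])
    # shlex.join quotes arguments such as version specifiers containing '>'.
    return " && ".join(shlex.join(cmd) for cmd in commands)


spec = EnvSpec("llm-serving", "3.11", ["torch==2.3.0", "transformers>=4.40"])
script = preload_script(spec)
```

A notebook or serving pod that mounts the same PVC could then switch to the prepared environment with `conda activate /opt/envs/llm-serving`. The GPU is never left idle while pip downloads packages.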
Cloud native developer, AI researcher, and Gopher with 5 years of experience across many development fields, including AI, data science, backend, and frontend. Co-founder of https://github.com/nolebase.