Observability in K8s: Moving from Reactive to Predictive

As organizations gather for KubeCon + CloudNativeCon EU 2025 in London, observability remains a key focus for teams managing complex Kubernetes deployments.

by Lori Bertelli, director of product marketing + communications, Apica

This article was originally published on The New Stack.

The landscape of cloud native observability continues to evolve rapidly as organizations deploy increasingly complex Kubernetes environments. KubeCon + CloudNativeCon EU 2025 presents an opportune time to examine how observability has evolved, the current challenges and the emerging AI-driven solutions that promise to transform this space from reactive to predictive.

Observability has undergone significant evolution over the past decade and continues to advance in the era of AI. We are at another pivotal moment that will fundamentally change the way teams observe and manage their systems.

The Increasing Complexity of Observability

Organizations that operate cloud native applications encounter a common set of challenges regarding monitoring and observability. Kubernetes environments generate massive amounts of telemetry data across distributed systems, containerized applications and microservices, creating a data management challenge of significant proportions. Many organizations are experiencing rapidly increasing costs for storing and processing this observability data.

A particularly acute challenge is how to tame data growth and cardinality. With microservices and Kubernetes, cardinality has grown in recent years, and AI-driven applications will accelerate this trend even further. Discarding data to manage costs isn’t feasible, as teams need more granular data to understand complex systems.
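
To see why cardinality compounds so quickly, consider a single request-latency metric labeled by service, pod, endpoint and status code. The label counts in this back-of-the-envelope Python sketch are hypothetical, but the multiplication is the point: total series is the product of each label's distinct values.

```python
# Back-of-the-envelope series-count estimate for a single metric.
# The label names and counts below are hypothetical; cardinality is
# the product of the distinct values of each label.
labels = {
    "service": 50,        # microservices emitting the metric
    "pod": 2_000,         # ephemeral pods seen over the retention window
    "endpoint": 40,       # HTTP routes
    "status_code": 8,     # response classes
}

series = 1
for distinct_values in labels.values():
    series *= distinct_values

print(f"worst-case time series for one metric: {series:,}")
# -> worst-case time series for one metric: 32,000,000
```

Ephemeral pod names alone turn an otherwise modest metric into millions of series, which is why simply dropping labels (and the granularity they carry) is such an unattractive cost lever.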

The complexity doesn’t end with data volume and cost. Teams often face challenges due to fragmented visibility across multiple monitoring tools, which makes it difficult to correlate issues across different parts of the tech stack.

This fragmentation creates a significant issue: Engineering teams experience noise and alert fatigue, making it more challenging to identify and address critical problems. Additionally, as privacy regulations become stricter worldwide, ensuring regulatory compliance with sensitive data in logs and metrics has become increasingly important.

These challenges are especially pronounced in Kubernetes environments, where workloads’ dynamic and ephemeral nature adds further complexity to traditional monitoring approaches. Containers that exist for only minutes or seconds, auto-scaling workloads and constantly changing infrastructure all contribute to a monitoring environment that traditional tools struggle to manage effectively.

Unified Telemetry Management: A Modern Approach

Modern observability platforms have evolved beyond basic monitoring to provide unified telemetry data management. This comprehensive approach offers several advantages that help organizations navigate the complexities of cloud native environments.

Centralized Pipeline Management

Instead of managing multiple disconnected tools, organizations are shifting toward centralized control planes for all their telemetry data. This strategy ensures consistent data collection across Kubernetes clusters, cloud services and on-premises infrastructure, effectively eliminating blind spots and reducing operator cognitive load.

Full-stack visibility into applications, infrastructure and security posture becomes possible when data is collected and managed consistently. This centralization also enables seamless integration with existing Kubernetes monitoring tools and platforms, leveraging investments already made while adding new capabilities.

Dynamic Fleet Management

Traditional, static approaches to telemetry collection are being replaced by dynamic, flexible systems that can adapt to changing environments. Modern observability solutions can automatically scale data collection based on real-time environmental factors, ensuring that resources are used efficiently.

Centralized configuration management for agent deployments reduces the operational burden of managing observability components across large fleets of servers and containers. This approach simplifies life cycle management for observability components, making upgrades and configurations more manageable across distributed environments.

Intelligent Data Flow Control

Gaining control over telemetry data flows has become essential for cost management and performance optimization. Advanced observability platforms now offer capabilities to filter and reduce noisy, redundant data without losing critical insights. They can mask and transform sensitive information to ensure compliance with regulations like GDPR and the California Consumer Privacy Act (CCPA), protecting user privacy while maintaining observability.
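
As a concrete illustration of masking, here is a minimal Python sketch that redacts email addresses and IPv4 addresses from log lines before they leave the pipeline. The patterns and placeholder format are illustrative assumptions; production platforms expose this as configurable processors rather than hand-written rules.

```python
import re

# Hypothetical redaction rules applied to each log line in flight.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}

def mask(line: str) -> str:
    """Replace sensitive substrings with a tagged placeholder."""
    for name, pattern in PATTERNS.items():
        line = pattern.sub(f"<{name}:redacted>", line)
    return line

print(mask("user jane@example.com logged in from 10.0.0.7"))
# -> user <email:redacted> logged in from <ipv4:redacted>
```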

Many systems now support enriching telemetry data with additional context for deeper insights, combining data from multiple sources to provide a more complete picture of system behavior. Intelligence in data routing allows high-value data to be sent to specialized analytics tools while storing full-fidelity replicas cost-effectively for later analysis when needed.
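
A minimal sketch of that routing idea, assuming each event is a dict with a "level" field; send_to_analytics and send_to_archive are hypothetical stand-ins for an analytics backend and cheap object storage, respectively.

```python
import json

def send_to_analytics(event: dict) -> None:
    print("analytics <-", json.dumps(event))  # stand-in for a real sink

def send_to_archive(event: dict) -> None:
    print("archive   <-", json.dumps(event))  # stand-in for object storage

def route(event: dict) -> None:
    # Every event lands in the full-fidelity archive; only high-value
    # events (errors, in this toy policy) also reach the costly
    # analytics tier.
    send_to_archive(event)
    if event.get("level") in ("error", "fatal"):
        send_to_analytics(event)

route({"level": "info", "msg": "health check ok"})
route({"level": "error", "msg": "db connection refused"})
```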

Purpose-Built Data Storage

A significant architectural shift is occurring with the rise of purpose-built observability data lakes. These give organizations the ability to manage vast volumes of data, integrate with object storage, and index and access all of it on demand and in real time, which eliminates the need for tiered storage.

Purpose-built data repositories designed specifically for observability data are helping organizations address the unique challenges of storing and querying time-series data, logs and traces. These specialized repositories enable teams to store operational data at scale with optimized costs, often achieving significant savings compared to general-purpose databases.

By treating object storage as a primary storage layer and enhancing its performance to match real-time storage capabilities, these solutions eliminate the need for tiered storage while providing full indexing of all incoming data. This approach supports maintaining extended retention periods without exceeding budgets, allowing teams to investigate issues that occurred weeks, months or years ago.

This architectural approach addresses several key challenges: cross-correlating the different types of telemetry data (metrics, logs, traces and events) that are typically stored separately, enabling fast queries across these data sources and providing a path to compliance through centralized log management.

Advanced solutions in this space allow for “time travel” capabilities where historical data from any timeframe can be queried, retrieved and replayed instantly. High-performance query capabilities enable teams to analyze historical data quickly, often using formats like Apache Parquet that ensure data availability at scale, simplified retrieval and faster time to insight.
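
As an illustration, here is a minimal sketch of a time-windowed query over a Parquet archive using the pyarrow library (assumed installed). The logs/ directory layout and the timestamp and level column names are assumptions; the key point is that the filters push down to row groups, so only the slices covering the requested window are read.

```python
import pyarrow.parquet as pq
import pyarrow.compute as pc

# Hypothetical layout: telemetry archived as Parquet files under logs/,
# with a "timestamp" column in epoch seconds and a "level" column.
table = pq.read_table(
    "logs/",
    filters=[
        ("timestamp", ">=", 1735689600),  # window start
        ("timestamp", "<", 1735693200),   # window end
    ],
)

# Narrow the replayed window down to error records only.
errors = table.filter(pc.equal(table["level"], "error"))
print(f"{errors.num_rows} error records in the replayed window")
```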

The elastic design of modern observability data lakes ensures high availability even in the most voluminous data environments, with horizontal scaling that can handle unexpected data spikes at endpoints, avoid backlogs and prevent data loss at scale.

Key Capabilities Transforming Enterprise Observability

Several capabilities are emerging as essential for next-generation observability platforms as the industry matures and organizations gain experience with cloud native environments.

OpenTelemetry has become the foundation for vendor-neutral observability, with organizations adopting it to standardize instrumentation across their applications. This open standard helps reduce vendor lock-in and simplifies integration with multiple observability solutions, giving organizations more flexibility in their tooling choices. By adopting OpenTelemetry, organizations are also future-proofing their observability strategy, ensuring they can adapt as the landscape evolves.
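
For example, instrumenting a service with the OpenTelemetry Python SDK (pip install opentelemetry-sdk) takes only a few lines. This sketch exports spans to the console so it is self-contained; the checkout-service name and the span attribute are hypothetical, and a real deployment would export via OTLP to a backend instead.

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

# Wire up a tracer provider that batches spans and prints them to stdout.
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("checkout-service")  # hypothetical service name

with tracer.start_as_current_span("process-order") as span:
    span.set_attribute("order.items", 3)  # illustrative attribute
    # ... application work happens here ...
```

Because the instrumentation above is vendor-neutral, swapping the console exporter for an OTLP one changes the destination without touching application code.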

AI and machine learning are transforming the way teams interpret observability data, shifting the paradigm from reactive to predictive. While AI is creating new challenges for observability through increased complexity and non-deterministic behavior, it offers powerful solutions.

AI-driven root cause analysis (RCA) is emerging as a game-changer, automating much of the manual toil engineering teams face when troubleshooting issues. These technologies help reduce mean time to recovery (MTTR) by automatically identifying the source of problems quickly. Rather than requiring engineers to manually query logs, correlate traces and sift through dashboards, AI systems can proactively identify emerging issues before they affect users, fundamentally shifting operations from reactive to proactive.

AI observability systems will focus on deep pattern analysis across vast amounts of telemetry data. Anomaly detection powered by machine learning can identify security threats or performance issues that would otherwise go unnoticed in the flood of data. As these systems mature, they will increasingly generate actionable recommendations for system optimization, helping teams improve reliability and performance.
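
A toy example of the underlying idea: flag any data point whose z-score against a sliding window exceeds a threshold. Real systems use far richer models, but this sketch shows the shape of statistical anomaly detection on a telemetry stream; the latency values are made up.

```python
from collections import deque
from statistics import mean, stdev

def detect(stream, window=30, threshold=3.0):
    """Yield values that deviate sharply from the recent window."""
    history = deque(maxlen=window)
    for value in stream:
        if len(history) >= 2:
            mu, sigma = mean(history), stdev(history)
            if sigma > 0 and abs(value - mu) / sigma > threshold:
                yield value
        history.append(value)

latencies = [12, 13, 11, 12, 14, 13, 12, 95, 12, 13]  # made-up latency ms
print(list(detect(latencies, window=5)))  # -> [95]
```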

As observability data volumes grow, cost optimization has become a primary concern for organizations of all sizes. Teams are developing strategies to identify and filter noise and redundant data without losing critical insights. Many organizations implement intelligent sampling techniques that preserve important data while reducing overall volume. Tiered storage strategies for different data types help optimize costs by storing frequently accessed data on fast storage while moving historical data to cheaper options. Advanced compression and data reduction techniques help manage costs as data volumes grow.
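
A simple head-sampling policy captures the intuition behind intelligent sampling: always keep failures, sample routine traffic at a fixed rate. In this sketch the 10% rate and the "status" field are illustrative assumptions, not any particular platform's defaults.

```python
import random

KEEP_RATE = 0.10  # illustrative sampling rate for healthy traffic

def should_keep(trace_summary: dict) -> bool:
    if trace_summary.get("status") == "error":
        return True                      # never drop failures
    return random.random() < KEEP_RATE   # sample routine traffic

kept = sum(should_keep({"status": "ok"}) for _ in range(10_000))
print(f"kept ~{kept} of 10,000 ok traces")  # roughly 1,000
```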

Modern observability approaches are helping organizations break down silos between development, operations and security teams. Shared dashboards and unified visibility ensure everyone works from the same data, reducing finger-pointing and improving collaboration. Cross-functional collaboration in issue identification becomes more natural when all teams access the same observability data. Common data models that work across different team perspectives help bridge the language gap between developers, operators and security professionals. The best solutions support specialized views for experts and generalist views for cross-functional team members, ensuring everyone gets the insights they need.

The Path Forward

As teams gather at KubeCon + CloudNativeCon EU 2025, several trends will likely shape discussions around observability. The industry is moving toward democratizing access to enterprise-grade observability, making powerful tools accessible to teams of all sizes rather than just large enterprises with significant resources.

Organizations are implementing stronger governance around observability data to manage costs and ensure compliance, treating observability data as a valuable asset that requires careful management.

Adding business and application context to technical metrics is becoming increasingly important as organizations seek to understand the real-world impact of technical issues. This contextual intelligence helps bridge the gap between IT metrics and business outcomes, making observability more valuable to the organization.

There’s also a growing recognition that observability should be considered early in the development life cycle, not added as an afterthought. Teams are creating standardized, repeatable approaches to implementing observability across different services and environments, ensuring consistent data collection and monitoring practices throughout their organizations.

The Next Decade of Observability

The observability landscape for Kubernetes and cloud native environments continues to evolve rapidly. Organizations can address the growing complexity and cost challenges by adopting unified telemetry-management approaches, embracing open standards like OpenTelemetry and applying AI-driven analytics, all while gaining deeper insights into their systems.

Looking ahead, the future of observability will continue to focus on giving engineers enhanced abilities to understand and optimize their systems with predictive and preventative approaches.

The conversations that come out of KubeCon + CloudNativeCon EU 2025 will undoubtedly shape the future of cloud native observability and help organizations build more resilient, observable systems. The collective innovation of the cloud native community continues to produce solutions that make complex distributed systems more manageable, reliable, and cost-effective.

Author bio

Driving Apica’s strategic product marketing for the Ascent platform—a leader in telemetry data management and observability—Lori transforms complex technological value into compelling market narratives. With a proven track record of elevating B2B tech companies to market leadership positions, she specializes in crafting messaging that resonates and differentiates. As a product marketing and communications strategist, Lori has guided organizations through every critical growth phase—from early-stage startups to successful exits and IPOs—building the strategic foundations that turn innovation into industry influence.
