By John Ward, Solutions Engineer at Apica

Recently, I attended a technical webinar hosted by Datadog and their migration partner NoBS that explored the realities of observability platform migrations. While the session focused on migrating to Datadog, the principles discussed apply universally to any observability platform transition.

Here are the ten critical principles that emerged from the discussion, along with my perspectives on how these insights apply to real-world implementations.

1. Migration Strategy Depends on Your Environment

There’s no one-size-fits-all approach to migration. The webinar outlined three common migration strategies:

Phased Migration (6-12 months): Best for large environments and risk-averse organizations. You migrate one domain at a time, validate functionality, and then proceed to the next. This approach minimizes disruption but extends the timeline.

Big Bang Migration (2-4 months): Requires intensive planning followed by rapid execution. While you run systems in parallel during transition, the actual cutover is swift. This works well for small to medium environments with strong internal teams.

Hybrid Approach (4-8 months): Learn on non-critical systems first, then apply those lessons to critical domains. This strikes a balance for organizations with mixed criticality across their infrastructure.

The key takeaway? Match your migration strategy to your organization’s risk tolerance, team capabilities, and infrastructure complexity.

2. Agent Deployment Should Be Automated

The webinar recommended Ansible for agent deployment, but the broader principle is automation. Manual agent installation across hundreds or thousands of hosts is error-prone and time-consuming.

For organizations already using configuration management tools or fleet management solutions, this phase should leverage existing automation infrastructure. The goal is consistency and speed while maintaining proper version control and rollback capabilities.
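
As a rough illustration, here’s what a batched rollout might look like when scripted in Python around the ansible-playbook CLI. The inventory file, playbook name, agent_version variable, and batch size are placeholders I’ve chosen for the sketch; the point is that each batch is version-pinned, repeatable, and can be halted for investigation or rollback.

```python
"""Minimal sketch: batched agent rollout driven through the ansible-playbook CLI.

Assumes an inventory file and an agent-install playbook already exist; the
file names, the agent_version variable, and the batch size are placeholders.
"""
import subprocess
import sys

INVENTORY = "inventory.ini"        # hypothetical flat inventory of hostnames
PLAYBOOK = "datadog-agent.yml"     # hypothetical playbook that installs/updates the agent
AGENT_VERSION = "7.50.3"           # pin a validated version for consistency and rollback
BATCH_SIZE = 50

def read_hosts(path: str) -> list[str]:
    """Read hostnames from a flat inventory, skipping comments and group headers."""
    with open(path) as handle:
        return [line.strip() for line in handle
                if line.strip() and not line.startswith(("#", "["))]

def run_batch(hosts: list[str]) -> bool:
    """Run the playbook against one batch; --limit scopes Ansible to those hosts."""
    result = subprocess.run([
        "ansible-playbook", PLAYBOOK,
        "-i", INVENTORY,
        "--limit", ",".join(hosts),
        "--extra-vars", f"agent_version={AGENT_VERSION}",
    ])
    return result.returncode == 0

def main() -> None:
    hosts = read_hosts(INVENTORY)
    for start in range(0, len(hosts), BATCH_SIZE):
        batch = hosts[start:start + BATCH_SIZE]
        if not run_batch(batch):
            # Halt on the first failed batch so it can be investigated
            # (or rolled back) before the rollout touches more hosts.
            sys.exit(f"Batch starting at host index {start} failed; halting rollout.")

if __name__ == "__main__":
    main()
```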

3. Auto-Instrumentation Is Your Friend

For APM instrumentation, the presenters recommended auto-instrumentation and noted it’s used about 90% of the time. This makes sense: Manual instrumentation is time-intensive and requires deep application knowledge. Auto-instrumentation gets you observability coverage quickly, and you can always fine-tune with manual instrumentation later for specific use cases.
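
For illustration, here’s a minimal Python sketch using Datadog’s ddtrace library (other stacks have equivalents, such as OpenTelemetry’s auto-instrumentation). The service, operation, and tag names are mine, not from the webinar: you run the app under the auto-instrumentation wrapper first, then add manual spans only where they earn their keep.

```python
# Auto-instrumentation: run the app under the tracing wrapper and supported
# libraries (web frameworks, HTTP clients, database drivers, ...) are traced
# automatically, with no code changes:
#
#   ddtrace-run python app.py
#
# Manual instrumentation: add custom spans only where auto-instrumentation
# lacks business context.
from ddtrace import tracer

def settle_invoice(invoice_id: str) -> None:
    # Operation and tag names are illustrative; align them with your tagging
    # strategy (see principle 4).
    with tracer.trace("billing.settle_invoice", resource=invoice_id) as span:
        span.set_tag("invoice.id", invoice_id)
        ...  # existing business logic runs inside the custom span
```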

4. Tagging Strategy Is Foundation, Not Afterthought

Create a comprehensive tagging strategy before you migrate and apply it consistently across all resources. The recommended baseline includes:

  • Environment (dev, staging, prod)
  • Service name
  • Version
  • Team ownership
  • Application

These tags enable easier correlation, troubleshooting, and cost allocation later. Retrofitting tags after migration is painful and often incomplete. Get this right from the start.
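
A baseline like this is easiest to enforce when it’s written down as code. Here’s a minimal, vendor-neutral sketch of a pre-migration tag check; the tag keys mirror the baseline above, and everything else is illustrative.

```python
# Minimal sketch: check a resource's tags against the baseline before it is
# migrated. Tag keys mirror the baseline above; everything else is illustrative.
REQUIRED_TAGS = {"env", "service", "version", "team", "application"}
ALLOWED_ENVS = {"dev", "staging", "prod"}

def validate_tags(tags: dict) -> list:
    """Return a list of problems; an empty list means the resource passes."""
    problems = [f"missing tag: {key}" for key in sorted(REQUIRED_TAGS - tags.keys())]
    if "env" in tags and tags["env"] not in ALLOWED_ENVS:
        problems.append(f"unexpected env value: {tags['env']}")
    return problems

# Example: this service would be flagged before migration, not six months after.
print(validate_tags({"env": "prod", "service": "checkout", "team": "payments"}))
# -> ['missing tag: application', 'missing tag: version']
```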

5. RBAC Setup Protects Your Migration

Role-Based Access Control isn’t just a security concern during migration; it’s a governance mechanism. Properly configured RBAC ensures that only the appropriate teams can make changes, migrate workloads, and access sensitive telemetry data. Set up RBAC rules early so your migration proceeds in a controlled, coordinated manner.
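
As a purely hypothetical sketch (the role names and permission strings are mine, not any platform’s), the governance idea boils down to an explicit mapping from roles to what they are allowed to touch during the migration:

```python
# Hypothetical, platform-agnostic sketch: role names and permission strings are
# illustrative; map them onto your platform's native RBAC primitives.
MIGRATION_ROLES = {
    "migration-admin":  {"manage_agents", "edit_pipelines", "edit_dashboards",
                         "edit_monitors", "read_logs"},
    "service-team":     {"edit_dashboards", "edit_monitors", "read_logs"},
    "read-only-review": {"read_logs"},
}

def allowed(role: str, permission: str) -> bool:
    """Check whether a role grants a permission; unknown roles grant nothing."""
    return permission in MIGRATION_ROLES.get(role, set())

assert allowed("migration-admin", "manage_agents")
assert not allowed("service-team", "manage_agents")
```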

6. Logs Are a Cost Decision

This was the most important insight from the webinar. The presenters were explicit: With Datadog’s pricing model, you must actively decide which logs matter and create a strategy to keep costs under control. Not all logs are created equal, and indiscriminate log ingestion leads to budget overruns.

This represents a fundamental philosophical difference among observability solutions. Some vendors require you to be strategic about what you ingest because pricing is directly tied to data volume. Others, including Apica, offer flat-rate pricing models where you don’t need to agonize over every log line. If you’re evaluating a volume-based platform, factor in the operational overhead: you’ll need to continuously manage which logs get ingested, create exclusion rules, and monitor costs, or risk unexpected bills.

But there’s a third approach that addresses cost concerns without sacrificing observability: Telemetry pipeline solutions like Apica Flow. Flow sits between your data sources and your observability tools, processing, optimizing, and routing data before it reaches its destination. This means you can:

  • Reduce data volume through intelligent filtering and sampling
  • Enrich and transform data to maximize value
  • Route different data types to the most cost-effective storage
  • Control costs without sacrificing visibility

With Flow, you optimize your data before it ever hits your observability platform, whether that platform charges by volume or offers flat-rate pricing. And when combined with Apica’s flat-rate observability solution, you get both predictable costs and complete control over your telemetry pipeline. No complex triage strategy required.
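
To make the pipeline idea concrete, here’s a minimal, vendor-neutral Python sketch of the kind of filter, sample, enrich, and route logic that runs before data reaches a backend. It isn’t Flow’s API (or any product’s); the sample rates and destination names are placeholders for illustration.

```python
import random

# Minimal, vendor-neutral sketch of pipeline-style log processing: drop or
# sample low-value records, enrich the rest, and route by value. This
# illustrates the technique, not any particular product's API.
SAMPLE_RATES = {"DEBUG": 0.0, "INFO": 0.1, "WARN": 1.0, "ERROR": 1.0}

def process(record: dict):
    """Return (destination, record) for kept records, or None for dropped ones."""
    level = record.get("level", "INFO")
    if random.random() >= SAMPLE_RATES.get(level, 1.0):
        return None                                  # filtered/sampled out pre-ingest
    record["pipeline"] = "pre-ingest"                # enrichment example
    if level in ("ERROR", "WARN"):
        return ("observability-platform", record)    # hot, indexed storage
    return ("object-storage-archive", record)        # cheaper long-term retention

# An ERROR is always kept and routed to the platform; most INFO records are
# sampled down or sent to the archive tier instead.
print(process({"level": "ERROR", "service": "checkout", "msg": "payment failed"}))
```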

7. Dashboard Migration Requires Curation

Don’t automatically port every dashboard from your legacy system. This is an opportunity for a fresh start. Many dashboards fall into disuse over time; they were created for a specific incident, experiment, or project that’s no longer relevant.

During migration, ask: “Is this dashboard still necessary? Who uses it? What decisions does it support?” Migrate only what adds value. Your new platform will be cleaner and more useful as a result.
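
One practical way to run that curation is to triage an exported inventory before porting anything. The sketch below assumes a hypothetical dashboards.json export with title, owner, and last_viewed fields; substitute whatever your legacy platform’s export or API actually provides.

```python
import json
from datetime import datetime, timedelta, timezone

# Minimal sketch: triage an exported dashboard inventory before porting it.
# "dashboards.json" and its fields (title, owner, last_viewed) are hypothetical;
# substitute whatever your legacy platform's export or API actually provides.
# last_viewed is assumed to be ISO 8601 with a UTC offset.
STALE_AFTER = timedelta(days=180)

def triage(path: str = "dashboards.json") -> None:
    with open(path) as handle:
        dashboards = json.load(handle)
    now = datetime.now(timezone.utc)
    for dash in dashboards:
        last_viewed = datetime.fromisoformat(dash["last_viewed"])
        stale = now - last_viewed > STALE_AFTER
        verdict = "review before migrating" if stale else "candidate to migrate"
        print(f'{verdict:24} {dash["title"]} (owner: {dash.get("owner", "unknown")})')

if __name__ == "__main__":
    triage()
```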

8. Alert Cleanup Is Migration’s Hidden Gift

Similar to dashboards, alerts accumulate over time. Teams create alerts that become noise, get ignored, and then persist because no one wants to be the person who deleted an alert that might have been important.

Migration forces you to evaluate each alert with fresh eyes:

  • Is this alert critical?
  • What action should someone take when it fires?
  • Does this alert still reflect current system architecture?

If an alert doesn’t meet the criteria for actionable, critical monitoring, leave it behind. Your on-call teams will thank you.
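
If it helps to make that evaluation explicit, the three questions can be turned into a simple keep-or-drop pass over your legacy alerts. The field names and example data below are hypothetical placeholders:

```python
from dataclasses import dataclass

# Toy sketch: apply the three questions above to each legacy alert and record
# an explicit keep-or-drop decision. Field names are hypothetical placeholders.
@dataclass
class Alert:
    name: str
    is_critical: bool           # Is this alert critical?
    runbook_action: str         # What should someone do when it fires? ("" if nothing)
    matches_current_arch: bool  # Does it still reflect the current architecture?

def should_migrate(alert: Alert) -> bool:
    return alert.is_critical and bool(alert.runbook_action) and alert.matches_current_arch

legacy_alerts = [
    Alert("api-5xx-rate", True, "Follow incident runbook RB-12", True),
    Alert("old-vm-disk-full", False, "", False),
]
for alert in legacy_alerts:
    print(f'{alert.name}: {"migrate" if should_migrate(alert) else "leave behind"}')
```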

9. Legacy Monitoring Patterns May Not Translate

The webinar emphasized that many organizations install agents, recreate legacy dashboards and alerts, and assume their existing monitoring model will translate directly. In practice, this often leads to alert noise, higher costs, and an observability setup that doesn’t reflect how modern systems actually work.

Modern observability approaches offer different capabilities (dynamic baselines, distributed tracing, service maps, anomaly detection) that may render some legacy tools obsolete. Don’t just lift and shift your old patterns. Take advantage of your new platform’s strengths.

The AI Amplification Factor: This challenge is about to intensify dramatically. As organizations deploy AI agents, LLMs, and autonomous systems, these workloads generate 10-100x more telemetry data than traditional applications. According to Gartner, by 2027, 35% of enterprises will see observability costs consume more than 15% of their overall IT operations budget, driven largely by this AI-induced data explosion.

Your migration strategy needs to account for this shift. If you’re building your observability architecture today using patterns from five years ago, you’re not preparing for the scale challenges ahead. Modern telemetry pipelines need to intelligently process, optimize, and route data before it hits your observability platforms, whether you’re running traditional workloads, AI deployments, or (most likely) both.

The organizations that succeed in the agentic AI era won’t just have better observability tools; they’ll have fundamentally different telemetry architectures designed to handle AI-scale data volumes without spiraling costs.

10. Migration Is About Modernization, Not Just Movement

The overarching theme of the webinar was that successful migration isn’t just about moving tools; it’s about rethinking your observability practice. This is your chance to eliminate technical debt, adopt better practices, and align your monitoring with how your systems actually operate today.

Ask yourself:

  • What telemetry is worth migrating, and what should we leave behind?
  • How can we reduce noise and accelerate time to value?
  • What pitfalls exist across our cloud, Kubernetes, and hybrid environments?
  • How do we sequence tagging, dashboards, alerts, and SLOs correctly?
  • Are we building a telemetry architecture that can handle AI-scale data volumes?
  • Do we have a strategy to control costs as data volumes increase 10-100x?

That last question is critical. Migration projects often focus on the immediate technical transition, but the best migrations also prepare for what’s coming next. If your organization is deploying, or planning to deploy, AI agents, LLMs, or autonomous systems, your telemetry architecture needs to account for the data explosion these technologies create.

Final Thoughts

Migrating to a new observability solution is never trivial, but with the right approach, it becomes an opportunity for meaningful improvement. Whether you’re migrating to Datadog, Apica, or any other vendor, these ten principles provide a solid foundation for success.

The organizations that struggle with migration are those that treat it as a technical checkbox: Install the agent, recreate alerts, done. The organizations that thrive are those that use migration as a forcing function to modernize their observability practice, eliminate cruft, and build something better than what they had before.

And in 2025 and beyond, “building something better” increasingly means building something that can scale. The telemetry data volumes from AI deployments aren’t a distant future concern; they’re happening now. Organizations that design their telemetry architecture with this growth in mind will avoid having to migrate again in two years when their costs spiral out of control.

If you’re planning an observability migration and want to discuss strategies for your specific environment, I’d be happy to connect. Feel free to reach out.

About the Webinar: “Accelerate to Observability: Migration Modernized with Datadog & NoBS” was held on January 28, 2025, as a practitioner-level technical session for engineers and operators responsible for reliability, performance, or leading migrations.

About the Author: We’ll add John’s bio here.