Telemetry: Unlocking the Hidden Power of Observability in Axon Server Applications

As applications grow in complexity, understanding their performance and behavior becomes a critical challenge. At AxonIQ Conference 2024, Richard Bouška, CTO of ASSIST, delivered a compelling talk on how telemetry—often an overlooked aspect of system architecture—can transform the way we monitor and optimize Axon Server-centric applications.

What is Telemetry, and Why Does It Matter?

Telemetry encompasses the collection and analysis of operational data such as logs, metrics, and traces to answer the fundamental question: “What’s happening in my system?” For teams working with Axon Framework and Axon Server, telemetry becomes the key to achieving transparency, ensuring resilience, and fine-tuning performance across distributed applications.

“Without telemetry, you’re left reacting to user complaints instead of proactively addressing issues,” Richard explained. “It’s the mechanical sympathy of modern software systems.”

From Metrics to Mastery: Richard’s Journey with Telemetry

Richard walked the audience through ASSIST’s multi-year evolution with Axon technologies:

  • 2021: Focused on CQRS and event sourcing for building domain-driven architectures.
  • 2022: Adopted microservices to distribute applications more effectively.
  • 2023: Scaled applications globally, making location transparency a priority.
  • 2024: Observability emerged as a cornerstone for ensuring system reliability and optimizing performance.

Telemetry became essential as ASSIST deployed increasingly complex systems worldwide. Richard’s team used tools like Prometheus and Grafana to collect, visualize, and analyze metrics. These tools allowed them to spot anomalies, track resource usage, and even predict issues before they became critical.

Lessons Learned: The Challenges of Telemetry

Richard didn’t shy away from the hurdles:

  1. Information Overload: With so many metrics to track—Java Virtual Machine (JVM) stats, Axon Server data, custom business metrics—it’s easy to drown in data. Teams must carefully choose what to monitor.
  2. Complexity of Tools: Teams had to learn multiple query languages, statistical concepts, and dashboarding techniques. Richard humorously noted that adopting dark mode in Prometheus was a turning point for their younger developers.
  3. Interpreting Data: Even simple metrics like averages can be misleading. Richard explained how poor aggregation could create resonances or skew results, leading to incorrect conclusions.
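
To make the third point concrete, here is a minimal sketch of how an average can hide a slow tail. The durations are invented for illustration and the helper name nearest_rank_percentile is hypothetical; none of this comes from the talk or from any Axon API.

```python
import math
import statistics


def nearest_rank_percentile(samples, pct):
    """Return the pct-th percentile (0 < pct <= 100) using the nearest-rank method."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # Nearest rank: the smallest value with at least pct% of samples at or below it.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical command durations in milliseconds: 95 fast commands and 5 slow ones.
durations_ms = [20] * 95 + [2000] * 5

mean_ms = statistics.mean(durations_ms)
p50_ms = nearest_rank_percentile(durations_ms, 50)
p99_ms = nearest_rank_percentile(durations_ms, 99)

print(f"mean={mean_ms} ms, p50={p50_ms} ms, p99={p99_ms} ms")
```

With this made-up data the mean is 119 ms, a value no command actually took: the typical command needs 20 ms and the slowest 5% need 2000 ms. Looking at the median and a high percentile side by side tells a more honest story than the average alone.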

The Benefits: Why Invest in Telemetry?

Despite the challenges, telemetry offers undeniable advantages:

  • Detecting Issues Early: By monitoring metrics like memory usage after garbage collection, teams can spot potential problems like memory leaks before they impact production.
  • Optimizing System Performance: Richard highlighted how snapshotting and proper configuration reduced command handling times from 1.5 seconds to 30 milliseconds.
  • Supporting Collaboration: Co-locating development, DevOps, and observability teams enabled faster issue resolution and closer alignment on system design.
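
The first benefit above, watching memory usage after garbage collection, can be sketched as a simple trend check. This is one possible approach, not the method shown in the talk: it assumes you already export the heap size measured right after each collection, and the function names, sample values, and threshold are all hypothetical.

```python
def slope_per_sample(values):
    """Least-squares slope of values against their index (units per sample)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    variance = sum((x - mean_x) ** 2 for x in range(n))
    return covariance / variance


def looks_like_leak(post_gc_heap_mb, min_growth_mb_per_sample=1.0):
    """Flag a series whose post-GC heap keeps growing faster than the threshold."""
    return slope_per_sample(post_gc_heap_mb) >= min_growth_mb_per_sample


# Hypothetical heap sizes (MB) sampled right after each garbage collection.
healthy = [410, 405, 412, 408, 411, 407, 409, 406]
leaking = [410, 418, 425, 431, 440, 447, 455, 462]

print(looks_like_leak(healthy), looks_like_leak(leaking))
```

Heap usage before a collection rises and falls constantly, so it says little on its own. The floor that remains after each collection is the useful signal: if it climbs steadily, something is holding on to memory.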

One of the most striking insights was Richard’s emphasis on “mechanical sympathy”—understanding how a system is designed to be used and aligning its operation with that design. Telemetry provides the visibility needed to achieve this harmony.

Practical Applications for Axon Server Users

Richard demonstrated how telemetry transformed their Axon Server deployments:

  • Node Connection Monitoring: By visualizing how applications connected to Axon Server nodes, they could identify and fix inconsistencies.
  • Event Processing Analysis: Metrics like the last token per context helped ensure event streams were processed correctly.
  • Command and Query Optimization: Real-time monitoring of command durations and query response times allowed for precise tuning and reduced latency.
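
As an illustration of the event processing point, one plausible use of a "last token per context" metric is to compare it with the position each event processor has reached. The sketch below assumes both values are available as plain integers keyed by context name; the function, the context names, and the numbers are hypothetical and do not reflect Axon Server's actual metric names or API.

```python
def processing_lag(last_token_by_context, processor_token_by_context):
    """Return {context: events behind}, or None where no processor position is known."""
    lag = {}
    for context, head in last_token_by_context.items():
        position = processor_token_by_context.get(context)
        if position is None:
            # No position reported for this context, so the lag is unknown.
            lag[context] = None
        else:
            # Clamp at zero in case the two metrics were scraped at slightly different times.
            lag[context] = max(0, head - position)
    return lag


# Hypothetical values: the newest token per context and where each processor currently is.
last_tokens = {"orders": 1000, "billing": 500, "audit": 42}
processor_tokens = {"orders": 1000, "billing": 350}

for context, behind in processing_lag(last_tokens, processor_tokens).items():
    print(context, "unknown" if behind is None else f"{behind} events behind")
```

A lag that stays near zero suggests the stream is being consumed as expected, a lag that keeps growing points to a slow or stalled processor, and a missing position is worth investigating in its own right.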

He also encouraged teams to replay their event stores periodically. “It’s amazing what you can learn by observing patterns over millions of events,” Richard remarked. Replay data not only revealed performance bottlenecks but also provided insights into user behavior and system evolution.

Key Takeaways for Teams Using Axon Technologies

  1. Telemetry Is Essential, Not Optional: Modern applications require visibility to ensure reliability and performance.
  2. Start Simple, Then Iterate: Focus on key metrics like memory usage, event processing rates, and command durations before expanding to more complex analyses.
  3. Collaboration Boosts Success: Observability isn’t just about tools; it’s about aligning teams and sharing knowledge.
  4. Invest in the Right Tools: Prometheus, Grafana, and Axon Server’s telemetry capabilities give teams a solid foundation for monitoring distributed systems.

Closing Thoughts

As Richard concluded, “Telemetry is the most important feature of Axon.” While dashboards and graphs might seem overwhelming at first, they are invaluable tools for ensuring your systems remain efficient, resilient, and scalable. Whether you’re debugging a memory issue, optimizing event processing, or predicting user behavior, telemetry equips your team with the insights needed to stay ahead.

Ready to optimize your Axon Server deployments? Explore how AxonIQ’s solutions can help you leverage telemetry and gain unparalleled visibility into your systems.
