Crossing the Observability [Knowledge] Gap
As the APM market shifted to “Observability,” the system engineering definition took a back seat to marketing messaging (for better or worse), but at the end of the day, it’s still about being able to see and understand performance data. Over at DevOps.com, Mike Vizard discusses a recent survey in which 48% of respondents cited “a lack of knowledge as the biggest challenge they encountered when trying to observe [cloud-native] applications.”
For over 20 years, the dirty little secret of APM was that to achieve all the value promised by the vendor (any vendor), customers would have to commit to hours, days, or even weeks of time from their application stakeholders (developers, Ops, architects, etc.) to reverse engineer application code so that the tool’s alerts, dashboards, custom metrics, etc. would provide some actual meaning.
As Cloud-Native applications emerged, the reverse engineering exercise became practically impossible. Throw in the ephemeral nature of orchestrated, containerized microservices, and you begin to understand why a lack of understanding exists.
The knowledge gap is also exacerbated (calendar word of the day) by employee movement, retiring engineers, “ancient” back-end systems, and third-party services, especially LLM / GenAI modules.
Don’t fret – it is possible to eliminate this gap, but not by throwing more expert hours at it. That will just result in more lost hours, and not much more understanding. Even if the hand-built configuration is initially right, one software update will render it obsolete again.
The only way to eliminate the gap is to have automated understanding (or context) built into your Observability platform, whether data comes from proprietary monitoring agents or open APIs like Prometheus and OpenTelemetry.
With today’s operations goals, expanding application portfolios, and the push for greater agility and continuous integration, automated monitoring becomes a requirement, not a luxury.