Design for Observability - Role of Metrics Ep 2

Swapnil K.

Helping Customer teams with Cloud Native Ops, Platform Engg, Solutions for Observability needs | Experience with Security, Compliance, Automation | Aspiring Fractional Exec

Published May 15, 2024

In the previous article, I discussed best practices that organizations can use to leverage metrics effectively within their observability platforms to gain actionable insights, improve system reliability, and drive business value. In this article, let us look at some strategies for getting those metrics. Obtaining metrics from both applications and infrastructure involves instrumentation, data collection, aggregation, and analysis. The following are some generalized approaches to getting metrics from each.

Getting metrics from applications:

Instrument your application code to emit metrics relevant to its performance, behavior, and business logic. Instrumentation can be done using libraries or frameworks specific to the programming language or platform in your application stack.
Identify the types of metrics you need, such as counters, gauges, histograms, or summaries, based on the aspects of your application you want to monitor (e.g., request latency, error rates, throughput).
Use standardized formats like the Prometheus exposition format, StatsD, or OpenTelemetry for emitting metrics, to ensure compatibility with a wide range of monitoring and observability tools.
Integrate metrics instrumentation with logging and distributed tracing frameworks to provide comprehensive observability across your application stack.
Choose between a push model, where applications actively push metrics to a central metrics collection system, and a pull model, where a centralized monitoring system periodically pulls metrics from application endpoints.
Implement error handling mechanisms to gracefully handle failures in metric emission and ensure that errors don't impact application performance or stability.
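
To make the points above concrete, here is a minimal sketch in Python using only the standard library. It is not the API of any real metrics client: the `SimpleCounter` and `SimpleHistogram` classes, the `safe_emit` helper, the metric names, and the bucket boundaries are all assumptions made for illustration. It shows a counter and a histogram, renders them in the Prometheus text exposition format (the text a pull-based scraper would read from an application endpoint), and wraps metric updates so that a failure in emission does not break the request path. In practice you would most likely use an existing client library for your language instead.

```python
import threading


class SimpleCounter:
    """Hypothetical counter: a value that only ever goes up (e.g. requests served)."""

    def __init__(self, name, help_text):
        self.name = name
        self.help_text = help_text
        self._value = 0.0
        self._lock = threading.Lock()

    def inc(self, amount=1.0):
        if amount < 0:
            raise ValueError("counters can only increase")
        with self._lock:
            self._value += amount

    def render(self):
        # Lines in the Prometheus text exposition format.
        return [
            f"# HELP {self.name} {self.help_text}",
            f"# TYPE {self.name} counter",
            f"{self.name} {self._value}",
        ]


class SimpleHistogram:
    """Hypothetical histogram with cumulative buckets (e.g. request latency)."""

    def __init__(self, name, help_text, buckets):
        self.name = name
        self.help_text = help_text
        self.buckets = sorted(buckets)
        self._counts = [0] * len(self.buckets)
        self._sum = 0.0
        self._count = 0
        self._lock = threading.Lock()

    def observe(self, value):
        with self._lock:
            self._sum += value
            self._count += 1
            # Buckets are cumulative: a value counts in every bucket it fits under.
            for i, upper in enumerate(self.buckets):
                if value <= upper:
                    self._counts[i] += 1

    def render(self):
        lines = [
            f"# HELP {self.name} {self.help_text}",
            f"# TYPE {self.name} histogram",
        ]
        for upper, count in zip(self.buckets, self._counts):
            lines.append(f'{self.name}_bucket{{le="{upper}"}} {count}')
        lines.append(f'{self.name}_bucket{{le="+Inf"}} {self._count}')
        lines.append(f"{self.name}_sum {self._sum}")
        lines.append(f"{self.name}_count {self._count}")
        return lines


def safe_emit(action):
    """Run a metric update; report failure instead of raising into application code."""
    try:
        action()
        return True
    except Exception:
        return False


# Assumed metric names and bucket boundaries, chosen only for this example.
requests_total = SimpleCounter("http_requests_total", "Total HTTP requests handled.")
latency = SimpleHistogram(
    "http_request_duration_seconds", "Request latency in seconds.", [0.1, 0.5, 1.0]
)

for seconds in (0.05, 0.3, 2.0):
    safe_emit(requests_total.inc)
    safe_emit(lambda s=seconds: latency.observe(s))

exposition = "\n".join(requests_total.render() + latency.render()) + "\n"
print(exposition)
```

In a pull model, a string like `exposition` would be served from an HTTP endpoint for the monitoring system to scrape; in a push model, the application would send the same values to a collector on a schedule.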

Getting metrics from infrastructure:

Deploy monitoring agents on infrastructure components (e.g., servers, VMs, containers, orchestrators) to collect metrics related to resource utilization (CPU, memory, disk, network), system health, and workload performance.
Define monitoring configurations alongside infrastructure definitions. Integrate metrics collection into your infrastructure provisioning and configuration management workflows using tools like Terraform, Ansible, or Chef.
Utilize APIs provided by cloud service providers (e.g., AWS CloudWatch, Azure Monitor, Google Cloud Monitoring) to retrieve metrics related to cloud resources, services, and platforms.
Leverage the operating system's built-in tools and utilities to collect operating system-level metrics such as CPU usage, memory utilization, disk I/O, and network traffic.
Monitor network infrastructure components (e.g., routers, switches, load balancers) and collect metrics related to network throughput, latency, packet loss, and connectivity status.
Extract relevant metrics from log data using log aggregation and parsing tools. Some metrics, such as HTTP response codes or database query execution times, can be derived from log entries.
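
As an illustration of the last point, the following minimal sketch derives two metrics from access-log entries: a count of responses per HTTP status class and a server error rate. It uses only the Python standard library. The log format, the sample lines, and the names `LOG_PATTERN`, `status_class_counts`, and `error_rate` are assumptions for this example; a real setup would adapt the pattern to its own log format or rely on a log aggregation tool.

```python
import re
from collections import Counter

# Assumed access-log shape: ... "METHOD /path PROTOCOL" STATUS SIZE
LOG_PATTERN = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3})\b'
)


def status_class_counts(lines):
    """Count responses per status class (2xx, 4xx, 5xx, ...), skipping unparseable lines."""
    counts = Counter()
    for line in lines:
        match = LOG_PATTERN.search(line)
        if match is None:
            continue  # not an access-log entry in the assumed format
        counts[match.group("status")[0] + "xx"] += 1
    return counts


def error_rate(counts):
    """Fraction of parsed requests that ended in a 5xx response."""
    total = sum(counts.values())
    return counts["5xx"] / total if total else 0.0


# Hypothetical log lines, made up for this example.
SAMPLE_LINES = [
    '10.0.0.1 - - [15/May/2024:10:00:00 +0000] "GET /api/orders HTTP/1.1" 200 512',
    '10.0.0.2 - - [15/May/2024:10:00:01 +0000] "POST /api/orders HTTP/1.1" 200 128',
    '10.0.0.3 - - [15/May/2024:10:00:02 +0000] "GET /api/missing HTTP/1.1" 404 64',
    '10.0.0.4 - - [15/May/2024:10:00:03 +0000] "GET /api/orders HTTP/1.1" 503 0',
    "this line is not an access-log entry",
]

counts = status_class_counts(SAMPLE_LINES)
print(dict(counts))
print(error_rate(counts))
```

Metrics derived this way depend on log delivery and parsing, so they tend to lag behind directly emitted metrics and are best treated as a complement to agent-based or API-based collection.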
