Tips for using mainframe AIOps to stay on top of the digital-first wave

This article was written by Greg Lotko, Senior Vice President and General Manager, Mainframe Software Division.

Digital transformation is not a new concept, and the evidence is all around us: it's far more common to do things online than it was just a few years ago. In 2020, consumption of digital services rose dramatically as people moved online out of necessity. But they stayed for convenience, and today activities such as shopping, learning, communicating and working are often done through apps rather than in person. The pace of transformation is accelerating and shows no signs of slowing down.

Many organizations are looking to use AI to identify patterns in the data generated by these online activities so they can anticipate customer needs and gain a competitive advantage. They are also combining AI with IT operations (AIOps) to keep the systems that support their online offerings healthy and resilient.

It’s important to view AIOps as a journey of continuous improvement, one focused on providing Ops teams with the tools they need to drive operational resilience and keep business apps performing as expected. Here are a few thoughts that can help you begin or progress on your own AIOps journey.

AIOps and the mainframe

AIOps is being incorporated into the systems that support digital transformation, and that includes the mainframe. Mainframes play a central role in many aspects of our always-on digital lives and work. They house vast volumes of consumer data (demographics, preferences, behaviors, buying patterns and more), all of which are critical to delivering a superior online experience.

Every time an online service touches the mainframe, it generates telemetry data that tracks system health. The unprecedented expansion of always-on apps has led to a commensurate growth in telemetry data, which must be monitored and managed proactively. The sheer volume of this data makes it difficult for humans to keep up, so leading organizations are turning to machine learning (ML), a subset of AI, for help.

Our customers tell us that a significant portion of telemetry data is non-actionable, commonly referred to as "noise." At Broadcom, we use AI and ML to reduce the complexity of mainframe data streams and filter out that noise, so our customers can focus their operations resources on the tasks that matter most to their businesses. Our software models mainframe systems, analyzing millions of change events, logs, metrics and workload trends to learn which conditions led to issues in the past and to give warning when similar conditions arise again. Recognizing these patterns and decluttering alerts lets our customers anticipate failures, prevent small issues from becoming outages, and shift from reactive to proactive operations.
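The article does not describe how Broadcom's models work internally. The sketch below is not that implementation. It illustrates one widely used, generic way to cut alert noise: raise an alert only when a metric reading deviates sharply from its own recent baseline, instead of alerting on every reading. The function name `flag_anomalies`, the 20-sample window and the 3-standard-deviation threshold are illustrative assumptions.

```python
from statistics import mean, stdev


def flag_anomalies(values, window=20, threshold=3.0):
    """Return indices of readings that deviate from the preceding `window`
    readings by more than `threshold` sample standard deviations.

    Illustrative sketch only: a rolling z-score filter, not any vendor's model.
    """
    if window < 2:
        # statistics.stdev needs at least two data points
        raise ValueError("window must be at least 2")
    anomalies = []
    for i in range(window, len(values)):
        baseline = values[i - window:i]   # recent "normal" behavior
        mu = mean(baseline)
        sigma = stdev(baseline)
        if sigma == 0:
            # Perfectly flat baseline: any change at all is unusual
            if values[i] != mu:
                anomalies.append(i)
            continue
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical CPU-utilization readings: routine jitter, then one spike
cpu = [50, 52, 49, 51, 50] * 6 + [95]
print(flag_anomalies(cpu))  # [30]: only the spike is reported
```

In this example, the routine fluctuations between 49 and 52 percent stay inside the learned baseline and raise no alert. The single spike is the only reading surfaced to an operator. Production AIOps tools typically combine many signals (logs, change events, workload trends) rather than a single metric.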

For example, with the help of our software, a Fortune 500 financial services company detected a precursor event that had previously led to an outage. It spotted the anomaly two weeks before a routinely scheduled review would have signaled that anything was amiss, leaving plenty of time to prevent the outage.

Work with the whole — not the pieces

There are a few considerations to keep in mind when incorporating AI/ML into IT Ops. First, operations are often partitioned into functional teams that use their own tools and data collectors. This results in siloed data and incomplete views of system health across the organization. To address this, we've built infrastructure health dashboards that integrate traditional mainframe telemetry with AI/ML-enabled insights. These dashboards are customizable, so staff with different skill levels can collaborate using the same tools.

We're expanding this concept into application-level health views that will allow mainframe IT Ops to align more closely with business application teams. Linking infrastructure to the applications it serves will increase operational efficiency: customers can prioritize resolution efforts by business impact and contact the right teams from the start.

This only works if everything is open

AIOps relies on both the breadth and depth of data to produce optimal outcomes. This is especially true in hybrid cloud environments, where data comes from multiple sources. No single solution can satisfy every requirement, but solutions can work together so that customers retain the flexibility to use their tools of choice.

The key is to integrate the data across toolsets. AIOps solutions must adopt an open-first approach, exposing data through open APIs to ensure it can be consumed and analyzed by third-party tools. This approach makes mainframes more observable to operations staff responsible for applications that span multiple computing domains in the hybrid cloud.

AI is not about technology — it’s about people

AIOps is more than technical talk; it's about giving IT personnel the tools they need to keep the world running. Humans alone can't handle the speed and volume of information that needs to be processed, but AI systems can't replace human experience and know-how, especially for new and unique problems that would be difficult to detect without retraining the AI models. AIOps should be viewed as assistive technology that helps operations staff achieve outcomes. People play the most crucial role; AI is an enabler that helps them prioritize resources, anticipate problems, automate processes, analyze data faster and deliver innovative user experiences.

Let’s get started now

AIOps is a journey of continuous improvement. Adaptive systems are not yet mature enough for wholesale adoption, so focus on small changes that can add up to significant improvements over time. By making incremental gains in the short term, you can achieve sustainable benefits in the long run and transform your business with AIOps. If you haven't done so already, get started now!
