The Challenges in Reliability Engineering
This image comes from Dictionary of French Architecture from 11th to 16th Century (1856) by Eugene Viollet-le-Duc (1814-1879).

What Are the Other Challenges in Reliability?

Creating a product or system that lasts as long as expected, or longer, is a challenge.

It’s a common challenge that reliability engineers and entire engineering teams face on a regular basis. It’s also not our only challenge.

We face and solve a myriad of technical, political, and engineering challenges. Some of our challenges are born in, and carried forward by, our own industry. We take tools suited to one purpose and alter them to ‘fit’ another situation, inappropriately, creating misleading results. We have terms that we, and our peers, struggle to understand.

Sometimes we, as reliability engineers, have set up challenges that thwart our best efforts to make progress.

Let’s examine a few of these self-made challenges and discuss ways to overcome them, so we can tackle the real hurdles in our path.

MTBF and Predictions Are the Two Big Issues

This site has the express goal to ‘eradicate MTBF’. It is the worst four-letter acronym in our world. You already know this, and many readers here have taken steps to see this term relegated to the dust of forgotten history.

Parts count predictions, especially those based on our favorite military standard, are another method widely known to be less than useful. Why, then, do we continue to find requirements to use this method as a basis for estimating actual future field failure rates?

Even 20 years after MIL-HDBK-217’s retirement and obsolescence, it lives on. Again, there are teams working on viable and genuinely useful alternatives. Physics of failure modeling, improved reliability modeling tools that permit (nay, encourage) the use of appropriate lifetime distributions, and other work are slowly weaning our industry from the folly of parts count predictions.
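To illustrate why the choice of lifetime distribution matters, here is a minimal Python sketch comparing the reliability implied by a single MTBF value (which assumes a constant failure rate, i.e. an exponential distribution) with a two-parameter Weibull model that has the same mean life. The Weibull parameters (shape beta = 3, scale eta = 10,000 hours) are hypothetical, chosen only to represent a wear-out failure mechanism.

```python
import math


def reliability_exponential(t: float, mtbf: float) -> float:
    """R(t) under the constant-failure-rate assumption implied by quoting an MTBF."""
    return math.exp(-t / mtbf)


def reliability_weibull(t: float, beta: float, eta: float) -> float:
    """R(t) for a two-parameter Weibull with shape beta and scale (characteristic life) eta."""
    return math.exp(-((t / eta) ** beta))


def weibull_mean_life(beta: float, eta: float) -> float:
    """Mean life of a two-parameter Weibull: eta * Gamma(1 + 1/beta)."""
    return eta * math.gamma(1.0 + 1.0 / beta)


if __name__ == "__main__":
    # Hypothetical wear-out population: beta > 1 means the failure rate rises with age.
    beta, eta = 3.0, 10_000.0
    mean_life = weibull_mean_life(beta, eta)  # same mean used as the "MTBF" below
    print(f"mean life = {mean_life:,.0f} h")
    for t in (1_000.0, 5_000.0, 10_000.0):
        r_exp = reliability_exponential(t, mean_life)
        r_wbl = reliability_weibull(t, beta, eta)
        print(f"t = {t:>6,.0f} h   exponential R = {r_exp:.3f}   Weibull R = {r_wbl:.3f}")
```

With these assumed numbers, both models share a mean life of roughly 8,930 hours, yet at 1,000 hours the Weibull model predicts about 99.9% reliability while the exponential model predicts about 89%. Quoting the MTBF alone hides that difference.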

HALT: “Let’s Pass HALT”

This one isn’t discussed too often. Yet, have you heard someone wonder if their product could pass HALT?

How about, ‘Of course it failed; you were testing above the specified use level.’

HALT is the second-worst four-letter acronym.

We have a way to go to make this basic concept clear. HALT (highly accelerated life testing) is not a pass/fail test. We apply stress to discover weaknesses in the design, using elevated stresses to uncover problems and find margins more quickly.
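One way to picture ‘discovering margins’ is a step-stress sequence: raise a single stress in fixed increments and note the last level at which the unit still functions. The sketch below assumes a hypothetical thermal step-stress record and a hypothetical specified maximum of 70 °C; it is a simplification that ignores dwell times, recovery checks, and destruct limits. A failure here is a finding, not a failed test.

```python
from typing import Optional, Sequence, Tuple


def find_operating_limit(steps: Sequence[Tuple[float, bool]]) -> Optional[float]:
    """Return the highest stress level the unit survived before its first failure.

    `steps` holds (stress_level, unit_functioned) pairs in increasing stress order.
    Returns None if the unit failed at the first step. If it never failed, the
    result is simply the highest level applied (the test limit, not a product limit).
    """
    last_good: Optional[float] = None
    for level, functioned in steps:
        if not functioned:
            break
        last_good = level
    return last_good


def margin_above_spec(operating_limit: float, spec_max: float) -> float:
    """Margin between the discovered operating limit and the specified maximum."""
    return operating_limit - spec_max


if __name__ == "__main__":
    SPEC_MAX_C = 70.0  # hypothetical specified maximum use temperature
    # Hypothetical record: (chamber temperature in deg C, unit still functioned?)
    record = [(60.0, True), (70.0, True), (80.0, True), (90.0, True),
              (100.0, True), (110.0, False)]
    limit = find_operating_limit(record)
    if limit is not None:
        print(f"operating limit ~ {limit} C, margin = {margin_above_spec(limit, SPEC_MAX_C)} C")
```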

Cost of Failure

Engineers know intuitively that failures are bad. The design effort includes work to create a robust and reliable product.

One tool we often avoid using is the actual or estimated cost of a failure. We tend to focus on failure rates and failure mechanisms, which is fine to a point. Yet if we do not also include the consequences (safety, warranty, brand loyalty, customer losses, etc.), we have only half the information we need to make great decisions.

Our team needs to work on the potential and actual failures that make a difference when solved. Not all failure modes are the same. Let’s solve the ones that save the most lives, anguish, and money.

Get the information you need to determine the cost per failure for your product. This information, along with an expected shipping volume and estimated failure rates, enables you to calculate the cost of failure.
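The calculation itself is simple arithmetic. A minimal Python sketch, in which every number (annual volume, failure rate, and the breakdown of cost per failure) is a hypothetical placeholder for your own data:

```python
from typing import Dict


def cost_per_failure(cost_elements: Dict[str, float]) -> float:
    """Total cost of one field failure: the sum of its individual cost elements."""
    return sum(cost_elements.values())


def annual_cost_of_failure(units_shipped_per_year: float,
                           annual_failure_rate: float,
                           cost_each: float) -> float:
    """Expected failures per year times the cost of each failure."""
    expected_failures = units_shipped_per_year * annual_failure_rate
    return expected_failures * cost_each


if __name__ == "__main__":
    # Hypothetical cost elements for a single failure, in dollars.
    elements = {
        "warranty repair or replacement": 120.0,
        "shipping and handling": 40.0,
        "support calls": 30.0,
        "estimated lost future sales": 60.0,
    }
    each = cost_per_failure(elements)  # 250.0
    total = annual_cost_of_failure(100_000, 0.02, each)
    print(f"cost per failure = ${each:,.2f}; annual cost of failure = ${total:,.0f}")
```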

If you calculate the cost of failure per unit shipped, you have a value that is directly comparable to the bill of materials cost of the components in a product. In my experience, the cost of failure per unit shipped is either the most expensive ‘component’ in a product or among the top five.
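To see where the cost of failure would land in the bill of materials, you can treat it as one more line item and sort. The BOM prices, failure rate, and cost per failure below are hypothetical.

```python
from typing import Dict, List, Tuple


def cost_of_failure_per_unit(failure_rate: float, cost_per_failure: float) -> float:
    """Expected cost of failure carried by each unit shipped."""
    return failure_rate * cost_per_failure


def rank_against_bom(bom: Dict[str, float],
                     cof_per_unit: float) -> Tuple[int, List[Tuple[str, float]]]:
    """Add the cost of failure as a pseudo line item and rank all items by cost.

    Returns (1-based rank of the cost of failure, items sorted most to least expensive).
    """
    items = dict(bom)
    items["cost of failure"] = cof_per_unit
    ranked = sorted(items.items(), key=lambda kv: kv[1], reverse=True)
    rank = [name for name, _ in ranked].index("cost of failure") + 1
    return rank, ranked


if __name__ == "__main__":
    # Hypothetical per-unit BOM costs, in dollars.
    bom = {
        "main processor": 18.50,
        "display": 12.00,
        "power supply": 6.75,
        "enclosure": 4.20,
        "circuit board": 3.10,
        "connectors": 1.40,
    }
    cof = cost_of_failure_per_unit(0.02, 250.0)  # 2% failure rate, $250 per failure
    rank, ranked = rank_against_bom(bom, cof)
    print(f"cost of failure per unit = ${cof:.2f}, rank {rank} of {len(ranked)}")
```

With these made-up figures, the cost of failure ($5.00 per unit) ranks fourth, ahead of the enclosure, the circuit board, and the connectors.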

We employ teams of engineers to develop a single critical component or to cost-reduce an expensive one, yet our ignorance of the cost of failure allows wonderful opportunities for savings to remain hidden.

Determine the cost of failure and make that information widely available to your team. Show them how to use it to weigh the everyday decisions they make during design and development.

Mixed Priorities

I’ve been told product reliability is critical, then asked to use less than half the sample size necessary for an accelerated life test.
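The article does not say which sample-size method was behind that request. As one common assumption, the zero-failure ‘success run’ relationship, n = ln(1 − C) / ln(R), shows what halving a planned sample does to the confidence a test can demonstrate. The reliability and confidence targets below are hypothetical, and an accelerated test would also need an acceleration model to relate test time to use time.

```python
import math


def success_run_sample_size(reliability: float, confidence: float) -> int:
    """Units needed to demonstrate `reliability` at `confidence` with zero failures."""
    return math.ceil(math.log(1.0 - confidence) / math.log(reliability))


def success_run_confidence(reliability: float, n: int) -> float:
    """Confidence demonstrated when all n units survive the test with no failures."""
    return 1.0 - reliability ** n


if __name__ == "__main__":
    R, C = 0.90, 0.90  # hypothetical reliability and confidence targets
    n_full = success_run_sample_size(R, C)
    n_half = n_full // 2
    print(f"planned sample: {n_full} units")
    print(f"half sample ({n_half} units) demonstrates only "
          f"{success_run_confidence(R, n_half):.0%} confidence")
```

With these assumptions, demonstrating 90% reliability at 90% confidence takes 22 units with zero failures; running only 11 units demonstrates roughly 69% confidence instead.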

Critical, important, and top priority are great terms. They sound great. If they do not come with resources, personnel, budgets, and support, however, they are hollow platitudes that merely suggest our work on reliability is critical, important, or a top priority.

I’m not suggesting, although I often do believe, that reliability performance is a top priority. Organizations have many priorities, and I get that. The challenge lies in the mixed signals, the unclear priorities, the many top priorities.

The remedy, again, is to quantify the cost of failure. Management mostly talks in terms of money, so we need to convert a 1% failure rate into dollars lost to warranty per year. We need to quantify the cost of uncertainty, especially when the uncertainty ranges from no losses to billions in potential losses. A 10% chance of a major safety issue in a $100 million product line implies an expected loss of $10 million unless we reduce the risk. Few other product risks pose such threats to profit and business viability.
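Both conversions in that paragraph are short calculations. A minimal sketch: the 1% failure rate, the $100 million exposure, and the 10% probability come from the paragraph above, while the fielded volume and cost per warranty claim are hypothetical.

```python
def annual_warranty_cost(units_in_field: float, annual_failure_rate: float,
                         cost_per_claim: float) -> float:
    """Dollars lost to warranty per year: expected claims times cost per claim."""
    return units_in_field * annual_failure_rate * cost_per_claim


def expected_loss(probability: float, exposure: float) -> float:
    """Probability-weighted loss for a single risk."""
    return probability * exposure


if __name__ == "__main__":
    # Hypothetical fielded volume and cost per claim; the 1% rate is from the text.
    print(f"warranty cost per year: ${annual_warranty_cost(500_000, 0.01, 200.0):,.0f}")
    # From the text: a 10% chance of a major safety issue on a $100 million product line.
    print(f"expected loss: ${expected_loss(0.10, 100_000_000):,.0f}")
```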

Part of why reliability isn’t well positioned in the pantheon of priorities is that it is difficult to quantify. At least, that is my observation. Difficult doesn’t mean impossible.

Reliability is among the priorities most organizations need to get right. Let’s help our teams align their ability to deliver the expected reliability with the organization’s goals, while properly balancing it against other priorities.

Summary

There are challenges in the world of reliability engineering. MTBF and predictions are well-known issues, and many people are working to help us and our peers move forward.

HALT, Cost of Failure, and Mixed Priorities are three of the many challenges you face on a regular basis. What would you add to this list? What can we, as a community of reliability engineers, do to solve them? Add your suggestions and recommendations in the comments section below.


Fred Schenkelberg is an experienced reliability engineering and management consultant with his firm FMS Reliability. His passion is working with teams to create cost-effective reliability programs that solve problems, create durable and reliable products, increase customer satisfaction, and reduce warranty costs. If you enjoyed this article consider subscribing to the ongoing series at Accendo Reliability.


Ken Neubeck

Reliability Engineer at North Atlantic Industries Inc

I apologize for not finding this article sooner. I have been a reliability engineer for over 45 years, performing this function for four major companies: three aerospace and one commercial. You are absolutely correct about reliability predictions and the sad persistence of industry in still wanting to use MIL-HDBK-217 to generate them. The complex ICs being manufactured these days, such as SOCs and FPGAs, are way beyond the scope of the models in the handbook. (FPGAs currently have typically 50 million devices.) Fortunately, many of the major manufacturers have been providing manufacturing test data that has been a decent source for calculating failure rates; empirical data always trumps models. The different US services do not get along well enough for a revised handbook ever to be made, so it makes sense to use manufacturing data for ICs and MOSFETs and still use the handbook for passive components. Ken Neubeck