Can ASO be used to aid Fault Detection and Diagnostics (FDD)?
I recently read an interesting article by LBNL titled, "Transforming Building Operations with New Self-Correcting Controls Technology."
Later in the article, the author presents several ideas for Automated Supervisory Optimization (ASO), which can be used to optimize HVAC systems at the supervisory level, primarily targeting energy efficiency measures. One bullet point at the end particularly caught my attention: "Conduct automated functional testing (commissioning) with root cause failure analysis."
The key points highlighted were:
- Correct zone temperature setpoints to operational intent
- Correct programmed schedules to operational intent
- Optimize economizer high-limit lockout temperature setpoint
- Release unnecessary control overrides
- Correct biased temperature sensors
- Implement best practice AHU static pressure setpoint reset
- Implement best practice AHU supply air temperature setpoint reset
- Identify and suppress rogue zones in AHU reset strategies
- Correct control hunting (automated PID loop tuning)
- Implement best practice demand flexibility
- Conduct automated functional testing (commissioning) with root cause failure analysis
While most of these practices are standard in the industry, the idea of conducting automated functional testing or commissioning with root cause analysis really stood out to me. How neat would it be to automatically commission an HVAC system while performing a detailed root cause failure analysis?
What is this?
This instantly took me back a few years, long before I started programming in Python or even knew what IoT was. At that time, I was working for an engineering consulting firm alongside Adam Mangrich, PE, CCP on federally funded Retro-commissioning (RCx) studies. Adam, the owner of an HVAC design firm based in Oregon, is a seasoned commissioning agent, a Certified Commissioning Professional (CCP), a licensed Professional Engineer (PE), and a past president of the Building Commissioning Certification Board (BCCB). In other words, Adam knows his stuff, and I learned a ton from him.
We spent a significant amount of time in the field, recommissioning old GSA facilities like Veterans Affairs hospitals and federal laboratories. RCx studies assess how a building operates over time, and can involve energy audits, recalibrating outdated control systems, updating ventilation design to meet current code requirements, conducting testing, adjusting and balancing (TAB), and even performing building energy simulations. One RCx project I vividly remember even involved bringing CAD drafters into the field to recreate chilled water system piping records. Over time, previous facilities management regimes had lost the original documentation, and for a very large hospital, this can turn into a real nightmare.
Typically, these projects are conducted on equipment that is 10+ years old, where there are no immediate plans to update the building and the goal is to fix whatever is currently in place. Adam and I would approach an old air handling unit (AHU) systematically, much like a controls contractor starting up a system that has never been run. With a process in place, especially in a building that might have 50 AHUs, we could refine our methods over time.
In large GSA or VA hospitals, this would keep us occupied for 2-4 weeks, working full-time in the field to validate whether those old control system components still functioned. When Wi-Fi was available in mechanical rooms, it was much easier to conduct tests directly in front of the equipment; otherwise, we resorted to handheld radios for communication. The process usually unfolded as follows.
In large facilities with 50 to 100+ AHUs, this process moves quickly. When we approached an AHU operating in a live hospital setting, after sending out all the necessary emails to inform building occupants of potential slight temperature deviations, we would begin by putting the AHU in full recirculation mode. This involved commanding the control system to close the outside air dampers and all heating and cooling valves, allowing the unit to operate by simply recirculating air for about 20 minutes.
During this time, all temperature sensors should theoretically equalize, since no cold or hot outside air is being mixed into the return air, and no heating or cooling is being provided via the AHU coils. This full recirculation mode serves to protect the equipment. The only downside is the temporary shutdown of ventilation to the building, but for commissioning and testing purposes, this is completely acceptable.
After this process, once the old AHU had run in recirculation mode for a while, we would check whether the supply air, mixing air, and return air temperature sensors had generally reached an equalized state. The only expected variation would be a slight rise in supply air temperature—typically around 2°F—due to fan heat, which is common in VAV AHU systems.
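The equalization check described above can be sketched in a few lines of Python. This is only a minimal illustration: the function name, the 1°F tolerance, and the default 2°F fan heat allowance are my own assumptions for the example, not values from any standard.

```python
def sensors_equalized(return_f, mixed_f, supply_f, fan_heat_f=2.0, tol_f=1.0):
    """Check whether AHU temperature sensors agree during full recirculation.

    fan_heat_f and tol_f are assumed values for illustration only.
    Returns (ok, spread), where spread is the max-min difference in deg F
    after removing the expected fan heat from the supply air reading.
    """
    # Remove the expected fan heat rise so supply can be compared directly.
    supply_adjusted = supply_f - fan_heat_f
    readings = [return_f, mixed_f, supply_adjusted]
    spread = max(readings) - min(readings)
    return spread <= tol_f, spread


# Healthy unit: supply is about 2 deg F above return and mixed air.
ok, spread = sensors_equalized(return_f=72.0, mixed_f=72.3, supply_f=74.1)
print(ok, round(spread, 2))

# Suspect unit: mixed air is far colder than return (possible leaking damper).
print(sensors_equalized(return_f=72.0, mixed_f=65.0, supply_f=74.0))
```

A failed check does not say which component is at fault; it only flags that the unit deserves a closer look at dampers, valves, or sensor calibration.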
For example, in cold weather, the sensor values for an AHU might show that the temperatures are not fully equalized. During winter, the return and outdoor air streams mix for ventilation, causing the mixing air temperature sensor to drop. In some cases, the mixing air may need to be reheated with a heating coil to bring the temperature back to a balanced state.
In the summertime, the opposite scenario would occur. Hot outside air mixes with the return air streams, and mechanical cooling becomes necessary to cool the air. Additionally, if the building is in a humid climate, dehumidification might also be required to properly condition the air for occupants. This process ensures that the ventilation air provided for breathing is adequately cooled and dehumidified for comfort and air quality.
For the recirculation air mode during the testing process, it’s essential for the temperature sensors to equalize. This occurs when the outside air and exhaust air dampers are shut (with the return air damper open), as well as the heating and cooling coil valves, creating a controlled environment for testing.
Once again, this is an important step to take even before performing a point-to-point check. If the temperature sensors don’t equalize—quite common in old GSA buildings or ancient VA hospitals—this indicates potential issues. For example, air dampers have seals, and after a decade of operation, it’s not unusual for the seals to degrade or for the dampers to no longer fully open or close. It’s quite common to find the seals lying in a pile at the bottom of the AHU mixing chamber.
In less common scenarios, I’ve seen issues in a VA hospital that used to have its own chiller plant for mechanical cooling. Over the decades, this plant was removed, and the hospital was tied into a campus-wide chilled water network. According to the building operators, this caused numerous problems, not just with water pressure issues needed to deliver chilled water effectively, but also with contamination. The campus chilled water system introduced so much contamination that the water became dark or black, often containing sand or other particulate matter.
In short, this sand can wreak havoc on pump impellers and valve bodies, essentially acting like sandpaper. With the high pressures from the campus pumping system, old valve bodies can begin to leak. This leakage is often revealed during the recirculation mode test, where temperature sensors show temperatures far too cool due to a leaking chilled water valve!
It's important to emphasize that this step should be completed before even starting a thorough point-to-point calibration for the AHU. Once the AHU has been running in recirculation mode for 20 minutes, the process begins by inserting a third-party, factory-calibrated temperature sensor into the supply air duct to compare its measurements against the BAS sensor readings. After that, the steps following the recirculation air test involve:
- Shutting the equipment off and validating that the motors actually turn off
- Checking the VFD minimum speed setting
- Exercising the air dampers with the equipment off through visual verification
- Testing heating and cooling valves
- Ensuring the equipment starts up properly and controls to a set duct static pressure
All of these steps are meticulously documented, comparing BAS commands and readings to actual field checks observed by human verification.
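As a rough idea of how that documentation could be structured, here is a small Python sketch that records a BAS value next to a field-verified value for each point. The class name, point names, and tolerances are hypothetical and only meant to illustrate the comparison.

```python
from dataclasses import dataclass


@dataclass
class PointCheck:
    """One row of a point-to-point record (all names are hypothetical)."""
    point: str
    bas_value: float      # what the BAS reports or commands
    field_value: float    # what was measured or observed at the equipment
    tolerance: float      # allowed difference, in the point's own units

    @property
    def passed(self) -> bool:
        return abs(self.bas_value - self.field_value) <= self.tolerance


checks = [
    PointCheck("supply_air_temp_F", bas_value=57.2, field_value=55.1, tolerance=1.0),
    PointCheck("oa_damper_position_pct", bas_value=0.0, field_value=0.0, tolerance=5.0),
]

# List only the points where the BAS and the field disagree.
for check in checks:
    if not check.passed:
        print(f"FAIL {check.point}: BAS={check.bas_value} field={check.field_value}")
```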
How does this apply to ASO and anything in the smart building IoT world?
Referring back to the original LBNL article that mentioned, "Conduct automated functional testing (commissioning) with root cause failure analysis," if I were setting up Automated Supervisory Optimization (ASO) on a VAV AHU system, this process could easily be automated through BAS programming and IoT BACnet stack scripting in an edge environment.
The process is very similar to how humans perform RCx in the field, but it’s fully automated. Instead of humans visually checking the temperature sensors on the BAS, fault rules are applied to the data being captured. In the modern smart building IoT world, data is continuously being collected, so why not write some ASO code to help capture high-quality data for Fault Detection and Diagnostics (FDD) purposes?
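To make that concrete, here is a minimal Python sketch of an automated recirculation test. Everything in it is an assumption for illustration: `FakeBas` is an in-memory stand-in for whatever BACnet or BAS client you actually use, and the point names are made up. The one design choice I would keep in a real version is releasing the overrides in a `finally` block, so the AHU returns to normal control even if the test errors out.

```python
import time


class FakeBas:
    """In-memory stand-in for a BAS/BACnet client (hypothetical interface)."""

    def __init__(self, values):
        self.values = dict(values)
        self.overrides = {}

    def write(self, point, value):
        self.overrides[point] = value

    def release(self, point):
        self.overrides.pop(point, None)

    def read(self, point):
        return self.values[point]


# Hypothetical point names; a real site would map these to BACnet objects.
OVERRIDES = {"oa_damper_cmd": 0.0, "heating_valve_cmd": 0.0, "cooling_valve_cmd": 0.0}
SENSORS = ("return_air_temp", "mixed_air_temp", "supply_air_temp")


def run_recirc_test(bas, soak_seconds=20 * 60, sleep=time.sleep):
    """Override the AHU into full recirculation, soak, then read the sensors."""
    try:
        for point, value in OVERRIDES.items():
            bas.write(point, value)
        sleep(soak_seconds)  # let the air streams equalize
        return {name: bas.read(name) for name in SENSORS}
    finally:
        # Always release overrides so the unit returns to normal control.
        for point in OVERRIDES:
            bas.release(point)


bas = FakeBas({"return_air_temp": 72.0, "mixed_air_temp": 71.5, "supply_air_temp": 73.8})
# Skip the real 20-minute wait when trying this out.
print(run_recirc_test(bas, sleep=lambda seconds: None))
```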
How often should this test be conducted?
In RCx, if a building is fortunate enough to receive it, the process typically happens only once every 5 to 10 years. However, with a modern IoT infrastructure and continuous data ingestion from the building, this process could theoretically occur much more frequently, perhaps even once a week. What are your thoughts? It would make for a nice debate.
This would require strong collaboration and communication with the building engineers and operators. During the setup of ASO, they would need to be heavily involved as the "eyes" in the field for the ASO programmer, ensuring that the automated tests perform specific tasks (e.g., X, Y, Z) on the AHU. The success of such an approach hinges on the maintenance team’s buy-in, helping them understand the value of these tests and scheduling them at a convenient time.
For instance, an automated recirculation mode test could be scheduled every Tuesday at noon for 30 minutes (adjustable) to capture high-quality data for FDD. Tuesday noon might be a good time because the Monday morning rush has typically settled down, and noon is well past the building's morning warm-up or cool-down period. Additionally, the team might be on a lunch or coffee break, making it easy for them to glance at the BAS system, especially if there’s a large display in the facilities management area. In many cases, there’s a dispatch person handling work orders, and these flat-screen monitors often display real-time OT data, allowing for quick checks.
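A schedule gate like that is simple to express in code. The sketch below assumes the edge device's clock is set to local building time and uses a hypothetical function name; the weekday, start time, and duration are all adjustable.

```python
from datetime import datetime, time, timedelta


def in_test_window(now, weekday=1, start=time(12, 0), duration_min=30):
    """Return True if `now` falls inside the weekly test window.

    weekday follows datetime.weekday(): Monday is 0, so Tuesday is 1.
    `now` is assumed to be a naive datetime in local building time.
    """
    if now.weekday() != weekday:
        return False
    window_start = datetime.combine(now.date(), start)
    window_end = window_start + timedelta(minutes=duration_min)
    return window_start <= now < window_end


# January 2, 2024 was a Tuesday.
print(in_test_window(datetime(2024, 1, 2, 12, 15)))
print(in_test_window(datetime(2024, 1, 3, 12, 15)))
```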
FDD fault rules?
In a modern smart building IoT platform, we should be running fault detection rules to identify mechanical issues, and an ASO program would only enhance this process by aiding in capturing high-quality data. For example, if we were running ASHRAE G36 fault rules, simply putting the AHU into full recirculation mode would help satisfy the rule conditions for an AHU. With a 1-minute data capture interval, as recommended by G36, along with the 5-minute rolling average features, we'd be able to meet these requirements during the ASO sequence.
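For readers less familiar with the rolling average idea, here is a small standard-library Python sketch that averages the last five 1-minute samples. The function name is my own, and returning `None` until five samples are available is just one reasonable convention.

```python
from collections import deque


def rolling_mean(samples, window=5):
    """Rolling average over the last `window` samples.

    With 1-minute data and window=5 this is a 5-minute rolling average.
    Returns None for positions where fewer than `window` samples exist.
    """
    buffer = deque(maxlen=window)
    averages = []
    for value in samples:
        buffer.append(value)
        if len(buffer) == window:
            averages.append(sum(buffer) / window)
        else:
            averages.append(None)
    return averages


# Six minutes of mixed air temperature readings in deg F.
print(rolling_mean([70, 71, 72, 73, 74, 75]))
```

With a 20 to 30 minute test window, that leaves plenty of averaged samples once the air streams have settled.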
This is just one recommendation for writing purposes—any FDD ruleset would likely accomplish similar checks and yield a root cause analysis in the rule itself. However, with ASHRAE's ruleset, we would be satisfying the following fault rule checks:
- Fault Condition 1: Duct static pressure too low with fan operating near 100% speed
- Fault Condition 2: Mix temperature too low; should be between outside and return air
- Fault Condition 3: Mix temperature too high; should be between outside and return air
- Fault Condition 5: Supply air temperature too low; should be higher than mix air
- Fault Condition 14: Temperature drop across inactive cooling coil (requires coil leaving temperature sensor)
- Fault Condition 15: Temperature rise across inactive heating coil (requires coil leaving temperature sensor)
There could even be an additional check, like the one Adam and I did, to confirm that the return air temperature sensor equalizes with the others. By running these tests in recirculation mode, we would not only verify that the AHU is functioning properly but also gather the data needed to evaluate ASHRAE's fault rules efficiently.
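To give a feel for what a rule like Fault Condition 2 or 3 looks like in code, here is a simplified Python sketch of a mixed air bounds check. It is written in the spirit of those rules and is not copied from the guideline; the function name and the 1°F sensor error allowances are assumptions, so check the actual G36 text before relying on it.

```python
def mix_temp_faults(mat, rat, oat, err_mat=1.0, err_rat=1.0, err_oat=1.0):
    """Flag mixed air temperature outside the return/outside air envelope.

    mat, rat, oat: averaged mixed, return, and outside air temps (deg F).
    err_*: assumed sensor error allowances (illustrative values only).
    """
    # Too low: even with error allowance, mix is below both source streams.
    too_low = mat + err_mat < min(rat - err_rat, oat - err_oat)
    # Too high: even with error allowance, mix is above both source streams.
    too_high = mat - err_mat > max(rat + err_rat, oat + err_oat)
    return {"mix_too_low": too_low, "mix_too_high": too_high}


# Mixed air at 60 deg F with return at 72 and outside at 70 is suspicious.
print(mix_temp_faults(mat=60.0, rat=72.0, oat=70.0))
```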
Getting fancy
If we really wanted to get sophisticated, we could incorporate an additional step into this ASO sequence by first reading the outside air temperature. If the temperature indicates it's spring or fall and good free cooling conditions are available, why not run the test in 100% outside air mode for 20 minutes? This would allow us to make the most of free cooling opportunities and still satisfy the ASHRAE G36 fault detection equations, if G36 was in use.
For example, we could satisfy the following G36 fault conditions:
- Fault Condition 8: Supply air temperature and mix air temperature should be approximately equal in economizer mode
- Fault Condition 9: Outside air temperature too high for free cooling without additional mechanical cooling in economizer mode
- Fault Condition 10: Outdoor air temperature and mix air temperature should be approximately equal in economizer plus mechanical cooling mode
- Fault Condition 11: Outside air temperature too low for 100% outdoor air cooling in economizer mode
- Fault Condition 12: Supply air temperature too high; should be lower than mix air temperature in economizer plus mechanical cooling mode
- Fault Condition 13: Supply air temperature too high in full cooling during economizer plus mechanical cooling mode
By leveraging this strategy, we could optimize the AHU’s performance and gather valuable data for fault detection, especially during times when outdoor air conditions are ideal for economizer cooling. This fancy ASO sequence could look like this below.
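As a rough Python sketch, the mode selection step could be as simple as the function below. The 50°F and 65°F limits are placeholder assumptions of mine for "good free cooling conditions"; a real site would pick limits based on its climate and economizer high-limit settings.

```python
def choose_test_mode(outside_air_temp_f, low_f=50.0, high_f=65.0):
    """Pick which weekly ASO test to run based on outside air temperature.

    low_f and high_f are assumed free cooling limits for illustration only.
    """
    if low_f <= outside_air_temp_f <= high_f:
        return "full_outside_air"
    return "full_recirculation"


for oat in (20.0, 58.0, 90.0):
    print(oat, choose_test_mode(oat))
```

Either mode would then run for the same 20 minutes or so, with the fault rules evaluated on the data captured during the test.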
Conclusion
Can ASO be used to aid Fault Detection and Diagnostics (FDD)? I believe so. I used the ASHRAE G36 fault detection equations as the example, but any FDD platform can utilize similar checks, as most already monitor for sensor mismatches, leaking air dampers, and valve issues. The key, IMHO, is using ASO to help ensure that FDD receives high-quality data, which is crucial for running these equations effectively.
To achieve the best results, it’s essential to have the maintenance team highly involved. Ideally, ASO could even become part of their preventive maintenance (PM) program. If the team is available and not tied up with emergency work orders, they could visually observe the ASO in action. For example, when the ASO runs on a set schedule (say, every Tuesday at noon), maintenance staff could be right next to the AHU during the test, just like Adam and I were during RCx. They could pop open a door on the AHU and visually verify that dampers are closing or, in economizer mode, that they are fully opening as expected.
One of my concerns with FDD is the lack of human eyes on HVAC components during testing, but integrating ASO into a PM program like this would resolve that. Additionally, the maintenance team could invest in a high-quality third-party temperature sensor validation tool to further enhance the testing process. By comparing AHU return, mixing, and supply air readings with a calibrated tool, they can ensure more accurate results.
In my experience, high-quality calibration tools are essential, and engineering consulting firms seem to purchase them with no questions asked. For instance, Adam used a factory-calibrated tool from Vaisala, known for its accuracy and reliability. I strongly recommend using similar high-quality instruments. Too often in controls contracting and facilities management, cheap, low-quality tools (i.e., Playskool-level quality) are purchased, which leads to poor results, especially when they are compared against control system sensor readings. Ensuring quality tools for validation is critical to the success of ASO and FDD alike.
What are your thoughts? Is the smart building IoT industry already implementing these practices, or is there room for improvement? I'd love to hear your insights—please share them in the comments below!