Demystifying ChatGPT upload image features

ChatGPT's new image-analysis features are intriguing, particularly for understanding building automation system (BAS) graphics set up by controls contractors. These graphics present organized data aimed at human building operators. How can LLMs further assist in this context?

Building operators manage diverse systems like HVAC, lighting, security, and more, often using GUI interfaces. LLMs like ChatGPT could monitor system statuses, interpret alerts, and suggest maintenance actions, providing real-time insights and predicting potential failures. This support could enhance the efficiency and reliability of facility management operations, making daily tasks easier for building operators.

Optical Character Recognition (OCR)

When you upload an image to ChatGPT on the newer GPT-4o and ChatGPT-4 models, vision processing behind the scenes recognizes that it is an image of a BAS GUI or graphic. These processes identify the type of HVAC system and use OCR (Optical Character Recognition) to interpret any text present in the image.

Python has a variety of tools for demonstrating this, one of which is wrapped around Tesseract, an OCR engine that has apparently been around for a long time and was maintained by Google for years. At a high level, it is a computer vision tool that can extract numbers and letters from images in a variety of languages. I'll demonstrate the following with a little Python code that has pytesseract installed (a wrapper that also requires the Tesseract binary itself) along with the OpenCV computer vision library, imported as cv2.

import matplotlib.pyplot as plt
import cv2

# Path to the BAS screenshot (this file name is a placeholder)
ahu = "ahu_screenshot.png"

# Function to display the image
def display_image(image):
    image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    plt.figure(figsize=(10, 10))
    plt.imshow(image_rgb)
    plt.title('Image')
    plt.axis('off')
    plt.show()

# Read the image using OpenCV
image = cv2.imread(ahu)

display_image(image)        

The code snippet above, which I run in an IPython notebook, reads in my BAS screenshot of the air handling unit (AHU) shown below and displays it.

BAS screenshot rendered by Python and computer vision

I can then use pytesseract to extract the text from the image as follows...

import pytesseract

# Function to extract text from an image
def image_to_text(image_path):
    image = cv2.imread(image_path)
    text = pytesseract.image_to_string(image)
    return text

# Extract text from the image
extracted_text_ahu = image_to_text(ahu)        

Then, if you print extracted_text_ahu, it displays a jumble of text strings that were all organized nicely by the BAS technician who built the user interface.
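For example, a quick print in the notebook produces the raw output shown below:

print(extracted_text_ahu)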

AHU_:

Scheduled
‘Motion Override
Unoccupied Heat
Unoccupied Cool
Fire Command

Economizer
Chilled Water

High Static
Freeze Stat
‘Smoke Alarm
onization Alarm

Discharge Setpoint
Cool Request
Increase Setpoint
Decrease Setpoint

Static Setpoint

Damper Request
Increase Setpoint
Decrease Setpoint

Heating Request

sa PID
‘Ma PID

Outside Air Temp

2
a

Off
OFF
OFF
Off

Disabled
‘Cable

Normal
Normal
Normal
Normal

50.0 °F
6
Off
on

0.80 in/we
4
Off
Off

a

55%
144%

78.3 °F

Space Static
Relief Damper _Relief Fan 0.008 in/we
om Of Return Air
4% onz
75.0°F
Normal Normal

Final Filter
0.2 in/we
Clean

supply Fan

GME) iscnarge air
41% 50.1 °F
24 Hz 0.79 in/we
Normal 2592 cfm

J

OA Damper N.O. Outside Air Chilled Water
MixedAir Temp
0% 949 cfm 10%
69.4 °F
1% 11%
Normal Normal
nan %

ERV-2
2365 cfm
51.9 °F

ERV-2        

To a human observer, this might seem random and disorganized, but how does it appear to ChatGPT, which has a strong understanding of text?

It's worth noting that in quick research or engineering projects, engineers are sometimes asked to analyze a mechanical system based solely on BAS screenshots. This approach may seem less thorough, but one can glean a lot from these screenshots: the type of HVAC system, general operations such as how the AHU is functioning, and the operating loads on the system. BAS graphics typically display a wealth of information, including outside air temperature, faults, alarms, and sensor and command I/O telemetry. In other words, there is a lot of data on a BAS screenshot, but it fails to capture historical data, which is very useful for showing how an AHU has operated over time.

A human engineer looking at the BAS screenshot would write their report something along these lines...

The AHU is currently operating in cooling mode with minimal outside air supplied by an energy recovery unit (ERV), maintaining a discharge air temperature setpoint of 50°F and a supply duct static pressure setpoint of 0.8" WC. The supply fan is modulating at 41% speed to achieve 0.79" WC of duct static pressure, with the supply air temperature measured at 50.1°F. Airflow stations indicate 949 CFM of outside air and 2365 CFM at 51.9°F from the ERV, alongside a return air temperature of 75°F. Relief fans are inactive, and a slightly negative space static pressure suggests a possible negative pressure build-up, if accurate. Airflow measurements from the ERV and supply fan indicate minimal to no return air volume, prompting consideration of whether the AHU is designed for the ERV to exclusively handle return air in non-economizer mode. Feedback indicates effective operation without mechanical faults. Overall, the AHU appears to be performing well. It may be beneficial to implement resets for temperature and duct static pressure setpoints to enhance energy savings. Additionally, the necessity of the 50°F supply temperature setpoint could be evaluated further. Typically, systems are designed with a 55°F temperature setpoint for cooling or dehumidification modes, suggesting that the lower setting may require investigation into potential design overrides or adjustments.

I tested this with ChatGPT-4, providing a detailed prompt to get feedback in a specific format. The information I provided was similar to the format used above, allowing the LLM to determine how the AHU is operating. This context helps the LLM understand what it is analyzing and generate a response. The process is akin to existing building commissioning or retro-commissioning (RCx). The LLM must fill in the blanks wherever [brackets] appear.

The following data is extracted with computer vision from a BAS graphic of a VAV AHU. Please provide feedback to the human building operator in the following format:

The AHU is currently operating in [mode of operation: heating, or cooling] with [details about outside air temperature conditions if applicable to indicate load on the system or building], and maintaining [specific setpoints or operational parameters for supply duct static pressure or temperature]. The supply fan is [operating at % speed command], achieving [achieved duct static pressure], with the supply air temperature measured at [temperature if applicable]. Relief or return air fans are [status if applicable to this type of HVAC system], and [additional details about system operation]. Airflow measurements from [specific components if system has air flow measuring stations] indicate [observations about airflow direction or volume]. Feedback indicates [assessment of system operation, e.g., effectiveness or issues].

Overall, the AHU appears to be [general assessment of performance and detection of faults like broken valves or dampers for energy inefficiency or sensor errors]. It may be beneficial to [recommendation for improvement, e.g., operational adjustments or maintenance]. Additionally, [consideration of specific operational parameters or setpoints] could be evaluated further especially anything related to ventilation requirements for commercial building and energy efficiency.        
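For readers who want to script this step rather than paste into the chat window, here is a minimal sketch using the OpenAI Python client, assuming an API key is set in the environment and using "gpt-4o" as a stand-in model name; the template above is combined with the OCR output in one message:

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# The full prompt template from above, truncated here for brevity
prompt_template = """The following data is extracted with computer vision
from a BAS graphic of a VAV AHU. Please provide feedback to the human
building operator in the following format: ..."""

# Send the instructions plus the OCR text in one user message
response = client.chat.completions.create(
    model="gpt-4o",  # stand-in model name
    messages=[
        {"role": "user", "content": prompt_template + "\n\n" + extracted_text_ahu},
    ],
)

print(response.choices[0].message.content)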

Given the extracted text data, extracted_text_ahu, the ChatGPT-4 model responds as shown below. The format is generally acceptable, but it still feels a bit "off" because the text data is somewhat unorganized. We can improve on this further down.

The AHU is currently operating in cooling mode with an outside air temperature of 50.0 °F, indicating that there may be a lower load on the system due to cooler external conditions. The system is maintaining a discharge air temperature setpoint of 50.1 °F and a static pressure setpoint of 0.80 in/wg.

The supply fan is operating at 41% speed command, achieving a duct static pressure of 0.79 in/wg, with the supply air temperature measured at 50.1 °F. The relief or return air fans are operating normally with a static pressure of 0.008 in/wg and a return air temperature of 75.0 °F. The airflow measurements from the system indicate a supply airflow of 2592 cfm, with outside air dampers at 0% and chilled water flow at 10%.

Feedback indicates that the AHU appears to be operating effectively, but there are a few observations and recommendations:

Airflow and Temperature Consistency: The mixed air temperature is 69.4 °F, and the return air temperature is 75.0 °F. There is a significant difference between these temperatures, suggesting potential inefficiencies or areas for investigation in the mixing process.

Damper Position: The outside air damper is at 0%, which is appropriate given the low outside air temperature. However, this should be periodically checked to ensure proper operation and modulation.

Fan Speeds and Static Pressure: The supply fan is at 41% speed while maintaining the desired static pressure. This suggests the system is operating efficiently, but it is always beneficial to periodically review the static pressure setpoint and fan speed to ensure optimal energy use.

Overall, the AHU appears to be performing well without immediate signs of faults like broken valves or dampers. It may be beneficial to periodically check the mixing process and damper operations to ensure energy efficiency and optimal performance. Additionally, evaluating ventilation requirements and ensuring that the system meets commercial building standards for air quality and energy efficiency is advisable.        

In the world of existing building commissioning, this technology is powerful for consulting engineers, who are often tasked with writing reports on HVAC operations. However, it does lack context about how the HVAC system operates over time. The response is generally good, but it fails to consider that the outdoor air temperature is nearly 80°F, which is not ideal for economizing or mixing. The LLM suggests that mixing air temperatures could be optimized, but at temperatures above the ideal range for free outdoor-air economizing, the AHU would typically operate in a minimum outside air mode, not optimize for mixing. This part seems odd. Can we improve by better organizing the text data? Let's at least try. If anyone else sees anything, post a comment...

This involves quite a bit of code, for which I apologize, but the key point is that Tesseract and computer vision techniques can yield each text element's X and Y coordinates, along with its height and width in pixels. This allows us to better organize the text data the way we, as humans, visually interpret it from the BAS graphic, which contains a lot of information. Using the pandas library in Python, we can attempt to organize the data more effectively and also display the image again to verify that the data is being read correctly, as a human would interpret it from a BAS graphic.

import pandas as pd

# Function to preprocess the image
def preprocess_image(image_path):
    image = cv2.imread(image_path)
    # Convert to grayscale
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # Apply binary thresholding
    _, thresh = cv2.threshold(gray, 150, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return thresh, image

# Function to extract text and coordinates from an image
def image_to_text_with_coordinates(image_path):
    preprocessed_image, original_image = preprocess_image(image_path)
    data = pytesseract.image_to_data(preprocessed_image, output_type=pytesseract.Output.DICT)
    text_with_coords = []
    n_boxes = len(data['text'])
    
    for i in range(n_boxes):
        if float(data['conf'][i]) > 0:  # Filter out low-confidence results (conf values can be float strings)
            text = data['text'][i]
            x, y, w, h = data['left'][i], data['top'][i], data['width'][i], data['height'][i]
            text_with_coords.append((text, (x, y, w, h)))
    
    return text_with_coords, original_image, preprocessed_image

# Function to display the image with annotated text
def display_image_with_annotations(image, annotations):
    for text, (x, y, w, h) in annotations:
        image = cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
        image = cv2.putText(image, text, (x, y - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2, cv2.LINE_AA)
    
    plt.figure(figsize=(10, 10))
    plt.imshow(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
    plt.title('Image with Annotations')
    plt.axis('off')
    plt.show()

# Function to display the thresholded image
def display_thresholded_image(thresh_image):
    plt.figure(figsize=(10, 10))
    plt.imshow(thresh_image, cmap='gray')
    plt.title('Thresholded Image')
    plt.axis('off')
    plt.show()

# Group text elements by proximity
def group_text_elements(text_with_coords, threshold=50):
    groups = []
    current_group = [text_with_coords[0]]
    
    for text, coords in text_with_coords[1:]:
        last_coords = current_group[-1][1]
        if abs(coords[1] - last_coords[1]) <= threshold:
            current_group.append((text, coords))
        else:
            groups.append(current_group)
            current_group = [(text, coords)]
    groups.append(current_group)
    return groups

# Sort text elements within each group
def sort_within_groups(groups):
    sorted_text_with_coords = []
    for group in groups:
        group.sort(key=lambda x: (x[1][1], x[1][0]))
        sorted_text_with_coords.extend(group)
    return sorted_text_with_coords

# Function to form a sentence from text elements
def form_sentence_from_text(text_with_coords):
    sentence = " ".join([text for text, _ in text_with_coords])
    return sentence

# Path to another copy of the BAS screenshot (placeholder file name)
ahu_copy_2 = "ahu_screenshot_copy.png"

extracted_text_with_coords, original_image, thresh_image = image_to_text_with_coordinates(ahu_copy_2)

# Group and sort text elements
groups = group_text_elements(extracted_text_with_coords)
sorted_text_with_coords = sort_within_groups(groups)

# Display the image with annotated text
display_image_with_annotations(original_image, sorted_text_with_coords)

# Display the thresholded image
display_thresholded_image(thresh_image)        
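To tie this together, here is a small optional follow-up sketch that uses pandas (imported above) to inspect the sorted OCR tokens, plus the form_sentence_from_text helper to produce the single string I paste into the chat:

# Organize the sorted OCR tokens in a DataFrame for inspection
df = pd.DataFrame(
    [(text, x, y, w, h) for text, (x, y, w, h) in sorted_text_with_coords],
    columns=["text", "x", "y", "width", "height"],
)
print(df.head())

# Join the sorted tokens into one string to paste into the LLM
print(form_sentence_from_text(sorted_text_with_coords))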

Printing sorted_text_with_coords might not look much different to a human compared to the previous version, but for the LLM it provides the AHU data in a structured format, which I believe it can interpret better. Here is the response when pasting it into the chat bot! I also apologize for the formatting, which I cannot replicate in this LinkedIn article, but it does come through as nice bullet points from ChatGPT.

The AHU is currently operating in cooling mode with an outside air temperature of 50.0°F, indicating a moderate load on the system. The AHU is maintaining the following specific setpoints and operational parameters:

Supply fan is operating at 78.3% speed command, achieving a duct static pressure of 0.79 inches of water column.
Supply air temperature is measured at 50.1°F.
Relief fan is off, and the relief damper is at 0% open, while the outside air damper is 1% open, showing normal operation.
The economizer is disabled.
Chilled water valve is at 10% open, and the mixed air temperature is 69.4°F.
Return air temperature is 75.0°F.
Space static pressure is -0.008 inches of water column.
Airflow measurements indicate the following:
Supply airflow is 2592 CFM.
ERV-2 supply airflow is 2365 CFM.
ERV-2 outside air temperature is 51.9°F.
Feedback indicates normal system operation with no apparent faults. The chilled water valve and mixed air temperatures suggest that the system is adequately cooling the space.

Overall, the AHU appears to be performing well without any obvious issues such as broken valves or dampers. It may be beneficial to continue monitoring the economizer operation and ensure it is enabled when outside conditions permit to improve energy efficiency. Additionally, evaluating specific operational parameters or setpoints related to ventilation requirements for the commercial building and energy efficiency could be valuable.        

Conclusion

I personally think this is pretty cool (full-blown ChatGPT is mesmerizing...), not only for a consulting engineer but also for a building operator. Imagine a building operator who needs assistance interpreting a complex BAS graphic: if this tool were built into a cloud platform, or integrated directly into the BAS itself, it could help even an untrained operator run the building more effectively, both in energy efficiency and in the quality of the air people breathe inside, all from text data extracted from a BAS graphic.

Ultimately, ChatGPT's image-uploading feature is very advanced based on my testing. Using it for HVAC purposes is likely just a small example of what LLM developers have trained the model to know and classify.

Why Stop Here?

I have experimented with screenshotting BAS graphics on an interval, paired with a prompt template something along these lines...

I am going to start watching the equipment on a 10-minute interval; which BAS screenshots should I go to first? I'll look to you for guidance on what I should be doing on the BAS, but I have read-only access and can inform the maintenance manager to schedule a work order if something looks off.
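The interval capture itself is easy to script. Here is a minimal sketch, assuming the pyautogui library is available for screen capture and reusing the image_to_text helper from earlier; the file names and cycle count are placeholders:

import time
import pyautogui  # assumption: installed for screen capture

# Capture the BAS graphic on an interval and OCR each capture
def watch_bas(interval_minutes=10, cycles=6):
    for i in range(cycles):
        path = f"bas_capture_{i}.png"  # placeholder file name
        pyautogui.screenshot(path)     # grabs the full screen to a file
        text = image_to_text(path)     # reuse the OCR helper from earlier
        print(f"--- capture {i} ---")
        print(text)
        time.sleep(interval_minutes * 60)

watch_bas()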

Given that prompt, the LLM responds with what I can only describe as excitement...

Certainly! To effectively monitor the equipment on a 10-minute interval, you should prioritize viewing BAS screenshots that provide the most critical and comprehensive information about the HVAC system's status and performance. Here’s a step-by-step guide on which screenshots to check first and what to look for:

RTU (Rooftop Unit):

Check: Supply fan status, outdoor air damper position, discharge air temperature, economizer lockout setpoint, and supply fan speed.
Why: The RTU often controls the air supplied to multiple zones, and any issues here can affect the entire system. Ensure the economizer is functioning correctly and free cooling is enabled when appropriate.
Action: Look for anomalies like unexpected supply fan statuses (e.g., should be on but is off), incorrect damper positions, or discharge air temperatures not matching setpoints.
Boiler System:

Check: Boiler command statuses, HWR temperature setpoint, and pump statuses.
Why: Boilers are crucial for heating, and any malfunction can lead to comfort issues or energy wastage. Monitor temperatures and boiler operations to ensure they are within expected ranges.
Action: Check if the boilers are running when they should be and if the temperatures are within the setpoints. Report any discrepancies.
First Floor Space Temperatures:

Check: Individual room temperatures, CO2 levels, and any anomalies in the temperature readings.
Why: Maintaining appropriate temperatures and CO2 levels is essential for comfort and air quality. Anomalies could indicate problems with the VAVs or thermostats.
Action: Identify rooms that are significantly deviating from the setpoint. Inform the maintenance manager if specific zones are consistently too hot, too cold, or have high CO2 levels.
Second Floor Space Temperatures:

Check: Same as the first floor, focusing on individual room temperatures and any anomalies.
Why: Ensuring consistency in comfort and air quality across all floors is essential. Temperature variations can indicate VAV or system-level issues.
Action: Same as the first floor, ensure all rooms are within acceptable temperature ranges and report any consistent deviations.
VAV Boxes:

Check: Air valve positions, heat valve positions, space temperature setpoints, and actual space temperatures.
Why: VAVs directly control the airflow and temperature of individual zones. Checking their status helps ensure each zone is receiving the correct amount of air and heating/cooling.
Action: Identify any VAVs that are not meeting setpoints or have unusual valve positions (e.g., air valve at 100% with no corresponding demand). Report any issues.
Monitoring Schedule:

First Interval (00:00 - 00:10):

Check the RTU and Boiler System screenshots first. These provide a broad view of the system's operational status and are critical for overall system performance.
Second Interval (00:10 - 00:20):

Check the First Floor Space Temperatures. This ensures that you catch any immediate issues in a significant portion of the building.
Third Interval (00:20 - 00:30):

Check the Second Floor Space Temperatures. This allows you to monitor another large area of the building for any problems.
Fourth Interval (00:30 - 00:40):

Check the VAV Boxes (focus on a few each interval, rotating through them systematically). Start with VAVs that previously showed issues or are critical zones (e.g., server rooms, high-occupancy areas).
Repeat the Cycle:

Continue to cycle through these screenshots in subsequent intervals, ensuring you cover all areas regularly.        

Why not leverage Python's capabilities to sift through BAS graphics, which contain a lot of organized data, and let an LLM monitor the BAS directly? Perhaps the LLM could watch for alarms, silence them when appropriate, and even notify human operators with instructions like "Go check something!" I believe it's smart enough to handle these tasks efficiently. What do you think? A rough sketch of the keyword side of that idea follows below.
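As a starting point for the alarm-watching idea, here is what a simple keyword scan over the OCR output could look like. The keyword list is my own assumption, and note that a match flags a point label (e.g., "Smoke Alarm"), not necessarily an active alarm, so a real version would pair each label with its status value:

# Assumption: my own guesses at alarm-like strings on a BAS graphic
ALARM_KEYWORDS = {"alarm", "fault", "freeze", "smoke", "high static"}

def check_for_alarms(extracted_text):
    """Return any OCR lines that mention an alarm-like keyword."""
    hits = []
    for line in extracted_text.splitlines():
        if any(keyword in line.lower() for keyword in ALARM_KEYWORDS):
            hits.append(line.strip())
    return hits

alarms = check_for_alarms(extracted_text_ahu)
if alarms:
    # In practice this could email or text the operator instead of printing
    print("Notify the operator to investigate:", alarms)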




