Architecture for Capable Software and Tight Control
Conway's law meets process automation

  • Plant and business functions are carried out by specialized people and automation devices independently but not in isolation, and on very different timescales.
  • Specialized app tools and automation devices execute and communicate with guaranteed response times appropriate for each function.
  • The recommendation is to deploy apps which are modular per task, on networked servers and automation devices grouped according to real-time requirements.
  • The result is powerful industrial software tools, real-time control performance, and easy replacement and upgrades.

Imagine apps with the specialized features you need for each of your duties, be it predicting valve or pump failure, picking the best time to clean a heat exchanger or replace a corroding pipe section, detecting and pinpointing overconsumption, or predicting process upsets or batch success. Imagine not having to manually transfer data between these systems to make reports, and even getting real-time dashboards. And imagine being able to replace any one of those systems without having to mess with the rest. And all of it responding fast enough for good control and a pleasant user experience. What is the secret? Here are my personal thoughts:

Functional Modules Mirror Departmental Structure

System functionality and communications mimic organizational structure. This is known as Conway’s law. This is the reason why systems are modular.

Organizational Structure

Companies are organized into functional departments, each with its own roles and responsibilities, for accountability and to manage work effectively. The office has departments such as HR, finance, and sales. The plant has departments such as reliability, maintenance, sustainability, quality, and production.

Departments have people in roles specialized in their domain to carry out these tasks. Reliability work includes predicting valve or pump failure. Sustainability work includes picking the best time to clean a heat exchanger and detecting and pinpointing overconsumption. Integrity work includes picking the best time to replace a corroding pipe section. Production work includes predicting process upsets and batch success.

People in one department communicate with people in some of the other departments to carry out their tasks, exchanging documents in hardcopy or softcopy form. When reliability engineers predict valve or pump failure, they request maintenance engineers to act before the failure materializes. When sustainability engineers find that a heat exchanger should be cleaned, or detect overconsumption or leaks, they communicate with maintenance engineers to fix it. When integrity engineers find pipes are corroding too quickly, they communicate with process engineers to adjust the crude mix, inject more corrosion inhibitor, or make other changes. If integrity engineers predict pipe failure, they communicate with maintenance engineers to replace the pipe, and possibly reduce the relief valve setting if the pipe section cannot be replaced immediately. One department can talk to another, but the walls are there for a reason, so they have to badge in through the door.

The department-to-department communication structure of a company mirrors its organizational structure.

The communication structure of an organization will resemble its administrative structure. - Melvin E. Conway

Domain Boundaries

The people in each department require new tools to carry out their function more effectively. At the very beginning of the design of a system, the people in each department who will eventually use each new tool in the overall system are consulted about the tool requirements needed to meet their responsibilities. Reliability engineers need tools for predicting valve or pump failure. Sustainability engineers need tools to monitor the performance of equipment like heat exchangers and to detect and pinpoint overconsumption. Integrity engineers need tools for monitoring corrosion rate and pipe wall-thinning. Production operators need tools for predicting upsets in continuous processes, or batch processes drifting off their ideal trajectory.

The people in each department will define real-world use-case requirements that will help in their specific tasks. This is done within the boundaries of their domain and their use-cases. Reliability engineers will define requirements for valve analytics, pump analytics, and vibration analytics. Sustainability engineers will define requirements for heat exchanger performance analytics and energy management. Integrity engineers will define requirements for corrosion analytics. Production and process engineers will define requirements for batch analytics.

Conceptual Design

As each department defines the requirements for its tools, multiple specialized tools will be required. Apps end up modular, mirroring the departmental structure. Modular systems are easy to understand.

Any system of consequence is structured from smaller subsystems which are interconnected. - Melvin E. Conway

Another requirement is that each tool must communicate with other tools in the same way people talked to each other when the task was manual. Valve analytics and pump analytics apps must communicate with the computerized maintenance management system (CMMS) app. The heat exchanger analytics app and energy management app must also communicate with the CMMS. The corrosion app must communicate with a portal app to notify the process engineers, with the operator workstation apps, and with the CMMS. The communication structure of the tools (software interfaces and protocols) ends up mirroring the departmental structure.

Detail Design

The people in each department will select a tool with specialized functionality that meets their needs. Reliability engineers will pick an app that meets their requirements for valve analytics, an app for pump analytics, and an app for vibration analytics. Sustainability engineers will pick an app for heat exchanger performance analytics and an app for energy management. Integrity engineers will pick an app for corrosion analytics. Production and process engineers will pick an app for process analytics and an app that meets their requirements for batch analytics. Apps end up modular, mirroring the departmental structure. Again, these apps communicate with each other and with devices. Many or all of these apps can run in the same app framework.

There is a very close relationship between the structure of a system and the structure of the organization which designed it. - Melvin E. Conway

Big Picture Coordination

If each department used a different wireless sensor network technology for their sensors, the plant would end up with multiple wireless sensor network infrastructures that would have to be supported long-term. Imagine the mess if the reliability department considered LoRaWAN for vibration sensors, and the sustainability department considered ISA100.11a for multi-input temperature transmitters and acoustic noise sensors. And what if at the same time the integrity department considered ZigBee corrosion sensors and the quality department considered WirelessHART adapters for HART devices. And then the production department considered some proprietary protocol for radar level transmitters. It would be a mess.

Likewise, if each department used a different Ethernet protocol for their devices, the plant would end up with a huge number of gateway servers that would have to be configured and supported long-term. Again, imagine the mess if the reliability department considered Modbus/TCP vibration monitoring systems, and the sustainability department considered PROFINET power meters. And what if at the same time the quality department considered HART-IP for wireless gateways, and the production department considered EtherNet/IP for a package unit PLC. It would be a mess.

Similarly, if each department used a different software interface technology for their apps, the plant would end up with a huge number of gateway servers that would have to be configured and supported long-term. Imagine the mess if the reliability department considered generic MQTT for condition analytics apps, and the sustainability department considered Sparkplug MQTT for performance analytics. And what if at the same time the integrity department considered DDS for corrosion analytics, and the quality department considered Kafka for the LIMS. And then the production department considered OPC-UA for process analytics apps. It would be a mess.

This is where the plant I&C team coordinates with all the plant departments and the multiple device and app vendors to use the same technologies: IEC62591 (WirelessHART) as the wireless sensor network, HART-IP for devices like I/O and gateways, and IEC62541 (OPC-UA) for the software interfaces and devices like controllers. These standards are supported by multiple vendors, so the full spectrum of devices and apps is available.

Unified Architecture

The recommendation is for the I&C team to drive the use of common technologies, namely WirelessHART, Ethernet, Wi-Fi, HART-IP, and OPC-UA, thus enabling consolidation of the many departments’ subsystems into a single unified architecture. Such systems are more powerful, simpler, and much easier to support than systems using a mishmash of technologies.

Hierarchical Levels Mirror Real-Time Requirements

Functions and tasks in the plant happen on radically different timeframes. For business admin functions like pay slips, quotations, and invoices, same-day response is sufficient. For plant management functions like equipment condition, energy consumption, and lab sample quality test results, updates by the minute are sufficient. For process operation, measured values, alarms, events, and the initiation of control actions are required by the second. For closed loop control and safety instrumented functions, response within hundreds of milliseconds is required. Measurement and actuation must happen within tens of milliseconds. That is, there are seven orders of magnitude difference in the response time requirements between business admin functions and measurement, with everything in between (a day is roughly 100,000 seconds, on the order of 10^5, while tens of milliseconds is on the order of 10^-2 seconds). Running all these functions on the same computer or device would be a challenge.

Because of the huge timescale difference, it is convenient to organize functions into a few groups structured according to their response time requirements, so they can share computing resources and network infrastructure accordingly. These functions are also more conveniently located in specific places: HR, finance, and sales functions in the ERP in the main office; reliability, maintenance, sustainability, and quality functions in the various operations management automation systems in the plant admin building; production functions in the DCS workstations in the central control room (CCR); control and safety functions in the controllers in the marshalling room; and measure and actuate functions in the instruments in the field, on the process. There is still one or more orders of magnitude difference in the response time requirements between each of these functional groups.

Seven orders of magnitude difference in response time requirements

So it is easy to see how these logical function groups became first the Purdue model, then ISA95, and then IEC62264. Some functional groups are humans at computers, other groups are ‘machines’ like edge devices and embedded controllers. Some of the humans work on computers which are user interfaces for the ‘machine’ systems, a human-machine interface.

System architecture mimics departments and human communication

The reason why systems are structured this way is not “historical”; it is real-time requirements and human nature. Hyper-converged infrastructure (HCI) with software-defined infrastructure, including virtualized computing and software-defined networking, will have no physical cables and no physical switches. It will all be software. Yet it will still have virtual machines (VM) and virtual network levels corresponding to the timeframe requirements of the various functions. But it will look neater and be easier to manage because it is software instead of hardware. Physical L2, L3, and L3.5 networks and physical L2 and L3 computers become one physical machine, running and connecting compute function ‘workloads’, but there will still be software-defined networks structured as L2, L3, and L3.5 inside the physical box.

For as long as companies are organized in departments that carry out functions on different timeframes, systems will have modular functions in functional levels, although some network levels may be virtual in a shared physical machine.

In simple broad terms, L0 and L1 sit in the field, L2 and L3 are the on-premises edge, while L4 and ‘L5’ are in the cloud.

A previous essay explained three other reasons why automation systems have levels: cybersecurity, flexibility to make changes, and a clear line of responsibility. The most interesting fact is that cybersecurity is not the original reason why the Purdue model was created. The Purdue model was created in the 90s, before cybersecurity became a central topic. However, since segmenting systems into security zones with managed conduits between them is key to achieving security, cybersecurity standards and guidance like ISA99/IEC62443 and ICS-CERT recommended practices were created around the ISA95/IEC62264/Purdue model, which is a natural fit for security zones.

Break the Data Silos

Plant equipment like boilers is often supplied as package units from specialized vendors. This is unlikely to change. So levels will be levels. Package units will be package units. The key is that apps and devices can communicate with each other across functional level boundaries and package unit boundaries. Boundless automation.

Integration Across Package Units

Boilers, compressors, water treatment, industrial gas production, and metering etc. are often supplied as package units, modules, or skids, each with its own self-contained control system such as a PLC and local HMI panel. In the worst case the package unit has no integration with the main DCS at all, it is completely ‘siloed’, or there is only a common hardwired alarm signal and possibly some hardwired process variables – still a ‘data silo’ or isolated ‘island of automation’. We need standard interfaces. A more advanced package unit allows DCS integration through Modbus, where individual alarms and more process variables can be provided to operators at the main DCS workstations to better understand the state of the package unit and its process. However, with Modbus there is a fair bit of configuration work. The recommendation is to instead use standard OPC-UA interfaces for package unit integration to the DCS, as this makes the integration easier, such that it becomes practical to integrate the full information from the package unit. OPC-UA breaks data silos. The recommendation is to use a DCS that supports the Module Type Package (MTP) file standard, and package units which also support the MTP standard, which together further simplify the integration of full package unit information into the main DCS.
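
As an illustration of what OPC-UA package unit integration looks like from the consuming side, here is a minimal sketch using the open-source python-opcua library to read a few values from a package unit’s OPC-UA server. The endpoint address and node identifiers are made-up assumptions for illustration; a real package unit exposes its own address space, for example as described by its MTP file.

```python
from opcua import Client  # open-source python-opcua library

# Hypothetical endpoint of a boiler package unit's OPC-UA server (illustration only)
PACKAGE_UNIT_ENDPOINT = "opc.tcp://boiler-plc.example.local:4840"

client = Client(PACKAGE_UNIT_ENDPOINT)
client.connect()
try:
    # Hypothetical node identifiers; a real unit exposes its own tag structure
    steam_pressure = client.get_node("ns=2;s=Boiler.SteamPressure").get_value()
    drum_level = client.get_node("ns=2;s=Boiler.DrumLevel").get_value()
    high_level_alarm = client.get_node("ns=2;s=Boiler.Alarm.HighDrumLevel").get_value()
    print(steam_pressure, drum_level, high_level_alarm)
finally:
    client.disconnect()
```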

Most package units will have instrumentation using 4-20 mA/HART. So, over and above the data associated with the core process control by the PLC, it is also useful to access the intelligence in smart sensors and smart valves, or rather the intelligence in smart transmitters and positioners. That is, access data such as valve analytics, flow meter calibration verification, and analyzer condition, as well as configuration management. To enable this, the recommendation is for the package unit interface to also support HART-IP for intelligent device management.

Integration Across Functional Levels

Purdue/ISA95/IEC62264 functional levels do not mean that a function at one level cannot talk to a function at another level. It does not mean an L3 function cannot talk to an L4 or L2 function. It does not mean a device or app in one security zone cannot talk to a device or app in another security zone. They can. But it is done in a structured manner through managed conduits, not in a ‘spaghetti architecture’ where everything is connected to everything else. Some data percolates up and some data trickles down, but most data is communicated among apps and devices on the same timeframe, within the level itself. Your I&C engineers know this well. Again, for integration between apps and devices to be practical, we need standard software interfaces. The recommendation is to use apps and devices which support standard OPC-UA. Again, OPC-UA makes it easy to access the full information in apps and devices, even across functional levels. For instance, an L3 data lake, historian, or analytics app can get data directly from an L0 wireless sensor using OPC-UA through an L1 wireless gateway. Note that in this case you are ‘skipping’ L2, going around the DCS servers. Remote autonomous operation of normally unmanned offshore platforms is becoming increasingly common. For this use-case, OPC-UA can be used to access data from a local controller on the remote platform from a center of operations anywhere else in the world.

OPC-UA is commonly available in higher-level devices like PLCs, controllers, and gateways, but not in smaller devices like field instruments. Field instruments like sensors and valves use 4-20 mA/HART or WirelessHART – protocols that are not supported by data lakes, historians, analytics apps, and the like. Therefore, the recommendation is to use wireless gateways which automatically convert wireless sensor measurement data to OPC-UA and HART-IP.
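
As a sketch of the cross-level access described above, the snippet below shows an L3 application browsing an L1 wireless gateway’s OPC-UA address space and reading a measurement from an L0 wireless sensor, again using the open-source python-opcua library. The endpoint and node layout are assumptions for illustration; each gateway vendor organizes its address space differently.

```python
from opcua import Client  # open-source python-opcua library

# Hypothetical endpoint of an L1 wireless gateway's OPC-UA server (illustration only)
GATEWAY_ENDPOINT = "opc.tcp://wireless-gateway.example.local:4840"

client = Client(GATEWAY_ENDPOINT)
client.connect()
try:
    # Browse the Objects folder to discover what the gateway exposes;
    # the actual folder structure depends on the gateway vendor
    for node in client.get_objects_node().get_children():
        print("Found:", node.get_browse_name())

    # Read the primary variable of a hypothetical WirelessHART corrosion sensor
    pv = client.get_node("ns=2;s=CorrosionSensor01.PV").get_value()
    print("Corrosion sensor PV:", pv)
finally:
    client.disconnect()
```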

Data Direct to Cloud or Office

Similarly, the Purdue/ISA95/IEC62264 functional levels do not mean you cannot send data straight to the cloud. You can. Data can be sent straight from the L2 DCS to the cloud, from an L1 sensor gateway to the cloud, or even from an L0 sensor directly to the cloud. You are simply skipping some levels, the levels which are not used. But it would not be right to say you ‘flatten’ or ‘collapse’ levels, because the functions of the levels you are skipping are simply not involved, and the rest of the plant automation still has all its functional levels. Likewise, some people would like to have DCS or sensor data direct to their office laptop on the L4 network. Sometimes we see the cloud data center referred to as L5, which is not quite correct, so I just say cloud.

DCS to Cloud or Office

Data can be sent from the L2 DCS ‘directly’ to the cloud to be used in analytics, reporting, and dashboards etc. However, care must be taken when connecting to the cloud provider across the Internet, such that the Internet connection doesn’t become a weak point for a cybersecurity attack. Similarly, it is important that the robustness of the DCS is not jeopardized by external systems making changes to the DCS. The DCS must be protected from malicious external attacks as well as from inadvertent mistakes by internal users. For this reason, the DCS is connected to the cloud or office L4 network through a unidirectional data diode which allows data to stream from the DCS to the users, but not back into the DCS. That is, the data streams from the DCS, through a data diode, to an edge server where a complete copy of the DCS data is replicated and kept up to date in real-time. All external work is done on this copy of the data without affecting the DCS. Client applications such as data lakes, analytics, and reporting and dashboard tools like Excel and PowerBI access the data from this edge server. The recommendation is for the edge server to support standard OPC-UA interfaces to enable access to the data, since industrial software like data lakes and industrial analytics use OPC-UA. OPC-UA makes roll-out of digitalization fast as it eliminates the need for custom coding against proprietary APIs. OPC-UA is supported on all leading cloud platforms.
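
For dashboards and analytics it is usually better to subscribe to value changes from the edge server than to poll it. Below is a minimal sketch of such a subscription using the python-opcua library; the edge server endpoint and the tag node id are hypothetical placeholders.

```python
import time

from opcua import Client  # open-source python-opcua library

# Hypothetical endpoint of the edge server holding the replicated DCS data (illustration only)
EDGE_SERVER_ENDPOINT = "opc.tcp://edge-server.example.local:4840"


class ChangeHandler:
    """Called by python-opcua whenever a subscribed value changes."""

    def datachange_notification(self, node, val, data):
        print(f"{node} changed to {val}")


client = Client(EDGE_SERVER_ENDPOINT)
client.connect()
try:
    # Hypothetical node id of a replicated DCS tag on the edge server
    tag = client.get_node("ns=2;s=Area1.FIC101.PV")

    # Publish interval of 1000 ms; the handler receives data-change notifications
    subscription = client.create_subscription(1000, ChangeHandler())
    subscription.subscribe_data_change(tag)

    time.sleep(10)  # let a few updates arrive
    subscription.delete()
finally:
    client.disconnect()
```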

Sensor to Cloud or Office

Data can be sent from the L1 wireless sensor network gateways ‘directly’ to the cloud to be used in analytics, reporting, and dashboards etc. This is often referred to as Industrial Internet of Things (IIoT) or Monitoring & Optimization (M+O). If the gateway and underlying devices are not connected to the DCS, this becomes a straightforward solution since the Internet connection is not an attack path into the DCS. Client applications such as data lakes and analytics access the data from the wireless gateway. The recommendation is for the wireless gateway to support standard OPC-UA interfaces to enable access to the data, since industrial software like data lakes and industrial analytics use OPC-UA. OPC-UA makes roll-out of IIoT fast as it eliminates the need for custom coding against proprietary APIs. OPC-UA is supported on all leading cloud platforms.

Data can even be sent from L0 sensors using the cellular/mobile network directly to the cloud to be used in analytics, reporting, and dashboards etc. This mostly makes sense for level sensors on moving containers such as IBC tote tanks and ISO tanks which are moved between locations for consumption and refilling.

Improving Data Flow

It’s exciting to learn about new ideas, changes, and doing things differently. It allows us to imagine they will solve our problems. So whenever a new approach is proposed it gets publicized a lot because it garners attention. However, not all new ideas are better. They may be overlooking why things have been done a certain way for decades. An approach that works well in one domain may not work well in another. The underlying reasons for the Purdue model levels are still valid, so the model still makes sense and lives on. Technology changes but the levels remain the same. The levels live on virtually inside hyper-converged infrastructure. OPC-UA is used instead of OPC-DA, HART-IP is used instead of Modbus, and so on.

So, if your objective is to improve data flow among devices and apps in the plant, my recommendation is to change the technology. Data flow is typically hampered by proprietary network protocols and proprietary software APIs. Proprietary protocols and ‘exposed’ APIs require custom coding, which is problematic. The recommendation is to use standard protocols and interfaces like HART-IP and OPC-UA. This will improve data flow in the plant. If you want to simplify the architecture, you can collapse/flatten physical L2, L3, and L3.5 servers into a single physical hyper-converged infrastructure where the network levels instead are software-defined.

If you use the wrong protocols and software interfaces, data will still not flow, regardless of your architecture.

And you don’t need a monolithic system to enable data flow either. You can use a third-party historian and analytics on your DCS provided the DCS, historian, and analytics support OPC-UA. Thanks to OPC-UA, the various user groups in the plant can pick the right software tool for them, and it can exchange data with other apps and devices.

That is, the recommendation is to have OPC-UA in the specs for all industrial software exchanging data with others, as well as in the specs for controllers and gateways. Wireless gateways shall also support HART-IP for intelligent device management.

Well, that’s my personal opinion. If you are interested in digital transformation in the process industries, click “Follow” by my photo to not miss future updates. Click “Like” if you found this useful and to make sure you keep receiving updates in your feed, and “Share” it with others if you think it would be useful to them. Save the link in case you need to refer to it in the future.
