Blocking == Technical Debt

Blocking occurs every time a program waits in line for something to happen. For instance, the basic Arduino "Blink" example turns the LED on and calls the delay() function to wait for a timeout event in 1000 milliseconds. Then it turns the LED off and calls delay() to wait in line for another timeout event in 1000 milliseconds. Performed in a loop, this ends up blinking the LED.
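The "Blink" loop described above can be sketched in plain C on a host machine. This is only an illustration: the names sim_delay(), sim_led_write(), and blink_passes() are hypothetical stand-ins for the Arduino delay() and LED calls, and time is a simulated counter rather than a real clock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Host-side stand-ins for the board (hypothetical names, not the Arduino API). */
static uint32_t sim_now_ms;      /* simulated clock in milliseconds */
static bool     sim_led_on;      /* simulated LED state */
static unsigned sim_led_toggles; /* number of LED state changes */

/* Blocking delay: simulated time passes and nothing else can run meanwhile. */
static void sim_delay(uint32_t ms) {
    assert(ms <= UINT32_MAX - sim_now_ms); /* keep the simulated clock from overflowing */
    sim_now_ms += ms;
}

/* Drive the simulated LED and count every change of state. */
static void sim_led_write(bool on) {
    if (on != sim_led_on) {
        sim_led_on = on;
        ++sim_led_toggles;
    }
}

/* Run the blocking "Blink" sequence the given number of times. */
static void blink_passes(unsigned passes) {
    for (unsigned i = 0U; i < passes; ++i) {
        sim_led_write(true);
        sim_delay(1000U);  /* wait in line for the first timeout */
        sim_led_write(false);
        sim_delay(1000U);  /* wait in line for the second timeout */
    }
}
```

Calling blink_passes(3U) advances the simulated clock by 6000 ms and toggles the LED six times. During that whole time the program does nothing but wait.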

Blocking in Arduino programming is accomplished by busy waiting, but that is not the only form. Blocking based on context switching is the cornerstone of every RTOS (Real-Time Operating System). For example, a call to the FreeRTOS vTaskDelay() blocking function allows other RTOS threads to execute while the original thread is delayed. A single thread can make multiple calls to vTaskDelay() or any other blocking RTOS primitive (e.g., xSemaphoreTake()). Still, the RTOS blocking mechanism "remembers" precisely where in the thread's arbitrarily complex code sequence to return after unblocking. The RTOS charges a hefty price for this capability in the form of a whole private stack for each thread and an elaborate context switch, but the convenience is considered well worth it. In fact, the intuitive sequential programming paradigm enabled by blocking is the main argument for using a traditional RTOS in the first place.

Video: "To block or not to block, that is the question!"

Blocking as Technical Debt

Sequential programming based on blocking is simple and intuitive, but every blocking call incurs some technical debt that borrows initial expediency in exchange for increased development costs later.

The problem with the sequential paradigm is that it hard-codes the blocking calls, which means that the code can handle only the hard-coded sequence of events. This might work initially, but during ongoing development you inevitably discover other events and event sequences that must also be handled. Weaving the new events into the existing hard-coded structure of blocking calls becomes progressively more difficult. The problem is not just with sequencing but also with timing, because the blocking calls clog the control flow.
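The sequencing problem can be illustrated with a small host-side simulation in C. This is a sketch under stated assumptions: the event names, the scripted event source, and the wait_for() helper are all hypothetical. The code is blocked waiting for one specific event, so any other event that arrives in the meantime goes unhandled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { EV_NONE, EV_TIMEOUT, EV_BUTTON } Event;

/* Scripted event source standing in for the outside world (hypothetical). */
static Event const *script;
static size_t       script_len;
static size_t       script_pos;
static unsigned     unhandled;  /* events that arrived while blocked on another one */

static Event next_event(void) {
    return (script_pos < script_len) ? script[script_pos++] : EV_NONE;
}

/* Blocking wait for ONE hard-coded event; everything else is not handled. */
static bool wait_for(Event wanted) {
    Event e;
    while ((e = next_event()) != EV_NONE) {
        if (e == wanted) {
            return true;
        }
        ++unhandled;
    }
    return false; /* script exhausted */
}

/* Sequential "Blink": it can handle only timeout followed by timeout. */
static unsigned run_blink_sequence(Event const *events, size_t n) {
    assert(events != NULL);
    script = events;
    script_len = n;
    script_pos = 0U;
    unhandled = 0U;

    unsigned cycles = 0U;
    while (wait_for(EV_TIMEOUT)) {       /* LED on, wait for the timeout */
        if (!wait_for(EV_TIMEOUT)) {     /* LED off, wait for the timeout */
            break;
        }
        ++cycles;
    }
    return cycles;
}
```

With the event script timeout, button, timeout, button, the sequence completes one blink cycle and leaves both button events unhandled. Handling them would require weaving extra checks into every blocking wait.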

An RTOS makes the sequential paradigm more extensible because it allows you to create multiple "super-loops" (called threads or tasks). Additional threads can block and wait in line for additional hard-coded events independently of the existing threads, thus relieving the timing and, to some degree, the sequencing problem. However, the added threads often need to share resources with the existing threads, and those shared resources require protection against race conditions and other concurrency hazards. An RTOS provides mutual exclusion mechanisms (e.g., a mutex) for this purpose. However, such mechanisms are often also based on blocking, so they only exacerbate the blocking problem. In this sense, an RTOS doubles down on blocking. The following diagram shows the perils of blocking.

Restricting Blocking as the Best Practice

The perils of blocking are widely recognized even without an RTOS. For instance, the basic Arduino "Blink" tutorial contains a recommendation for the "Blink Without Delay" example based on the non-blocking Arduino millis() function.

However, the constipation problem caused by blocking is much more acute in an RTOS. Here, concurrency experts recommend drastically restricting blocking by structuring threads as event loops with a single blocking call to a message queue at the top and only non-blocking code after that [1,2]. I presented this event-driven approach in the following YouTube video:

Video: The "Active Object" design pattern
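The event-loop structure recommended in [1,2] can be sketched in host-side C. This is a minimal illustration, not the QP framework or any RTOS API: the queue, the BlinkyAO type, and all function names are hypothetical. On a real RTOS, the queue read at the top of the loop would be the thread's only blocking call; here the loop simply ends when the queue is empty.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { SIG_TIMEOUT, SIG_BUTTON } Signal;
typedef struct { Signal sig; } Evt;

#define QUEUE_LEN 8U

/* Simple ring buffer serving as the private event queue. */
typedef struct {
    Evt      buf[QUEUE_LEN];
    unsigned head;
    unsigned tail;
    unsigned count;
} EvtQueue;

/* Active object: private queue plus private state, no shared resources. */
typedef struct {
    EvtQueue queue;
    bool     led_on;
    unsigned button_presses;
} BlinkyAO;

static void blinky_init(BlinkyAO *me) {
    me->queue.head = 0U;
    me->queue.tail = 0U;
    me->queue.count = 0U;
    me->led_on = false;
    me->button_presses = 0U;
}

/* Asynchronous, non-blocking post; returns false if the queue is full. */
static bool blinky_post(BlinkyAO *me, Signal sig) {
    EvtQueue *q = &me->queue;
    assert(q->count <= QUEUE_LEN);
    if (q->count == QUEUE_LEN) {
        return false;
    }
    q->buf[q->head].sig = sig;
    q->head = (q->head + 1U) % QUEUE_LEN;
    ++q->count;
    return true;
}

static bool queue_get(EvtQueue *q, Evt *e) {
    if (q->count == 0U) {
        return false;
    }
    *e = q->buf[q->tail];
    q->tail = (q->tail + 1U) % QUEUE_LEN;
    --q->count;
    return true;
}

/* Non-blocking, run-to-completion handler: any event in any order. */
static void blinky_dispatch(BlinkyAO *me, Evt const *e) {
    switch (e->sig) {
        case SIG_TIMEOUT: me->led_on = !me->led_on; break;
        case SIG_BUTTON:  ++me->button_presses;     break;
    }
}

/* Event loop: the queue read is the single place that would block on an RTOS. */
static unsigned blinky_run(BlinkyAO *me) {
    unsigned handled = 0U;
    Evt e;
    while (queue_get(&me->queue, &e)) {
        blinky_dispatch(me, &e);
        ++handled;
    }
    return handled;
}
```

Posting timeout, button, timeout, button and then running the loop handles all four events in arrival order. Adding a new event type means adding a case to the handler rather than restructuring a sequence of blocking calls.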

RTOS as Technical Debt

Even though a traditional blocking RTOS can be used to implement event loops for otherwise non-blocking, asynchronous Active Objects, it is an awkward fit.

Figure: Traditional blocking RTOS vs. event-driven, non-blocking Real-Time Embedded Framework

As shown in the Venn diagram above, a traditional RTOS does not provide the mechanisms necessary for extensible, event-driven Active Objects, which must be supplied externally to the RTOS. At the same time, most RTOS services are based on blocking and, therefore, are useless and outright counterproductive for the event-driven paradigm. In that sense, the entire traditional RTOS can be considered a source of technical debt.

Blocking vs. Preemption

Embedded developers often conflate the RTOS's ability to manage blocking threads with preemptive multitasking. However, these concepts are independent. Preemptive but non-blocking multitasking has been known for decades and is used extensively, for example, in the automotive OSEK/VDX kernel [3]. The immensely popular ARM Cortex-M also implements preemptive, non-blocking multitasking in the NVIC (Nested Vectored Interrupt Controller). The NVIC allows preemption of prioritized interrupts, which all nest on a single stack (the main stack). The NVIC is an example of the Stack Resource Policy (SRP) implemented directly in hardware [4]. Please see my "Super-Simple Tasker" videos from the Embedded Online Conference:

Video: Super-Simple Tasker -- The Hardware RTOS for ARM Cortex-M, Part-1

Video: Super-Simple Tasker -- The Hardware RTOS for ARM Cortex-M, Part-2
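The principle of preemptive but non-blocking multitasking on a single stack can be sketched as a host-side simulation in C. This is only an illustration of the idea, not the actual Super-Simple Tasker code or the NVIC behavior: all names are hypothetical, and asynchronous preemption from interrupts is not modeled. Each task runs to completion, and a higher-priority task preempts a running task simply by being called as a nested function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_PRIO 3U

typedef void (*TaskFn)(void);

static TaskFn   task_table[NUM_PRIO + 1U]; /* indexed by priority 1..NUM_PRIO */
static uint32_t ready_set;                 /* bit (p-1) set: priority p is ready */
static unsigned current_prio;              /* priority now running, 0 means idle */

static char   trace[32];                   /* records the order of execution */
static size_t trace_len;

static void trace_add(char const *s) {
    while (*s != '\0') {
        assert(trace_len + 1U < sizeof(trace));
        trace[trace_len++] = *s++;
    }
    trace[trace_len] = '\0';
}

static unsigned highest_ready(void) {
    for (unsigned p = NUM_PRIO; p > 0U; --p) {
        if ((ready_set & (1U << (p - 1U))) != 0U) {
            return p;
        }
    }
    return 0U;
}

/* Run every ready task above the interrupted priority, on the same stack. */
static void sched_run(void) {
    unsigned const saved = current_prio;
    unsigned p;
    while ((p = highest_ready()) > saved) {
        ready_set &= ~(1U << (p - 1U));
        current_prio = p;
        task_table[p](); /* runs to completion; may itself be preempted */
    }
    current_prio = saved;
}

/* Make a task ready; it preempts at once only if its priority is higher. */
static void task_post(unsigned prio) {
    assert(prio >= 1U && prio <= NUM_PRIO);
    ready_set |= (1U << (prio - 1U));
    sched_run();
}

static void task_low(void)  { trace_add("L"); }
static void task_high(void) { trace_add("H"); }
static void task_mid(void) {
    trace_add("M<");
    task_post(3U); /* higher priority: runs immediately, nested */
    task_post(1U); /* lower priority: deferred until this task returns */
    trace_add("M>");
}

static char const *run_demo(void) {
    task_table[1] = &task_low;
    task_table[2] = &task_mid;
    task_table[3] = &task_high;
    ready_set = 0U;
    current_prio = 0U;
    trace_len = 0U;
    trace[0] = '\0';
    task_post(2U);
    return trace;
}
```

The trace returned by run_demo() is "M<HM>L": the high-priority task runs in the middle of the mid-priority task, and the low-priority task runs only after the mid-priority task has completed. No task ever blocks, so no private per-task stacks are needed.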

The advantage of preemptive, non-blocking kernels is that they are adequate for non-blocking Active Objects, while being much more efficient than blocking kernels, yet still fully compatible with the Rate-Monotonic Scheduling/Analysis (RMS/RMA) method.

End Notes

Many embedded developers believe that the venerable "superloop" and traditional blocking RTOS are the only alternatives for embedded software architecture. However, other paradigms and efficient implementations exist, such as the QP real-time embedded frameworks [5,6].

The sequential paradigm enabled by blocking might be intuitive and expedient, but the question is how well it matches reality. Most real-life embedded systems must handle multiple sequences of events, so they are poorly served by software with hard-coded sequences. Therefore, choosing a more flexible, reusable, non-blocking paradigm often pays off, even if it means abandoning the sequential model. Remember: adding blocking to non-blocking code is easy. Removing blocking calls sprinkled throughout the code is like trying to "unscramble" scrambled eggs.

[1] David Cummings, "Managing Concurrency in Complex Embedded Systems," Workshop on Cyber-Physical Systems, 2010

[2] Herb Sutter, "Prefer Using Active Objects Instead of Naked Threads," Dr. Dobb's Journal, 2010

[3] OSEK/VDX, "OSEK/VDX Operating System Version 2.2.3," OSEK/VDX website, 2005

[4] T. P. Baker, "A Stack-Based Resource Allocation Policy for Realtime Processes," IEEE, 2002

[5] Quantum Leaps, "QP/C Real-Time Embedded Framework," GitHub

[6] Quantum Leaps, "QP/C++ Real-Time Embedded Framework," GitHub
