What sunken ships and out-of-bounds memory errors illustrate about incentives
This is the latest issue of my newsletter. Each week I share a different perspective. You can also subscribe on Substack to get future issues.
This past Saturday marked the 396th anniversary of the Vasa's sinking. Ordered in 1625 at the behest of King Gustav II Adolph, the warship Vasa was Sweden's most expensive, mission-critical project to that point, and its most disastrous.
Grossly unstable by design, it capsized in Stockholm's harbor in 1628, minutes after setting sail on its maiden voyage.
If you've ever had the good fortune to visit Stockholm, as I did this past week on vacation with my wife, you've likely seen it¹.
As you enter the museum where it is now displayed, a mere 1,300 meters from the spot where it was salvaged in 1961, you're confronted with its remarkably well-preserved mass, form, and ornate construction. It's a bygone advertisement of Sweden's political and military power.
As others have previously noted², the present value of the "tender ship" lies more in what its existence illustrates about complex systems and misaligned incentives than in its marvel as a material artifact.
Casting blame
To say our present day reality is more complex and interconnected than 17th century Sweden is a gross understatement.
But what was true then is true now.
When things go wrong and fail — especially catastrophically — we’re quick to cast blame.
During the inquest that followed the sinking, multiple surviving men were questioned: the vice admiral, the boatswain, the ship's master. Their testimony revealed that neither the ballast nor the loaded cannons were at fault³.
No: it was a design problem. And the King had approved the design and multiple modifications to it. The ship was simply too tall and narrow.
The prosecutor at the time had no choice but to blame its original designer, who had passed away a year before. A convenience!
Ultimately, no one in the official inquest was punished, and the King's name remained untarnished.
If it's in your control, why do you do it? If it's in someone else's, then who are you blaming? Atoms? The gods? Stupid either way. Blame no one. Set people straight, if you can. If not, just repair the damage. And suppose you can't do that either. Then where does blaming people get you? — Marcus Aurelius, Meditations VIII.17
Fast forward.
On August 6th, 2024, CrowdStrike released its full root cause analysis and executive summary of the July 19th Channel File 291 update, the incident that roiled transportation, critical care, financial systems, and markets worldwide.
It caused at least $5B in economic damage; Delta alone has made headlines claiming roughly a tenth of that in losses. On the day of the incident, $11B was wiped from CrowdStrike's market cap.
So who is to blame? A July 23rd letter calls for testimony before a Congressional hearing at a yet-to-be-scheduled time, probably this September. Was it Microsoft's allowance of kernel-level access, CrowdStrike's clear and unmistakable regression-testing misses, or the heterogeneous approaches to disaster and service recovery employed across the customer (e.g. Delta) ecosystem?
As with other notable outages⁴, the immediate and eventual costs suffered matter less than how collective behavior shifts as a result.
What we learn about incentives
The Vasa sank because it had insufficient room for enough ballast (weight in its hull) to stabilize it. It's estimated that nearly double the 120 tons of stone aboard would have been required to keep it from being crank, the sailor's term for dangerously top-heavy.
In 1628 there were no mathematical methods for calculating the stability and stiffness of a ship; it was all based on prior shipbuilding experience.
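Today that calculation is routine. As a sketch (this is the standard naval-architecture criterion, not a reconstruction of the Vasa's actual figures), a hull floating upright is initially stable when its metacentric height is positive:

\[
\overline{GM} = \overline{KB} + \overline{BM} - \overline{KG} > 0,
\qquad
\overline{BM} = \frac{I}{\nabla}
\]

Here \(\overline{KB}\) is the height of the center of buoyancy above the keel, \(\overline{BM}\) the metacentric radius (waterplane moment of inertia \(I\) over displaced volume \(\nabla\)), and \(\overline{KG}\) the height of the center of gravity. Ballast low in the hull lowers \(\overline{KG}\) and raises \(\overline{GM}\); the Vasa simply had no room for enough of it.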
But many involved in the construction and “testing” of the Vasa knew there was something wrong.
A stability test was conducted before its launch: thirty men ran from port to starboard, and by the third pass the ship listed so hard that they stopped for fear of sinking it.
It was launched anyway, so as not to earn the King's disfavor.
In CrowdStrike's case, the simplest summary of what happened also involves a form of miscommunication: a new feature was added that expected 21 input parameter fields, but only 20 were supplied by the file CrowdStrike pushed.
When the software attempted to access the 21st value of the input data array, it triggered an out-of-bounds memory read, a system crash, and the dreaded blue screen of death ("BSOD") on ~8.5M machines.
In even simpler terms, imagine being asked to grab the 21st book off a shelf that holds only 20: you'd fail.
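For the programmers reading, here is a minimal C++ sketch of that class of bug. It is illustrative only (the record layout and names are my invention, not CrowdStrike's code), but it shows how an unchecked index past the end of a 20-element array becomes a crash, and how a bounds check turns it into a safe failure:

```cpp
#include <array>
#include <cstddef>
#include <cstdio>

// Illustrative only: a "channel file" record carrying 20 parameter fields.
// (The layout and names here are invented for the example.)
constexpr std::size_t kFieldCount = 20;
using Record = std::array<const char*, kFieldCount>;

const char* get_field(const Record& fields, std::size_t index) {
    // Without this check, fields[20] on a 20-element array is an
    // out-of-bounds read: undefined behavior in user space, and in
    // kernel mode typically a fatal page fault, i.e. the BSOD.
    if (index >= fields.size()) {
        return nullptr;  // fail safely instead of crashing
    }
    return fields[index];
}

int main() {
    Record fields{};  // 20 entries, valid indices 0..19
    // A newer component asks for the 21st field (index 20) that the
    // pushed file never supplied.
    if (get_field(fields, 20) == nullptr) {
        std::puts("field 21 missing: rejected safely");
    }
    return 0;
}
```

The fix in spirit is that one `if` statement: validate the input's shape before trusting it, especially in code that runs where a crash takes the whole machine down.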
It's not hard to see the parallels between these two events.
Charlie Munger famously said: “Show me the incentives and I’ll show you the outcome”.
Root causes in the case of the Vasa: excessive schedule pressure, changing requirements, lack of a plan after the death of the designer (i.e. the shipwright), ignoring the obvious, and possible mendacity.
And in the case of CrowdStrike: inadequate and incomplete tests, no staged rollouts (sketched below), lack of update flexibility via end-user control, and, of course, ignoring the obvious.
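On that staged-rollout point: the general technique, often called canary or ring deployment, pushes an update to a small slice of machines first and widens only while health telemetry stays clean. Here is a minimal sketch of the idea, with hypothetical ring names and a stand-in health check rather than anything from Falcon's actual pipeline:

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical rollout rings: each update reaches a wider slice of the
// fleet only after the previous ring stays healthy.
struct Ring {
    std::string name;
    double fraction;  // share of the fleet receiving the update
};

// Stand-in for real telemetry (crash rates, heartbeats, error budgets).
bool ring_is_healthy(const Ring& ring) {
    std::printf("checking health of ring '%s'...\n", ring.name.c_str());
    return true;  // the example assumes a clean rollout
}

int main() {
    const std::vector<Ring> rings = {
        {"internal", 0.001}, {"canary", 0.01}, {"early", 0.10}, {"fleet", 1.0}};

    for (const Ring& ring : rings) {
        std::printf("deploying update to %.1f%% of the fleet\n",
                    ring.fraction * 100.0);
        if (!ring_is_healthy(ring)) {
            std::puts("regression detected: halting and rolling back");
            return 1;  // a bad update stops at a sliver, not ~8.5M machines
        }
    }
    std::puts("update fully deployed");
    return 0;
}
```

The design point is containment: a defective update that would fail everywhere instead fails on a fraction of a percent of machines, where it can be caught and rolled back.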
But of course the real reason is that testing is valued less than shipping.
In each case, complexity wasn't aligned with the driving incentive: progress, demanded both explicitly and implicitly at the time. These are the consequences of the mantra "move fast".
What you and I know:
That incentives shape outcomes. And that despite our attempts, learning from history is still largely beyond us.