Top 10 most disastrous software bugs
Introduction
A bug is a fault or flaw in software that causes it to behave unexpectedly or produce incorrect results. Bugs can be functional, behavioral, or cosmetic. Any deviation from the user story, any glitch in the UI, or any case where the software does not function as the developer intended is termed a bug.
A bug can cause a malfunction, and it can arise for many reasons: coding errors, misunderstood requirements or designs, miscommunication between teams, the complexity of the code, design, or architecture, and changes in the environment. Bugs are generally found during the testing phase.
Failing to address bugs early in the development process can lead to significantly increased costs. As bugs go unnoticed, they can affect more parts of the code, making them harder and more time-consuming to fix later on. This not only increases development costs but also disrupts the workflow, reducing overall productivity. Moreover, bugs that make it to the final product can result in customer dissatisfaction, negative reviews, and loss of trust. And that is just the ordinary case. What if two more factors were added: billions of dollars and human lives at stake? A real recipe for disaster, isn't it? Here are the top 10 most disastrous software bugs.
#10 - AT&T long-distance network crash [Telecommunications]
The popular carrier AT&T, although a successful company now, could have lost it all on January 15th, 1990. The company had upgraded its long-distance switches to more complex software in hopes of improving long-distance calling. However, this decision backfired spectacularly. What transpired over the next nine hours was a businessman's nightmare: 75 million missed phone calls, an estimated 200,000 airline reservations canceled, and a loss of $60 million in long-distance charges.
All this chaos arose because the new software pushed messages between the switching centers much faster than before. The switching centers routed the calls, but when one of them received messages faster than it was prepared for, it reset itself and signaled the next switch, which began to reset as well. Like dominoes, that switch signaled the next one, and so on. Eventually, every switch was caught in a loop of constant resetting. With no switches available to handle incoming calls, AT&T's long-distance service was left unavailable. Surprisingly, the company was not as hindered as you might think and still stands strong today.
#9 - NASA's Mariner 1 [Space]
The year is 1962. The news has been hyping up the Mariner 1 launch as an epic feat in the space age. You sit on the edge of your seat, gritting your teeth as the unmanned spacecraft is seconds away from takeoff. Then, liftoff. For a whole four minutes, you are filled with glee as you witness the extent of human knowledge. And then—BOOM!—the screen is flooded with a ravaged piece of metal in the sky and the glaring flare of failure.
Don't worry, though; there were no casualties, as the craft was unmanned. There was just $135 million down the drain. But how did this happen, you ask? Well, the order to self-destruct was issued by a range safety officer only six seconds before separation because the craft had not responded correctly to certain commands. Contrary to popular belief, the mistake that triggered the catastrophic result was a missing overbar in a mathematical equation, not a missing hyphen. The confusion likely stems from the misleading title of an article, "The Most Expensive Hyphen in History". However, the story doesn't end tragically. Five weeks later, Mariner 2 was launched and successfully completed the original task of flying by Venus.
#8 - NASA's Mars Climate Orbiter [Space]
Sent to space on December 11th, 1998, the Mars Climate Orbiter was meant to study the Martian atmosphere and climate. The probe remained operational until September 23rd, 1999, when it was scheduled to enter orbit around Mars. However, communication was cut off 49 seconds earlier than expected, and contact was never regained. The probe, part of a program that cost $327 million, was lost forever. But what force was great enough to waste that much money? The answer lies in a unit mismatch. Ground software that fed the trajectory calculations produced its results in United States customary units (pound-force seconds of thruster impulse), while the navigation software expected metric units (newton-seconds). Because of the mismatch, the probe approached Mars far lower than planned and most likely broke up in the atmosphere. This setback didn't stop us, though: in 2015, NASA reported evidence of briny water flowing on Mars's surface.
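To make the unit mix-up concrete, here is a minimal Python sketch. It assumes the mismatched quantity was a thruster impulse reported in pound-force seconds and read as newton-seconds; the `Impulse` class and the sample value are hypothetical illustrations, not NASA's or the contractor's actual code.

```python
from dataclasses import dataclass

# 1 pound-force second expressed in newton-seconds.
LBF_S_TO_N_S = 4.4482216152605


@dataclass(frozen=True)
class Impulse:
    """Hypothetical wrapper that always stores impulse in newton-seconds."""
    newton_seconds: float

    @classmethod
    def from_pound_force_seconds(cls, value: float) -> "Impulse":
        # Convert explicitly at the boundary between the two systems.
        return cls(value * LBF_S_TO_N_S)


# Illustrative number only: what the ground software might have reported.
reported_lbf_s = 10.0

# The mistake: treating the raw number as if it were already metric.
misread = Impulse(newton_seconds=reported_lbf_s)

# The fix: convert on the way in.
converted = Impulse.from_pound_force_seconds(reported_lbf_s)

# How badly the misread value understates the real impulse.
underestimate_factor = converted.newton_seconds / misread.newton_seconds
print(f"misread: {misread.newton_seconds} N*s")
print(f"actual:  {converted.newton_seconds:.2f} N*s")
print(f"understated by a factor of {underestimate_factor:.2f}")
```

Carrying the unit in the type, rather than passing bare numbers between teams, is one common way to make this kind of mismatch harder to write by accident.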
#7 - Knight Capital Group [Finance]
It should have been an average day at the stock market for Knight Capital on August 1st, 2012. Knight had just released untested software into production, and the old code it replaced contained an obsolete function. A technician forgot to copy the new code to one of the eight computer servers, so on that server a repurposed flag activated the obsolete function, known as Power Peg. This caused stocks to move up and down in an attempt to verify the trading algorithm's pattern, which in turn drastically distorted the stock prices of nearly 150 companies, bringing some stocks valued at $20 down to $5 or $8. As you can imagine, this sent Knight's own stock into a downward spiral, first plunging 33%, and by the next day, Knight had lost 75% of its equity value. In the end, Knight lost a total of $440 million. Amazingly, this did not drive the company to bankruptcy, but in 2013, it was bought by Getco LLC.
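The danger of reusing a flag while one server still runs old code can be sketched in a few lines of Python. Everything here is hypothetical (the handler names, the order format, and which server was missed); it only illustrates the failure pattern described above, not Knight's real system.

```python
# Hypothetical sketch: one flag, two meanings, depending on code version.

def new_handler(order: dict) -> str:
    """New code: the repurposed flag selects the new routing logic."""
    return "new_routing" if order["flag"] else "default_routing"


def old_handler(order: dict) -> str:
    """Stale code: the same flag still switches on the obsolete function."""
    return "POWER_PEG" if order["flag"] else "default_routing"


def deploy(total_servers: int = 8, missed: frozenset = frozenset({7})) -> list:
    """Return one handler per server; servers in `missed` keep the old code."""
    return [old_handler if i in missed else new_handler
            for i in range(total_servers)]


servers = deploy()
order = {"symbol": "XYZ", "flag": True}  # illustrative order with the flag set

# Every server receives the same order, but one interprets the flag the old way.
results = [handle(order) for handle in servers]
for index, outcome in enumerate(results):
    print(f"server {index}: {outcome}")
```

A deployment check that compares code versions across all servers before the flag is enabled would catch the mismatch in this toy setup.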
#6 - Intel Pentium FDIV [Hardware]
Many computers run on Intel technology, and Intel is especially known for its excellent line of processors. However, in 1994, its reputation was put on the line when a mathematics professor, Thomas Nicely, discovered a bug in the floating-point unit of the original Pentium processors. The bug caused certain divisions to return results that were incorrect past the third decimal place. It wasn't present in every processor, but it wasn't limited to just a few either. As you can imagine, this caused significant problems for people using the processors in fields like math and science. Many angry customers demanded replacements, and Intel agreed, at first only on the condition that customers prove their computer had the bug. In the end, it cost Intel $475 million in total. However, this did not affect the company in the long run, as Intel still stands strong today as one of the leading providers of PC parts.
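A division that was widely circulated as a test for the flaw is 4195835 / 3145727. The Python sketch below compares the quotient your machine computes with the value commonly reported from affected Pentiums; the constant and function names are my own, and the flawed value is quoted from those reports rather than reproduced on real hardware.

```python
# Operands of the widely circulated FDIV test division.
NUMERATOR = 4195835.0
DENOMINATOR = 3145727.0

# Correct quotient to nine decimal places, and the value commonly
# reported from flawed Pentium chips (quoted, not measured here).
EXPECTED_QUOTIENT = 1.333820449
REPORTED_FLAWED_QUOTIENT = 1.333739068


def division_looks_correct(tolerance: float = 1e-8) -> bool:
    """Return True if this machine's division matches the correct quotient."""
    return abs(NUMERATOR / DENOMINATOR - EXPECTED_QUOTIENT) < tolerance


quotient = NUMERATOR / DENOMINATOR
flaw_size = abs(quotient - REPORTED_FLAWED_QUOTIENT)

print(f"computed quotient: {quotient:.9f}")
print(f"reported flawed:   {REPORTED_FLAWED_QUOTIENT:.9f}")
print(f"difference:        {flaw_size:.9f}")
print("division OK" if division_looks_correct() else "possible FDIV-style error")
```

The two values agree through 1.333 and part ways in the fourth decimal place, which is why the error mattered for scientific work yet went unnoticed in everyday use.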
#5 - AECL's Therac-25 [Medical]
Made by Atomic Energy of Canada Limited, or AECL for short, the Therac-25 was a machine designed for radiation therapy. The design flaw of this machine was so severe that it's hard to believe nobody foresaw it beforehand. To start, there were only two modes of radiation therapy: a direct electron beam mode, which delivered a low dose at energies starting from five mega-electron-volts, and a megavolt X-ray mode, which, along with sounding like a villain's secret weapon, produced X-rays by firing a 25-mega-electron-volt beam at a target. As you can tell, there is a drastic difference in radiation levels. The fatal bug in the software allowed the high-power beam to be released even when the electron beam mode was chosen. Six patients received massive overdoses, and several of them died. However, keep in mind that this happened between 1985 and 1987, so there is nothing to fear now.
#4 - Panama National Institute of Oncology's Cobalt-60 [Medical]
In the previous case, I said there is nothing to fear now, and that may be true, but in 2000, 15 years after the Therac-25 incident, a Cobalt-60 machine at Panama's National Institute of Oncology also ran into trouble. The machine was expensive, with the bill reaching upwards of $110,000, yet it had become outdated and was barely used. When it was used for radiation therapy, however, it caused 24 deaths by overdose. The treatment-planning program worked by having doctors draw digital blocks representing the shields that protect healthy tissue from the beam. The amount of radiation to be applied was calculated from there. But the software allowed only four blocks to be drawn, so doctors who needed a fifth block were stuck. They worked around the limit by drawing all five blocks as a single shape, which made the software think it was only one block. This led to a miscalculation of the radiation needed, resulting in patients receiving far more than intended. While this is more of a human error than a software error, the staff claimed that the instructions provided absolutely no guidance on how to draw the blocks. A patch was eventually issued by Multi-Data, the software's maker, which treated the problem as a very dangerous software bug.
#3 - USA's MIM-104 Patriot [Military]
So far, we've discussed mistakes in software for companies, space exploration, and medicine, but when a mistake is made in a military operation, it risks the lives of many. Unfortunately, this was the case on February 25th, 1991, when an MIM-104 Patriot failed to intercept an incoming Iraqi missile, leading to the deaths of 28 soldiers and injuries to 98 others. The Patriot is an anti-air missile system that works by launching an interceptor missile at an incoming missile so that it is destroyed before reaching the troops. The Patriot had a strong track record up until this incident, with a claimed success rate of 95%. However, after this and a few more failures, its success rate was scaled back to 79%.
This sparked an investigation that led to the discovery of a software error in the Patriot's handling of timestamps. The Patriot battery had been running for 100 hours, and over that time its clock had drifted by about one-third of a second because of a rounding error in converting the clock count to a floating-point time. As a result, the Patriot looked in the wrong part of the sky for the incoming missile, with a miss distance of about 600 meters. However, as the investigation uncovered these issues, they were also fixed, allowing the troops to rest easy knowing that they have a Patriot protecting them.
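The arithmetic behind the drift can be sketched in Python. This is a back-of-the-envelope reconstruction under stated assumptions, not the Patriot's actual code: it assumes the clock counted tenths of a second, that 1/10 was stored truncated to 23 binary fractional digits in a 24-bit register, and that the incoming missile travelled at roughly 1,676 m/s. Only the 100-hour uptime comes from the text above.

```python
from fractions import Fraction

FRACTION_BITS = 23           # assumption: fractional bits kept by the register
TICKS_PER_SECOND = 10        # assumption: clock counts tenths of a second
UPTIME_HOURS = 100           # from the incident description
MISSILE_SPEED_M_PER_S = 1676 # assumption: illustrative missile speed


def chopped_tenth(bits: int = FRACTION_BITS) -> Fraction:
    """Return 1/10 truncated (not rounded) to `bits` binary fractional digits."""
    scale = 2 ** bits
    return Fraction(scale // 10, scale)


# 1/10 has no exact binary representation, so every tick loses a tiny amount.
error_per_tick = Fraction(1, 10) - chopped_tenth()

# Number of clock ticks accumulated over the uptime.
ticks = UPTIME_HOURS * 3600 * TICKS_PER_SECOND

# Total clock drift, and how far the missile travels in that time.
drift_seconds = float(error_per_tick * ticks)
miss_distance_m = drift_seconds * MISSILE_SPEED_M_PER_S

print(f"error per tick: {float(error_per_tick):.3e} s")
print(f"drift after {UPTIME_HOURS} h: {drift_seconds:.3f} s")
print(f"distance covered in that time: {miss_distance_m:.0f} m")
```

Under these assumptions the drift comes out at roughly a third of a second and the distance at a little under 600 meters, in line with the figures quoted above. An error of under a ten-millionth of a second per tick only becomes dangerous because it is multiplied by millions of ticks.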
#2 - EDS Child Support [Administration]
Many people rely on child support to pay the bills, so when EDS (Electronic Data Systems) created an advanced and overly complicated payment program for the UK Child Support Agency, there was no room for error. However, at the time of implementation in 2004, the DWP (Department for Work and Pensions) decided to restructure the entire agency, and the two systems turned out to be incompatible. This set off a particularly unfortunate series of events, leading to the overpayment of 1.9 million people, the underpayment of 700,000 people, the accumulation of $7 billion in uncollected child support payments, and a backlog of 239,000 cases, ultimately costing UK taxpayers $1 billion to date. It doesn't end there, either, as these issues led to the resignation of the head of the Child Support Agency, Doug Smith. This wasn't the first time EDS had gotten into trouble, but being bought out by HP probably did them a favor.
#1 - ESA's Ariane 5 [Space]
The Ariane 5 was a rocket that took the European Space Agency 10 years and $7 billion to create. The rocket was intended to hurl three-ton satellites into orbit with each launch, which, if accomplished, would have given Europe supremacy in the space business. However, it only took 40 seconds for the rocket to explode, creating one of the most time-consuming ways to waste $7 billion.
Perhaps even more comical is the bug that triggered the explosion: a line of code that tried to fit a 64-bit number into a 16-bit space. As far as we know, the rest of the code was perfect, but this simple conversion overflow laid waste to $7 billion and 10 years of effort. According to James Gleick, the programmers had reasoned that the number would never exceed the 16-bit limit. However, the Ariane 5 was much faster than the previous Ariane 4, which was what they had based their numbers on. Fortunately, the craft was unmanned, and no one was hurt in the process.
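Here is a minimal Python sketch of that kind of narrowing conversion. The function names and sample readings are hypothetical; the point is only that a value that fits comfortably for a slower rocket can fall outside the signed 16-bit range for a faster one, and that the code has to decide what happens then.

```python
# Range of a signed 16-bit integer.
INT16_MIN, INT16_MAX = -(2 ** 15), 2 ** 15 - 1


def to_int16_checked(value: float) -> int:
    """Convert a 64-bit float to a 16-bit integer, failing loudly on overflow."""
    truncated = int(value)
    if not INT16_MIN <= truncated <= INT16_MAX:
        raise OverflowError(f"{value} does not fit in 16 bits")
    return truncated


def to_int16_saturating(value: float) -> int:
    """Convert a 64-bit float to a 16-bit integer, clamping to the valid range."""
    return max(INT16_MIN, min(INT16_MAX, int(value)))


# Hypothetical sensor readings: one in range, one beyond the 16-bit limit.
slow_rocket_reading = 1200.7
fast_rocket_reading = 40000.0

print(to_int16_checked(slow_rocket_reading))

try:
    to_int16_checked(fast_rocket_reading)
except OverflowError as error:
    # Without a handler like this, the error would stop the whole program.
    print(f"overflow detected: {error}")

print(to_int16_saturating(fast_rocket_reading))
```

Whether to raise an error or clamp the value is a design decision; either is safer than assuming the input can never grow beyond what the previous rocket produced.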
Special mention: World’s First Computer Bug (1947)
In 1947, a team of computer scientists and engineers at Harvard University in Cambridge, Massachusetts, found that their computer, the Mark II, was delivering consistent errors. When they opened the computer's hardware, they found ... a moth. The trapped insect had disrupted the electronics of the computer.
The moth was removed and taped into the computer's logbook with a note that read, "First actual case of bug being found." Although this "bug" wasn't a software error but a physical issue, it has become an iconic part of computing history and is often cited as the origin of the term we now use to describe flaws or errors in programs and systems. In fact, engineers were already using the term "bug" to describe technical glitches, but this incident became famous and popularized its use for issues in computing.