Embracing Failure
As software engineers we can’t fear failure - we have to embrace it. Fear is a paralyzing emotion. It shuts down the creative parts of the brain. Fight or flight is triggered. Software engineering is a creative discipline. We create something from nothing. That, in my book, is the very definition of creativity. To be a great software engineer is to be free from fear; to see solutions and not just problems. Sure, there will be problems, but all problems are figure-out-able!
Failure is not just a possibility - it’s a guarantee. The second law of thermodynamics says so. Left to itself, any isolated system moves from an ordered state to a disordered state. The measure of that disorder is called entropy. A physicist might say that the arrow of time and the arrow of entropy point in the same direction. This is why a house left unattended will fall into disrepair. A drinking glass can topple from a table and break into one hundred shards, but we will never see one hundred shards come together to form a drinking glass! This is entropy!
What’s that got to do with software engineering? Power supplies fail, memory chips fail, and network switches fail. Everything eventually fails. Our software might be hosted in the cloud. It might be dockerized and running in a pod in a Kubernetes cluster that runs on virtual machines, but those virtual machines run on a server sitting in a rack in a large data center with power, cooling and networking. If any of those elements fails, our software can fail with it unless we have designed around it.
Even if the hardware doesn’t fail, the software can fail. All software has associated state or data - typically stored in some database or datastore. As data is added to the database, a certain amount of disorder (entropy, loosely speaking) is added to the system. Queries might slow over time. The client making the query might then slow. This slowdown can bubble up the chain of callers and eventually lead to time-outs or outright failure. Once the issue is spotted we address it. That might involve improving the query or adding indexes to tables. Addressing the issue means applying energy to the system to counter the forces of entropy. As engineers we must design our solutions to handle failure. And when failures occur we must apply energy to resolve them. It’s unavoidable. It’s the laws of physics at work.
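One common way to design for the time-outs described above is to retry a failed call with exponential backoff. The sketch below is illustrative only: `TransientError`, `call_with_retry` and `make_flaky` are hypothetical names invented for this example, and the delay values are arbitrary assumptions rather than recommendations.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (e.g. a time-out)."""


def call_with_retry(operation, max_attempts=4, base_delay=0.1, sleep=time.sleep):
    """Run operation(), retrying on TransientError with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure to the caller
            # Double the wait on each attempt and add random jitter so that
            # many clients do not all retry at the same moment.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay))


def make_flaky(failures_before_success):
    """Build a fake dependency that fails a fixed number of times, then succeeds."""
    state = {"calls": 0}

    def operation():
        state["calls"] += 1
        if state["calls"] <= failures_before_success:
            raise TransientError("simulated time-out")
        return "ok"

    return operation, state


if __name__ == "__main__":
    flaky, state = make_flaky(failures_before_success=2)
    print(call_with_retry(flaky), "after", state["calls"], "calls")
```

Retries only help with failures that are genuinely temporary. A query that is slow because of a missing index will stay slow however often it is retried, which is why the fix in that case is still to apply energy to the root cause.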
Engineers are innovators. James Dyson is one of the more famous modern-day engineering innovators. The story of how he came up with the bagless cyclone vacuum cleaner is an example of recombinant innovation - taking two seemingly unrelated areas, joining them together and creating something completely new. He took the cyclone extractor units used in sawmills and the problem he saw with dust-bag vacuum cleaners, joined the two and invented the bagless cyclone vacuum cleaner. James Dyson can teach us all about dealing with failure. During this venture he built many prototypes. Each prototype, you could say, was a mini-failure. These failures did not deter Dyson. It took a staggering 5,127 prototypes before Dyson had a production-ready product. That’s persistence. That’s self-belief. That’s dealing with failure. Failure is just another word for feedback. Everything we do in life gives us feedback. Babies learn to walk by falling hundreds of times - each fall is feedback. With each fall, a baby fine-tunes the next effort. This happens over and over again until walking is mastered. We’d never learn to walk if we feared falling or if we feared failure.
As software engineers, we rely heavily on feedback loops to help us produce high quality software. When we type in our IDE, it highlights syntax errors which can be quickly addressed. The compiler or build system provides more high quality feedback. Unit and component tests provide additional levels of feedback. Logs, dashboards and alerts provide powerful feedback when our software runs in production. When we step back, the Software Delivery Life-Cycle (SDLC) is full of feedback loops: requirements loops, design loops, test and development loops, production feedback loops. Each of these loops can feed back failure or success. Failures are detected, inputs are modified and outputs are re-measured. The process is ongoing. High quality output is generated when we have quick, high quality feedback loops at every step in the process. Slow feedback loops should be identified and made faster. Poor quality feedback loops should be modified to provide higher quality feedback. In this context, failure is just feedback. We deal with it daily. It’s not just part of the process; it is the process of software engineering.
It’s safe to say that many of the solutions we work on today use a microservices architecture. Microservices have many advantages but, of course, they can also add complexity. They add the network. They add communication patterns, more API contracts and separate datastores. This is the world of distributed systems, and making distributed systems reliable is known to be difficult. When I say known, I mean computer scientists in the 1980s wrote papers about this complexity. Topics such as ‘the consistency problem’ and ‘the consensus problem’ were already being studied. In a microservices architecture things fail and we must design for that.
Cloud providers offer services in different geographic regions. Typically each region has three zones. Each zone is typically an isolated data center with its own power, networking and cooling. Zones fail. It happens. The cloud provider will provide an SLA for a zone. Regions can also fail but it’s much rarer.
As software engineers we need to understand how our software is deployed. Is it deployed in a single region or across regions? Is it deployed in a single zone or across three availability zones? How is my database deployed? What is the uptime SLA of the platform I’m running on? What’s my high availability policy? If I own a microservice, what is the uptime SLA for my microservice? Do I understand my critical dependencies? Do I understand their SLAs? What happens when they fail? Can my service recover gracefully from these external failures? It all starts with awareness. We need a certain level of awareness before we can even ask the right questions. Once we can ask the right questions we can design for the failure scenarios we have identified. That’s our job. That’s modern-day software engineering.
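The questions about SLAs and critical dependencies can be made concrete with a little arithmetic. The sketch below rests on a strong simplifying assumption: failures are independent of each other, which real zones and services only approximate. The function names and the availability figures are illustrative and are not taken from any provider’s actual SLA.

```python
import math

MINUTES_PER_YEAR = 365 * 24 * 60


def serial_availability(availabilities):
    """Availability of a chain where every component must be up at once."""
    return math.prod(availabilities)


def redundant_availability(availability, copies):
    """Availability when any one of several independent copies is enough."""
    return 1 - (1 - availability) ** copies


def downtime_minutes_per_year(availability):
    """Expected unavailable minutes in a 365-day year."""
    return (1 - availability) * MINUTES_PER_YEAR


if __name__ == "__main__":
    # A service that needs itself, a database and an auth service to be up.
    chain = serial_availability([0.999, 0.9995, 0.999])
    print(f"service with two hard dependencies: {chain:.4%}")

    # The same workload spread over three zones, each assumed 99% available.
    zones = redundant_availability(0.99, copies=3)
    print(f"three redundant zones: {zones:.4%}")

    print(f"downtime at 99.9%: {downtime_minutes_per_year(0.999):.1f} min/year")
```

The two formulas pull in opposite directions. Every hard dependency multiplies availability down, so a service can never promise more than its weakest critical dependency allows, while redundancy across zones multiplies the chance of total failure down instead.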
When I was in school, I always associated the word ‘failure’ or ‘fail’ with something really negative. It was typically associated with exams. Nobody wants to fail or be called a failure. In truth, failure is just learning. When we fail, it’s jarring. We have to think deeply. As software engineers we are very familiar with the feeling and learning we get from things that don’t work as planned. It can result in lots of head scratching. To get to the bottom of an issue, we might have to go to more uncomfortable depths than we otherwise would have to. It can be intense. It needs super levels of focus but from that focus the real learning happens. That’s when we grow. That’s when we develop. If everything worked the first time, we’d learn very little. That’s software engineering and, admit it, that’s why we love it!
John Lynch - Senior Solutions Architect, Signify Health and Author of ‘Jambot’s Guide to Technology; Written for children; secretly read by grown-ups!’ - jambottech learning
Electronic Engineer specialising in test solution development and product development
3mo - Excellent article
John ... failure is just another ... feedback loop... as you say. No awhh/no hahaa
I agree, John Lynch, there are lessons to learn from failure that will be useful on your success journey. But when failure becomes a pattern, then we need to seek help to break the chain and achieve progress. Here we discuss some useful solutions that can work: www.gabrieltopman.com
Software Architect at Globalization Partners.
4mo - Something going wrong is only failure if nothing is learned from it. And as learning opportunities go, it's one of the best you'll get!
Senior Technical Product Manager
4mo - Nice. 'To get to the bottom of an issue, we might have to go to more uncomfortable depths than we otherwise would have to.' This needs an article by itself concentrating on the tensions surrounding it. Thanks for writing this.