System Design - Simplified
The whole System Design topic is so amorphous that reading many articles about it can yield very little practical material. If you want to learn why System Design is so important, and also get a practical process for doing it well and efficiently - bear with me.
I like starting everything off with why, and not what. Every solution, process or tool was created to solve a certain problem. Let's explore an example that shows us why System Design is so important.
Today, we want to deliver content in real time. To do so, we need to process the data as quickly as possible, and not let it "starve". There are two main approaches for processing that data - batch and stream. While stream processing has lately gained much popularity over batch processing, as we look deeper we find that the choice is use-case dependent, and sometimes batch can serve us better, even for low-latency systems. Sounds shocking? Let me demonstrate using a children's song my niece likes called "The Five Little Monkeys".
In this song, there are five monkeys jumping on a bed. One falls off the bed, and his mom calls the doctor for advice. Immediately after, another monkey falls off, and again the mother calls the doctor. This goes on and on until no monkeys are left on the bed. Obviously a software engineering problem.
The mother is extremely inefficient in I/O - she could have turned five calls (streaming the monkey falls) into a single call, if she had waited for them all to fall (batching the monkey falls). Of course, there are pros and cons to each approach. Sometimes we can't afford to let our monkeys wait, and sometimes we can't even tell if we are waiting for nothing - are more monkeys even going to fall? See my post for a full list of batch and stream processing pros and cons. Nevertheless, here, a better solution might be to wait a minute to see if more monkeys are going to fall and only then call the doctor. This is a trade-off the mother should consider.
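To put rough numbers on the mother's trade-off, here is a minimal Python sketch. The fall times and the 60-second wait are made up for illustration, and the function names are mine. It only counts calls; it is not how a real stream or batch processor is built.

```python
from typing import List


def streaming_calls(fall_times: List[float]) -> int:
    """Streaming: one doctor call per fall, made immediately."""
    return len(fall_times)


def batched_calls(fall_times: List[float], wait: float) -> List[List[float]]:
    """Batching: the first fall opens a batch, and a single call is made
    `wait` seconds later, covering every fall that happened in the meantime."""
    batches: List[List[float]] = []
    for t in sorted(fall_times):
        if batches and t - batches[-1][0] <= wait:
            batches[-1].append(t)   # still inside the open batch window
        else:
            batches.append([t])     # this fall opens a new batch
    return batches


falls = [0, 20, 45, 130, 150]  # hypothetical fall times, in seconds
print(streaming_calls(falls))               # 5 calls
print(len(batched_calls(falls, wait=60)))   # 2 calls
```

The price of going from five calls to two is latency: in this sketch the first monkey of each batch waits the full `wait` before the doctor hears about it.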
Another example is throttling requests. If we want to throttle each user to x requests per period of time y, and we handle thousands of requests each second, we might want to accumulate the counts over a period of time z before updating our DB (summing the number of requests in each z period, and only then accessing our DB to update it). This of course means that our system will only be accurate to a granularity of z, but imagine the difference in I/O efficiency.
Now that we have understood that most problems (even the ones that look rather simple) can lead to many decisions and choices that will eventually affect the quality of the final product, we can move on to the 'what' part.
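Here is a minimal sketch of that idea in Python, under some loud assumptions: the "DB" is a plain dict, the class and its names are hypothetical, the clock is injected so the example is deterministic, and resetting counts at the end of each y period is left out for brevity. The allow/deny decision reads only the flushed count, which is what makes the limit accurate only up to z.

```python
from collections import Counter
from typing import Callable, Dict


class BufferedRequestCounter:
    """Counts requests in memory and writes them to the DB every `flush_interval`."""

    def __init__(self, limit: int, flush_interval: float, clock: Callable[[], float]):
        self.limit = limit                    # x: allowed requests per user
        self.flush_interval = flush_interval  # z: how long we buffer before a DB write
        self.clock = clock
        self.db: Dict[str, int] = {}          # stand-in for the real DB
        self.db_writes = 0
        self.pending: Counter = Counter()     # requests seen since the last flush
        self.last_flush = clock()

    def flush(self) -> None:
        """Write the buffered counts to the DB, one write per active user."""
        for user, count in self.pending.items():
            self.db[user] = self.db.get(user, 0) + count
            self.db_writes += 1
        self.pending.clear()
        self.last_flush = self.clock()

    def record(self, user: str) -> bool:
        """Register a request; return True if it is allowed."""
        if self.clock() - self.last_flush >= self.flush_interval:
            self.flush()
        allowed = self.db.get(user, 0) < self.limit  # may be stale by up to z
        if allowed:
            self.pending[user] += 1
        return allowed


now = [0.0]  # fake clock we can move by hand
counter = BufferedRequestCounter(limit=3, flush_interval=1.0, clock=lambda: now[0])
print([counter.record("alice") for _ in range(5)], counter.db_writes)
now[0] = 1.0
print(counter.record("alice"), counter.db, counter.db_writes)
```

In this run, five requests get through a limit of three because the DB is not updated until z has passed, but those five requests cost one DB write instead of five.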
"System Design is the process of defining the different components and how they interact with each other so that it meets the user-end requirements".
Looking at the broader picture, taking all the factors into consideration and making the best decisions for our business is what System Design is all about. Here's how I do it: a process I've learned (and adjusted here and there to simplify) from experienced engineers throughout my military service and professional career.
We start off with Specification. Before we start designing a solution, we must fully understand the problem: all the workflows, behaviors and functionalities of the final product. The result of this step should be a doc where all of the above are described, and everything in that doc should be fully agreed upon by the stakeholders. It should not be technical at all. I find that stating the obvious here can do magic. What I usually do in this step is write in order to think - first I persist my thoughts, and then I index them. For this I like FocusWriter - a clutter-free writing application, but you can use whatever text editor you're comfortable with. In many companies, this step is done in collaboration with Product Managers. Here's a simplified flow of the specification process:
Next up is the Design Overview. This is a technical representation of the specification doc. The way I structure it:
Once the overall Design Overview is finished, we go into detail on each component in the Component Overview. Here, we explain the component's structure (e.g. for a REST API component, we explain the layer separation, which services the business logic (BL) and data access layer (DAL) will contain, etc.). We also detail the internal implementation of each API that the component should supply to other components (as mentioned in the Design Overview). For example, in a database component overview, we can say "to allow fast fetching of contacts by the name field, we will create a B-tree index on the name column in the contacts table". One last thing is to mention the third parties the component will depend on - like frameworks and other components. Each component mentioned in the Design Overview, whether it's a new component or one that is being changed, should have a Component Overview.
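To show what that database sentence can turn into at implementation time, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for whatever database the component really uses (SQLite stores its indexes as B-trees). The table layout, the sample rows and the index name idx_contacts_name are assumptions made for the example.

```python
import sqlite3

# In-memory database, so the sketch leaves nothing behind.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (id INTEGER PRIMARY KEY, name TEXT, phone TEXT)")
conn.executemany(
    "INSERT INTO contacts (name, phone) VALUES (?, ?)",
    [("Dana", "555-0101"), ("Avi", "555-0102"), ("Noa", "555-0103")],
)

# The design decision from the component overview: index the name column.
conn.execute("CREATE INDEX idx_contacts_name ON contacts (name)")

# Ask SQLite how it plans to run the lookup; the plan should mention the index.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT phone FROM contacts WHERE name = ?", ("Noa",)
).fetchall()
plan_text = " ".join(str(col) for row in plan for col in row)
print(plan_text)

row = conn.execute("SELECT phone FROM contacts WHERE name = ?", ("Noa",)).fetchone()
print(row)
```

Checking the query plan is a cheap way to confirm that the index described in the doc is actually the one the database uses for the lookup.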
You've probably noticed we haven't drawn any diagrams yet. That is because the Build Diagrams step comes last in the System Design process. Boxes with names and arrows don't mean a whole lot if there's nothing behind them to explain how they actually communicate, and how they should be implemented. This step should be a recap and a visualisation of our technical solution. I use simple boxes and arrows; I don't like, and don't remember, the technical diagramming vocabularies. I use Google Drawings, but draw.io and many other applications will do the trick.
That's it. The whole process, hopefully simplified and practical. A question that I often get is when we should go through this whole process. Every change in our system? That sounds like overkill. My answer here is to use your common sense to decide whether a new feature/problem/requirement needs to go through this design process. I would say that if a change touches two components or more, it probably needs designing. You have to remember - the smaller the problem is, the shorter the process will be. We should aspire to go through design for every problem, even the smaller ones. What seems small now might be hard to change in the future, as we gather more data from users and as our code base grows.
On a final note, at the beginning of this article we discussed one trade-off that is rather common in the Software Engineering world, but there are many. We probably encounter one almost every day: availability vs consistency (heard of the CAP Theorem?), indexing (sacrifice write speed for faster reads?), caching (faster results, but our system becomes more complex and often inconsistent), a robust framework (which often leads to unnecessary boilerplate and bloated code) vs flexible vanilla code (which might result in longer development time), polling (stateless) vs long polling (stateful), and many, many more.
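To make one of these concrete, here is a minimal sketch of the caching trade-off in Python. The TTLCache class, the 30-second TTL and the "price" key are all hypothetical, and the clock is injected so the behaviour is deterministic. It shows both sides of the trade: fewer loads from the source, and a stale answer while the entry is still fresh.

```python
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Serves cached values until they are older than `ttl` seconds."""

    def __init__(self, load: Callable[[str], Any], ttl: float, clock: Callable[[], float]):
        self._load = load    # slow source of truth, e.g. a DB query
        self._ttl = ttl
        self._clock = clock
        self._entries: Dict[str, Tuple[Any, float]] = {}  # key -> (value, stored_at)

    def get(self, key: str) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[1] < self._ttl:
            return entry[0]                  # fast path, possibly stale
        value = self._load(key)              # slow path, always current
        self._entries[key] = (value, now)
        return value


source = {"price": 10}   # stand-in for the real data store
loads = []               # records every trip to the source


def load_from_source(key: str) -> int:
    loads.append(key)
    return source[key]


now = [0.0]  # fake clock we can move by hand
cache = TTLCache(load_from_source, ttl=30, clock=lambda: now[0])

first = cache.get("price")    # miss: loaded from the source
source["price"] = 12          # the source changes behind the cache's back
second = cache.get("price")   # hit: still the old value
now[0] = 31.0
third = cache.get("price")    # entry expired: reloaded from the source
print(first, second, third, len(loads))   # 10 10 12 2
```

Three reads cost two loads, and the second read returned an outdated price. Whether that is acceptable is exactly the kind of decision the design doc should spell out.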
I hope that after reading this article, you'll be able to design better systems, and make better decisions for your business :)
In this article I took inspiration from many articles and engineers, and I specifically recommend Hussein Nasser's YouTube channel for quality backend engineering content.