How to Work With a Large Legacy Codebase Like a Pro

Learning to code is hard, but understanding a legacy codebase is another level of hard, even for experienced software developers. So today, I will show you how to understand it like a professional. When you join a new company, your fresh perspective gives you a chance to help the company improve some of its processes tremendously. But in most cases, from my experience, it rarely happens like that. To give you the best chance of helping out and being effective, it is important to know how to navigate a large, legacy codebase and work with others on it.


Code is a Human By-Product

The legacy codebase you're working on is the result of decisions made by the business, engineering leaders, and developers in your company. That history is exactly what makes it a "legacy codebase", and it is why you have to be careful when dealing with it. A common saying goes, "The code you wrote is not an extension of you." But the truth is that we often still feel prickly whenever people talk about our code unfavorably. And we subconsciously dislike facing the consequences of other people's decisions in the form of a legacy codebase. That is why you need to be careful when you join a company. If you join a company that values process, you'll probably be guided by docs or by colleagues who understand the context of the codebase. What if you join a company that has yet to prioritize such processes? Whether they guide you or not, here's what you'll need to do.


Be Curious, Don't Be Critical

It is part of your job to understand the legacy codebase, and being critical might make colleagues (developers and managers) think you're rebuking them. In reality, most experienced professional developers have written legacy code themselves. So be curious, not critical. Be empathetic. Instead of saying something like "This code is crap" or any other kind of complaint, be curious and willing to learn the stories behind the codebase. Wait, I know, it is easier said than done, but you have to do it anyway to do well at your job. Find out why they did it that way. Ask your colleagues to explain things to you. Watch them while they work with it. Try to understand how it works.

Don't Code Yet – Use the Platform

It is tempting to rush into coding, but don't. Explore the platform first, and check everything about it, from speed to UI/UX.

Why is that important? Your job is to build for the users, and you can't understand what they feel if you're not in their shoes. So put yourself in their shoes first: use the platform to feel what the end users feel.

Why is it your job to build for users? Well, those who hire you want you to deliver solutions based on the plans they have at hand, but indirectly they're doing everything for the end users. Your understanding of the users' pain points might help you transform their ideas into products, or improve their decisions on what to build and how to build it.

See, it is easy to explain things away with your own knowledge, exposure, and experience. But time and again it has proven better to check out the platform and see how it feels instead of going with your assumptions.

The experience you gather from using the platform will also help you connect the codebase with the platform's features, which gives you a better understanding of the codebase. That is another reason to use the platform before diving deep into the code.

Read the Most Important Part of the Codebase

The Pareto principle (otherwise known as the 80/20 rule) works almost everywhere. It can help you navigate a codebase, too.

Instead of tinkering with the codebase randomly, ask colleagues with deep experience of the codebase which files and folders they use almost all the time. Focus on those files and folders first, and from there move on to others as your tasks require. Then check other critical files that show how the codebase is glued together, such as:

  • Config files
  • Folder structures
  • Test files (if applicable)

It is important to read these parts of the codebase because they reveal important operations within it. Reading config files and the like can be really boring, but you don't have to understand it all at once. You can always revisit them later. That is the key.
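Asking colleagues is the best way to find the files that matter. A complementary heuristic, not something prescribed above, is to check which files change most often in version control. The sketch below assumes the codebase lives in a Git repository; it counts how many commits touched each file using `git log --name-only --pretty=format:`. The function names are illustrative, not from any existing tool.

```python
import subprocess
from collections import Counter


def git_changed_files(repo_path: str = ".") -> str:
    """Return `git log` output listing the files touched by each commit."""
    # An empty --pretty=format: drops the commit headers, leaving only
    # file names, with blank lines between commits.
    result = subprocess.run(
        ["git", "-C", repo_path, "log", "--name-only", "--pretty=format:"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def rank_hotspots(log_output: str, top: int = 10) -> list[tuple[str, int]]:
    """Count how many commits touched each file; return the most-changed ones."""
    counts = Counter(line.strip() for line in log_output.splitlines() if line.strip())
    return counts.most_common(top)


# Usage (from anywhere): rank_hotspots(git_changed_files("path/to/repo"))
```

The files at the top of that list are often a good set to ask your colleagues about; frequent change alone doesn't prove a file is central, so treat it as a starting point for the conversation.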



Study the Workflows in the Codebase

Take your time to understand the workflow of the most important parts of the codebase.

Learn how each part connects to the others, and check what happens when one of those connections is added or removed. By tracing the flow of operations within a codebase, you stand a chance to learn much more about it. This experience will help you act with precision when you're implementing features or fixing bugs. Oh, wait! Are you wondering how to do this? Okay, you can start with a function. Read it, then read and understand the other functions or components that use it. You can repeat that process with modules, classes, and so on until you have a solid understanding of the codebase. If applicable, you can also trace how the codebase handles requests and responses. Above all, find out how everything is connected.
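The "start with a function, then read whatever uses it" loop can be partly automated. The sketch below assumes a Python codebase and uses the standard `ast` module to list which functions in a piece of source code call a given function. `find_callers` is a hypothetical helper that only matches calls by name; your IDE's "find usages" feature is often the quicker option.

```python
import ast


def find_callers(source: str, target: str) -> list[str]:
    """Return the names of functions in `source` whose bodies call `target`.

    Matching is by name only, so `x.target()` and `target()` both count, and
    an unrelated function that happens to share the name is a false positive.
    """
    callers = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        for inner in ast.walk(node):
            if not isinstance(inner, ast.Call):
                continue
            func = inner.func
            name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
            if name == target:
                callers.append(node.name)
                break  # one matching call is enough to list this function
    return callers


EXAMPLE = '''
def parse(text):
    return int(text)

def load(path):
    with open(path) as f:
        return parse(f.read())

def main():
    print(load("settings.txt"))
'''

# Walk up the call chain one step at a time: parse <- load <- main.
print(find_callers(EXAMPLE, "parse"))  # ['load']
print(find_callers(EXAMPLE, "load"))   # ['main']
```

Each answer becomes the next `target`, which mirrors the repeat-until-connected process described above.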

Research the Libraries and Frameworks

You're likely going to find hardcoded code, libraries, and frameworks (internal or external) within a legacy codebase. The libraries and frameworks might not be mainstream. So you'll need to research them and figure out how they're used, especially in the ways your codebase uses them. You can do this by searching for the specific versions your codebase depends on. Sometimes the libraries were designed within your organization. In that case, you'll need to seek support from colleagues who understand the context of those frameworks and libraries. Honestly, it can be hard to ask for help if the helpers are now your subordinates. But asking them for help doesn't mean you're not competent to get things done. Remember, they have context of the codebase. They wrote the code, so they're responsible for helping you understand it and why they made their decisions. You don't have to feel bad about seeking help in this case, and even if you do feel bad about it, it is okay to ask.
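Before searching, it helps to know exactly which versions you are researching. As a sketch, assuming a Python project that pins its dependencies in a `requirements.txt`-style file (other ecosystems keep the same information in files such as `package.json` or `pom.xml`), the helper below separates exact `==` pins from looser specifiers. `read_pins` is a hypothetical name, and the parsing is deliberately simplified rather than a full implementation of pip's requirements format.

```python
import re

# "name==version", optionally with extras such as "celery[redis]==3.1.26".
PIN = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)(?:\[[^\]]*\])?\s*==\s*([^\s;]+)")


def read_pins(requirements_text: str) -> tuple[dict[str, str], list[str]]:
    """Split requirement lines into exact pins and everything else."""
    pinned: dict[str, str] = {}
    unpinned: list[str] = []
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("-"):
            continue  # blank line, comment, or a pip option such as -r / -e
        match = PIN.match(line)
        if match:
            pinned[match.group(1).lower()] = match.group(2)
        else:
            unpinned.append(line)  # e.g. ">=" ranges: the installed version may differ
    return pinned, unpinned


EXAMPLE = """\
Django==1.11.29  # old LTS
requests>=2.0
-r base.txt
celery[redis]==3.1.26
"""

pinned, unpinned = read_pins(EXAMPLE)
print(pinned)    # {'django': '1.11.29', 'celery': '3.1.26'}
print(unpinned)  # ['requests>=2.0']
```

For unpinned entries, the version actually installed in production is the one worth researching, so check the deployed environment rather than guessing from the range.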

Understand the Hardcoded Code

By now you have probably heard of it, or experienced it yourself: some code can't be touched even though it seems to do nothing. From experience, I have learned that this kind of code is usually control code or mathematical expressions. Here is what I mean: a piece of code may do nothing by itself, but another part of the code checks whether it exists before making a decision. So what happens if you remove the code that the other part checks? Things break, or you get unexpected results. Most of the time, hardcoded code is just a mathematical expression that the developers dealing with the code don't understand.
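Here is a minimal, hypothetical illustration of the "does nothing, but is checked elsewhere" case. The class and function names are invented for this example: the attribute is never read inside its own class, so it looks like dead code, yet billing logic elsewhere tests for its presence with `hasattr`.

```python
# Hypothetical example classes; the names are invented for illustration.
class LegacyOrder:
    # Nothing in this class reads this attribute, so it looks like dead code.
    legacy_free_shipping = True


class NewOrder:
    pass


def shipping_cost_cents(order) -> int:
    # Elsewhere in the codebase, billing only checks whether the attribute exists.
    # Deleting the "unused" line above would silently start charging legacy orders.
    if hasattr(order, "legacy_free_shipping"):
        return 0
    return 500


print(shipping_cost_cents(LegacyOrder()))  # 0
print(shipping_cost_cents(NewOrder()))     # 500
```

Because the check uses the attribute name as a string, a "find references" search on the attribute may miss it; a plain text search across the whole codebase before deleting such a line is a cheap safeguard.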

That reminds me of something that happened to me recently. I was building a JavaScript package to convert GitHub into a Serverless Database. The structure of the data I had at hand called for nested for loops, but I couldn’t use them because browsers don’t have the capacity to run complex operations the way a server can. So I came up with some mathematical expressions that made it possible to achieve everything I wanted without nested loops. I knew the expressions would look hardcoded to other developers, but they got the job done without losing speed.
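The expressions themselves aren't shown here, so the following is only an illustration of one well-known trick of this kind: visiting every cell of a rows-by-cols grid with a single loop by recovering the row and column from a flat index with integer division and remainder. Note that this removes the nesting, not the work: the loop still runs `rows * cols` times. The docstring shows the kind of what/how/why context worth attaching to such an expression.

```python
def flat_cells(rows: int, cols: int) -> list[tuple[int, int]]:
    """Visit every (row, col) of a rows-by-cols grid with one loop.

    What: produces the same order as
        for r in range(rows):
            for c in range(cols): ...
    How: a flat index i encodes i = r * cols + c, so divmod(i, cols)
    gives back (r, c).
    Why: keeps one flat loop instead of nesting (it still does rows * cols steps).
    """
    cells = []
    for i in range(rows * cols):
        r, c = divmod(i, cols)
        cells.append((r, c))
    return cells


print(flat_cells(2, 3))  # [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]
```

Without the docstring, `divmod(i, cols)` is exactly the kind of line a later developer might see as unexplained "hardcoded" math.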

Anyway, I added context to it: I explained what it does, how it does it, and why I chose to do it that way. All I am saying, in essence, is that most hardcoded code is control flow or mathematical expressions unknown to the developers working on a codebase. Knowing this will set you on the right track whenever you have to deal with hardcoded code.

Extend First, and Refactor Slowly

The first instinct we have as developers when we see a legacy codebase is to rewrite or refactor it. But we often forget that extending it should come first, because extending keeps the business going and serves the interests of business leaders.

By extending a legacy codebase, I mean using its APIs to build new features. But we have to make sure whatever features we add don’t have the bad traits we see in legacy codebases. Yes, I know, it is easier said than done. Sometimes, circumstances will force you to repeat that bad trait you hate. Yolo! You’re not alone. Also, you need to refactor slowly. By this, I mean you shouldn't rush to refactor a legacy codebase. Be patient until you understand the codebase and its contexts. Extend first, and refactor slowly.
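A minimal sketch of extending rather than rewriting, with hypothetical names: `legacy_get_user` stands in for an existing API with an awkward contract (a positional tuple, `None` for unknown ids), and the new feature is built on top of it without touching it, while keeping those awkward traits out of its own interface.

```python
from typing import Optional, Tuple

# Hypothetical legacy API: returns (name, email, is_active) or None.
_LEGACY_USERS = {1: ("Ada", "ada@example.com", 1)}


def legacy_get_user(uid) -> Optional[Tuple[str, str, int]]:
    return _LEGACY_USERS.get(uid)


def user_display_name(uid: int) -> str:
    """New feature built on top of the legacy API, which stays untouched.

    The awkward parts (positional tuple, None for unknown ids) are handled
    here once, so callers of the new function never see them.
    """
    row = legacy_get_user(uid)
    if row is None:
        return "Unknown user"
    name, email, _is_active = row
    return f"{name} <{email}>"


print(user_display_name(1))  # Ada <ada@example.com>
print(user_display_name(2))  # Unknown user
```

Once the legacy function is well understood and covered by tests, the wrapper also gives you a single place to adjust if you later refactor it.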

Document Your Journey to Understand a Legacy Codebase

If your organization appreciates process and empathy, it is good to document your journey as you begin to understand the codebase – from setting it up to working through every part of it.

You might improve your company’s onboarding process if the path to setting up your codebases and understanding them is clearly documented. At the same time, it will make life easier for the people who come after you, and it may even help the people who came before you, or your future self. Document everything, including possible challenges and how to fix them. Don’t forget to encourage others to improve your documentation, making things easier for those who follow, just as it was done for them.

Doing this may present you as a leader and get you some leadership opportunities. Anyway, don’t force it. Do so only if it is allowed in your organization or you know how to help them adopt it.


Work effectively on a large codebase

As a codebase grows, it gets harder to understand everything. In the past I spent a tremendous amount of time on open source projects and proprietary codebases alike. Oh sorry, I don’t mean “spent”, I mean wasted. So the point of this doc is to give you some of my thoughts on how to explore a codebase.

Curiosity killed the cat. I am not saying that you shouldn’t be curious about exploring a codebase; on the contrary, you definitely should. But I want to warn you that an excessive amount of curiosity can easily destroy your productivity.

  1. Realize that you can only understand a small fraction of the codebase. If a codebase contains 1 million lines of code, it will take you 100 days to read everything assuming that you are able to read 10 thousand lines of code per day (btw, this is a lot of code). It was a game changer for me to realize that I simply can’t understand everything. Lots of junior engineers are ambitious and want to read everything. The spirit is good; however, no, you simply can’t, period. It is important to prioritize what code you want to understand, and what code to skip.
  2. Realize that it takes time (probably months) to understand even a small fraction of a codebase. If you simply read code that you rarely use, you are probably reading it inefficiently. Patience is your best friend. If you want to explore something, try to find opportunities to work on it directly instead of hiding in a corner reading the code. Don’t be too greedy; let your work guide you toward understanding the codebase.
  3. Realize that code is not everything. Code is an imperfect product that lacks a huge amount of information. People have worked for decades to improve it, but no matter how good the developers are, the truth is that people often write obscure code. The context (including motivation, implicit assumptions, design, even mistakes) is often not present in the code directly, yet it is essential to demystifying the code. People often underestimate how much context they need before reading the code. If you are interested in solving puzzles, find a magazine and solve the sudoku in it. Don’t waste your time on code puzzles if you don’t have the right context to understand them.
  4. Have a clear goal when you read code, and stop reading once you reach it. Your goal shouldn’t be “I want to understand everything”. It could be “I want to sharpen my skills by reading this piece of code” or “I don’t know how this particular feature is implemented, and I want to know whether the implementation is useful for my project”. You can explore code without a clear goal, but you shouldn’t spend unlimited time reading code without one.

The points above helped me a lot in reshaping my code-reading habits. My old habit was really bad: I was the type of person who tried to understand everything and always felt uncomfortable using code I hadn’t read yet. I spent more time reading code than working on my tasks. My output was unsatisfactory, and I blamed myself for not knowing enough of the code. I was totally wrong! When I started to think about how many lines of code have a direct impact on my work, I realized that reading lots of code was not as valuable to me as I had expected. Being comfortable working with APIs without worrying about their underlying implementations dramatically improved my productivity: it helps me focus on the things I want to build and reduces the time I spend reading code that is irrelevant to my work (even though some of that code is used in my work).

I am not discouraging you from reading code; however, you should have the right expectations of how reading code will help you. Bad code-reading habits are extremely dangerous, but if you are able to cultivate a good set of habits, they will take you a really long way. So which habits are good in my mind?

  1. Be friends with other engineers. Context matters, and context is often missing from the code. Sit down with them during lunch and ask them insightful questions. Shut up if you realize that you are asking dumb questions (the ones you can easily answer yourself by reading the code and docs), and go back to reading the documentation and code. The context of the code is as important as the code itself, and it is often kept in engineers’ heads rather than reflected directly in the code. Build good relationships with other engineers and make sure they are happy to answer your questions. On the other side, you should also spend time answering questions; explaining what’s going on is part of your work. Don’t be annoyed by questions, and be respectful to the people who ask them, because they are the people who make sure you are valuable to the company.
  2. Understand the basics and the architecture first. You probably need to understand how container technology works before you start reading the Docker implementation. You probably need to understand how service-oriented architecture works before you start reading any concrete service implementation. If you can’t understand the code, stop and ask yourself whether you are missing some basic information. If you feel like banging your head against the wall while reading some code, you probably don’t have the right information yet to really understand it.
  3. Develop a taste for good code and bad code. Don’t blindly treat the code in a codebase as the gold standard. Expose yourself to high-quality code and avoid reading bad code. Any codebase may contain a lot of bad code that you shouldn’t follow. For me, good code has proper names, is implemented in a straightforward way, and is fairly easy to read. I really hate code that works but is very hard to read: yes, it is “smart”, but it is also a puzzle.
  4. Don’t just read code; find opportunities to work on it. Ask the owner of the code whether there are some tasks you can pick up. It is a good forcing function for fully understanding the code. If there is no such task, there is still plenty you can do. Does the code you are reading have enough test coverage? If not, add some tests. Would some documentation help other people read the code? If so, add some. Have you found a better way to implement the code? If so, refactor it.
  5. Online materials. Luckily, many people post their tips and tricks for reading code online. In case you haven’t read them yet, here is a list of pages that I find useful.

     • Tips for reading code
     • What are good ways to rapidly become familiar with a large codebase?
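Point 4 suggests adding tests when coverage is thin. Before refactoring legacy code, one widely used form is a characterization test: it records what the code does today, quirks included, instead of what a spec says it should do. The sketch below uses Python's built-in `unittest`; `legacy_normalize_phone` is a hypothetical stand-in for the code you are about to change.

```python
import unittest


def legacy_normalize_phone(raw: str) -> str:
    # Hypothetical legacy helper: keeps digits only, then the last 10 of them.
    digits = "".join(ch for ch in raw if ch.isdigit())
    return digits[-10:]


class NormalizePhoneCharacterization(unittest.TestCase):
    """Records what the code does today, quirks included, before refactoring."""

    # Expected values come from running the current code, not from a spec.
    CASES = {
        "+1 (555) 010-9999": "5550109999",  # country code silently dropped
        "555.010.9999": "5550109999",
        "ext. 42": "42",  # not a phone number, still accepted
        "": "",
    }

    def test_current_behavior_is_preserved(self):
        for raw, expected in self.CASES.items():
            with self.subTest(raw=raw):
                self.assertEqual(legacy_normalize_phone(raw), expected)


# Run with: python -m unittest <module_name>
```

If a refactor changes one of these outputs, the test fails and you decide deliberately whether the old behavior was a bug or something other code relies on.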

A good understanding of a codebase, including all its quirks and pitfalls, will definitely help you advance your impact, skills, and career in the long run: your code will be more consistent with the codebase, you will debug issues more quickly, your code will contain fewer bugs, you will find more opportunities to build impactful projects by taking advantage of in-house technologies, and so on. It just requires a little more time and a little more patience.



More articles by Nika Germanishvili
