Software Gardening: How Data Grows Roots
Summary
When designing complex software systems with multiple integration points and subsystems, where should your data live? Can't you just store data pretty much anywhere in the system? After all, data is cheap, right?
NO! STOP! DON'T DO IT! While data storage may often be inexpensive, the data itself is a different story. Data can be compared to the roots of a tree: once you plant it, moving it to a different location grows more and more difficult with time. This article discusses this phenomenon as well as ways to plan ahead for what I call "data-rooting".
Real-Life Example
I was recently asked by a friend and colleague for some architectural advice on a major software project for his firm (a large enterprise with more than 37K employees). The project will have high visibility and high impact. They have already scheduled and planned an event to publicly announce the availability of this new product, and that event is just a few months away. Development hasn't started. In fact, the development team has not even been selected yet. So, with only a few months to build an entirely new software system, he was hoping for some guidance on how to get things going.
Especially for such a large enterprise, this presents a very enticing challenge! While a couple of scrappy tech founders might be able to pull off a rough beta version in that time-frame, very few large companies can move quickly enough to successfully build out an entirely new software-backed product with such a short runway.
In a nutshell, the system is a front-end that needs to connect with two different back-end systems (one existing and another TBD) while adding some functionality of its own. He had received some pretty high bids from development shops and also had some options for in-house development, so he was hoping for advice on whether to develop in-house or go with one of the dev shops. Ultimately, he figured that once the initial product was launched, he could reassess and rebuild if necessary.
This led to some great discussions on the portability of software vs. data. Among many other points, I strongly recommended that he do all he can to build his MVP around the existing back-end system's data extension capabilities, since that system is already planned for long-term use, and to shove as much data as possible directly into it, even if that means building some custom data extensions on that system. Above all, I cautioned him against letting the new front-facing system store data. Why? Because data grows roots, and it's hard to move trees.
Data Roots vs Software Soil
Imagine you want to plant a new tree in your front yard. You pick the tree, prepare the soil, dig the hole, drop in the tree, and nourish/tend/etc. as the tree establishes itself. When you plant your tree, are you thinking of this as a temporary spot? When you drop it into the hole, do you say to yourself, "if this spot doesn't work out, no worries, we can just dig it up next summer and move it"? Probably not. Why? Because trees have roots, and the longer they grow, the more difficult it becomes to transplant the tree. What about the soil? Well, you have to keep enriching and nourishing the soil around the tree, but soil is pretty portable: you can move it around with a shovel and a wagon, it will still provide the same benefits, and there is very little risk of losing or damaging it in the process.
Data in your system is like the roots of a tree, and the software around the data is like the soil. You'll need to keep replenishing and nourishing the software over time, and you may eventually replace it outright; data, on the other hand, has to be migrated if you want to change things up. You don't simply replace the data if you end up unhappy with the solution you selected. For this reason, you need to be very strategic about where you place your data within your system, and about which software will control access to that data.
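To make the metaphor a little more concrete, here is a minimal sketch, assuming a system in which every front-end reaches the data only through a single system of record. The class names `SystemOfRecord`, `VendorFrontEnd`, and `InHouseFrontEnd` are hypothetical, chosen for illustration. The front-ends (the soil) keep no data of their own, so one can be swapped for the other while the records (the roots) stay where they were planted.

```python
from typing import Protocol


class SystemOfRecord:
    """Hypothetical back-end that owns all persistent data (the "roots")."""

    def __init__(self) -> None:
        self._records: dict[str, dict[str, str]] = {}

    def save(self, key: str, record: dict[str, str]) -> None:
        self._records[key] = dict(record)

    def load(self, key: str) -> dict[str, str]:
        return dict(self._records[key])


class FrontEnd(Protocol):
    """Any front-end (the "soil") delegates storage to the system of record."""

    def submit(self, key: str, name: str) -> None: ...

    def show(self, key: str) -> str: ...


class VendorFrontEnd:
    """Hypothetical front-end built by a dev shop; it stores nothing locally."""

    def __init__(self, backend: SystemOfRecord) -> None:
        self.backend = backend

    def submit(self, key: str, name: str) -> None:
        self.backend.save(key, {"name": name})

    def show(self, key: str) -> str:
        return f"Vendor UI: {self.backend.load(key)['name']}"


class InHouseFrontEnd:
    """Hypothetical replacement front-end built later; also stateless."""

    def __init__(self, backend: SystemOfRecord) -> None:
        self.backend = backend

    def submit(self, key: str, name: str) -> None:
        self.backend.save(key, {"name": name})

    def show(self, key: str) -> str:
        return f"In-house UI: {self.backend.load(key)['name']}"


backend = SystemOfRecord()
ui: FrontEnd = VendorFrontEnd(backend)
ui.submit("c-1", "Acme Corp")

# Replace the "soil": swap front-ends without migrating any data.
ui = InHouseFrontEnd(backend)
print(ui.show("c-1"))  # In-house UI: Acme Corp
```

The design choice being illustrated is simply that no front-end class holds a database of its own; if one did, swapping it out would turn into a data migration.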
Back to Our Example
So in the case of my friend, if he were to hire a development shop to begin building the front-end system, they would most likely also drop in a database and begin plugging data in and out of that database as it coordinates with the back-end systems. This is fine if you are in it for the long haul and don't plan on moving things, but given that he wants to keep his options open for possibly changing out the software front-end in the future, I recommended that he instead use the existing back-end system for his data storage so that he keeps control of the "roots" of the system. This would make it much easier to replace the software soil without needing to migrate the data roots. As it turned out, the existing back-end system did indeed have a way to extend its data structure to store the custom data his desired features required.
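The article doesn't describe how the existing back-end's extension mechanism works, so the following is only a minimal sketch, assuming a back-end that lets you register custom fields on its records. `ExtensibleBackEnd`, `register_field`, `upsert`, and the `loyalty_tier` field are all hypothetical names. The idea it shows: data that the new front-end needs is added to the long-term system instead of to a private database owned by the front-end.

```python
class ExtensibleBackEnd:
    """Hypothetical long-term back-end that supports custom data extensions."""

    def __init__(self) -> None:
        # Built-in fields the back-end already understands.
        self._fields: dict[str, type] = {"name": str}
        self._records: dict[str, dict[str, object]] = {}

    def register_field(self, name: str, field_type: type) -> None:
        # Extend the back-end's data structure with a new custom field.
        self._fields[name] = field_type

    def upsert(self, record_id: str, **fields: object) -> None:
        # Validate every field first so a bad write changes nothing.
        for name, value in fields.items():
            expected = self._fields.get(name)
            if expected is None:
                raise KeyError(f"unknown field: {name}")
            if not isinstance(value, expected):
                raise TypeError(f"{name} must be {expected.__name__}")
        self._records.setdefault(record_id, {}).update(fields)

    def get(self, record_id: str) -> dict[str, object]:
        return dict(self._records.get(record_id, {}))


backend = ExtensibleBackEnd()
# The new front-end needs data the back-end doesn't track yet,
# so extend the back-end instead of adding a front-end database.
backend.register_field("loyalty_tier", str)
backend.upsert("c-1", name="Acme Corp", loyalty_tier="gold")
print(backend.get("c-1"))  # {'name': 'Acme Corp', 'loyalty_tier': 'gold'}
```

Any front-end built on top of this, whether by a dev shop or in-house, reads and writes `loyalty_tier` through the back-end, so replacing the front-end later leaves that data untouched.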
Of course, it is not at all impossible to migrate data from one technology, platform, or structure to another. I've been involved in countless data migration projects; but if you can avoid one with more careful planning, isn't that better? Or if you can mitigate the impact through some option-based strategies, won't that simplify things later? It sure will.
Like a tree, data wants to live and grow. It's always worth a little hard thinking up front to ensure you have picked the right plot of soil.