Sosnoff’s Law and Misnomers of the Data Ecosystem
Introduction
Martin Sosnoff, in his pivotal book “Humble on Wall Street”, published in 1975, states that the thicker the analysts’ research file on a particular stock, the less the stock responds. In essence, he concludes that research is worth doing up to an optimal level of detail. Beyond that point, the granularity trivializes the whole work and ends up defeating its purpose.
Application of the Law and beyond
One may wonder what Sosnoff’s observation has to do with the data ecosystem. Taking a leaf out of the same idea and applying it to data work leads to this conclusion: data is the essential ingredient of an AI system, yet too much of it spoils the story on which the AI product is built.
Machine Learning offers a parallel. A model that is “overfitted” to its training data performs poorly in a production scenario. This leaves us with a lean dataset that is considered “Golden”, and the ML/AI practitioner should rely on that Golden dataset, which holds just enough data to serve the purpose.
This article is premised on the importance of using data assets well and on using terminology consistently so that information can be exchanged seamlessly. Looking at the modern data stack and how it can be defined, modern Data Platforms largely use the “Data Mesh” framework as their foundation.
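The overfitting point can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy data, the helper names `fit_line`, `memorize` and `mse`, and the choice of a nearest-neighbour "memorizer" as the overfitted model. It compares a model that memorizes every noisy training point with a simple straight-line fit, then scores both on unseen points taken from the underlying noise-free trend.

```python
# Illustrative sketch only: toy data and helper names are hypothetical.

def fit_line(xs, ys):
    """Ordinary least-squares straight line; returns a prediction function."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x

def memorize(xs, ys):
    """Overfitted model: returns the label of the nearest training point."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

def mse(model, xs, ys):
    """Mean squared error of a model on a dataset."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Training data: the trend y = 2x plus alternating +1 / -1 "noise".
train_x = [float(i) for i in range(10)]
train_y = [2 * x + (1 if i % 2 == 0 else -1) for i, x in enumerate(train_x)]

# Unseen data: points between the training points, on the noise-free trend.
test_x = [x + 0.25 for x in train_x]
test_y = [2 * x for x in test_x]

line = fit_line(train_x, train_y)
memorizer = memorize(train_x, train_y)

print("memorizer train/test MSE:", mse(memorizer, train_x, train_y), mse(memorizer, test_x, test_y))
print("line      train/test MSE:", mse(line, train_x, train_y), mse(line, test_x, test_y))
```

The memorizer reproduces its training data perfectly but does worse than the plain line on the unseen points, which is the pattern the term "overfitting" describes.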
Data Mesh Foundations
In traditional application-based ecosystems, data was always contained within the application modules that gave it meaning. Hence, Data did not receive the focus it needed as an asset. In essence, applications were heavily used while most of the underlying data “assets” went unused (Data was not considered an asset in the previous era!). Meanwhile, various teams carried out continuous research to improve their business functions through Operations Research (OR), Quality Control and Lean Engineering. All of this was done in isolation, by extracting data with publicly available tools and reporting the findings in easily accessible reporting tools. Though Data Warehousing as a concept has been around for a while, it still does not treat its own core, the Data itself, as an asset. This is partly due to the infrastructure of the time, which was confined to servers and localized computers.
So, the entire ecosystem lacked not just resources but also a definitive framework that enables Data to be treated as an asset. To be treated as an asset, Data must be managed like a Product (not just an IT product). Typically, a product should offer the following basic aspects:
- Ownership
- Self-service
- Collaboration
- Self-sufficiency
The reader can see from the above diagram that the Data Mesh framework allows Data to be managed in a way that makes it serve as a product.
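The four product aspects listed above can be sketched in code. This is a hypothetical illustration, not an API defined by Data Mesh or by any library: the class `DataProduct`, its fields and its methods `publish`, `share_with` and `read` are all assumed names chosen to map one-to-one onto ownership, self-sufficiency, collaboration and self-service.

```python
from dataclasses import dataclass, field

@dataclass
class DataProduct:
    """Hypothetical data product; every name here is illustrative."""
    name: str
    owner: str                     # Ownership: one accountable team
    schema: dict[str, type]        # Self-sufficiency: it describes its own fields
    consumers: set[str] = field(default_factory=set)        # Collaboration
    _rows: list[dict] = field(default_factory=list, repr=False)

    def publish(self, row: dict) -> None:
        """Accept a record only if it matches the declared schema."""
        for column, expected in self.schema.items():
            if not isinstance(row.get(column), expected):
                raise ValueError(f"{column!r} must be of type {expected.__name__}")
        self._rows.append(dict(row))

    def share_with(self, team: str) -> None:
        """Collaboration: the owner opens the product to another team."""
        self.consumers.add(team)

    def read(self, team: str, **filters) -> list[dict]:
        """Self-service: authorized teams query the data without a ticket."""
        if team != self.owner and team not in self.consumers:
            raise PermissionError(f"{team!r} has no access to {self.name!r}")
        return [
            dict(row) for row in self._rows
            if all(row.get(key) == value for key, value in filters.items())
        ]

# Usage with made-up team names and records.
orders = DataProduct(name="orders", owner="sales",
                     schema={"order_id": int, "amount": float})
orders.publish({"order_id": 1, "amount": 10.0})
orders.publish({"order_id": 2, "amount": 25.5})
orders.share_with("finance")
print(orders.read("finance", order_id=2))
```

Keeping the schema and the access list inside the product itself, rather than inside a consuming application, is the design choice this sketch is meant to highlight.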
Misnomers
The question one can then ask is whether a fully matured atomic unit of data from the Data Mesh ecosystem can be considered a Product. The whole-hearted answer is Yes. If we call a data unit a product, further questions follow: how should it be built, updated, consumed and maintained, and once such a data element is developed, what should its structure be? The answers lie in the definition of a product itself. As we know, a product is something developed to fulfill a need. More specifically, data here is the backbone that enables de-duplicated analysis and reporting. The reader may then ask what data has to do with AI. Simply put, AI in its native form is an evolution of Data Analysis. Combining all these aspects brings us to a stage where data is used not just for reporting on the past or for simple forecasting, but for building complex AI-based systems that can improve lives by bringing in value.
So, if we make a product out of data itself, what should we call it? In her book “Data Mesh”, Zhamak Dehghani calls it a “Data Product”.
Some practitioners of Data Science, and of Data in general, use the name “Data Product” for what is really a “Digital Product”, which is typically a system built on top of a “Data Product” to solve a real-life problem. Since the “Digital Product” is what users see, it tends to be taken for the “Data Product”. So, why should we distinguish them? Reserving the name “Data Product” for the data itself gives weight to treating Data as an asset, and that forces an organization to maintain it as per the Data Mesh framework.
Forcing an ecosystem to maintain Data as the centerpiece decouples many aspects of Product development and makes a micro-service-based architecture much easier to implement.
Some long-time proponents of IT systems may feel left out of the Data trend because they have their own IT applications, which they call a Product. An application can become a Product, but only if it possesses the self-encompassing features of one!
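The distinction between the two terms, and the decoupling it brings, can be shown with a short sketch. All names are hypothetical: `DataProduct` is an assumed interface, `CustomerOrders` is an example data product, and `RevenueDashboard` stands in for a user-facing digital product built on top of it.

```python
from typing import Iterable, Protocol

class DataProduct(Protocol):
    """Assumed contract: a data product exposes its records for reading."""
    def read(self) -> Iterable[dict]: ...

class CustomerOrders:
    """Hypothetical data product: owns and serves curated order records."""
    def __init__(self, rows: Iterable[dict]) -> None:
        self._rows = list(rows)

    def read(self) -> Iterable[dict]:
        return iter(self._rows)

class RevenueDashboard:
    """Hypothetical digital product: a user-facing system built on a data product."""
    def __init__(self, source: DataProduct) -> None:
        # The dashboard depends only on the read() contract, not on how
        # or where the data product stores its records.
        self._source = source

    def total_revenue(self) -> float:
        return sum(row["amount"] for row in self._source.read())

orders = CustomerOrders([
    {"order_id": 1, "amount": 10.0},
    {"order_id": 2, "amount": 25.5},
])
dashboard = RevenueDashboard(orders)
print(dashboard.total_revenue())
```

Because the dashboard only knows the `read()` contract, the data product can be maintained, replaced or reused by other services without changing the digital product.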
Conclusion
While the intent of this article is to introduce the ambiguities surrounding modern Data systems, it also introduces the Data Mesh concept and highlights the due treatment that the data ecosystem requires. All of this serves one simple purpose at a high level: to bring in Value, that is, to improve something!