Sosnoff’s Law and Misnomers of the Data Eco-system

Introduction

Martin Sosnoff, in his pivotal book “Humble on Wall Street”, published in 1975, states that the thicker the analysts’ research file on a particular stock, the less the stock responds. In essence, he concludes that it is worthwhile to research a topic to an optimal level, rather than going into such granular detail that the work is trivialized and ends up serving its purpose poorly.

Application of the Law and beyond

One may wonder what Sosnoff’s observation has to do with the data eco-system. Taking a leaf out of the same concept and applying it to data work leads us to conclude that data is the essential ingredient of an AI system, while too much of it spoils the whole story on which the AI product is built.

A parallel can be drawn from Machine Learning as well, wherein “overfitting”, a model learning the incidental detail and noise of its training data, results in the model performing poorly in a production scenario. This leaves us with a lean dataset that is considered “Golden”. The practitioner of ML/AI should then rely on the Golden dataset, one that holds just enough data to serve the purpose.
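
The overfitting point can be made concrete with a toy sketch. Everything below is invented for illustration (the data, the noise pattern and the two "models"); it only assumes an underlying signal of y = 2x with alternating noise. One model memorizes every training point, noise included, while the other fits a simple straight line.

```python
# Toy illustration of overfitting; data and model names are hypothetical.

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_memorizer(xs, ys):
    """1-nearest-neighbour model: returns the y of the closest training x."""
    points = list(zip(xs, ys))
    return lambda x: min(points, key=lambda p: abs(p[0] - x))[1]


def mse(model, xs, ys):
    """Mean squared error of a model on the given points."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Training data: signal y = 2x plus alternating +1 / -1 noise.
train_x = [float(i) for i in range(10)]
train_y = [2 * x + (1 if i % 2 == 0 else -1) for i, x in enumerate(train_x)]

# Unseen "production" data: the same signal, without the training noise.
test_x = [i + 0.25 for i in range(9)]
test_y = [2 * x for x in test_x]

line = fit_line(train_x, train_y)
memorizer = fit_memorizer(train_x, train_y)

for name, model in [("line", line), ("memorizer", memorizer)]:
    print(name,
          "train MSE:", round(mse(model, train_x, train_y), 3),
          "test MSE:", round(mse(model, test_x, test_y), 3))
```

The memorizer scores a perfect zero error on the data it has seen, yet does clearly worse than the plain line on unseen points, which is the behaviour the term “overfitting” describes.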

This article is premised on the importance of using data assets in general, and on using terminology consistently so that information can be exchanged seamlessly. Looking at the modern data stack and how it can be defined, many modern Data Platforms use the “Data Mesh” framework as their foundation.

Data Mesh Foundations

In traditional application-based eco-systems, data is always contained within the application modules that give meaning to it. Hence, Data did not receive the focus it needed as an asset. In essence, this led to heavy use of applications while most of the underlying data “assets” (Data was not considered an asset in the previous era!) went unused. Meanwhile, various teams carried out continuous research to improve their business functions by way of Operations Research (OR), Quality Control and Lean Engineering. All of this was carried out in isolation, by extracting data with publicly available tools and reporting the findings in easily accessible reporting tools. Though Data warehousing as a concept has been around for a while, it still does not treat its own core, the Data itself, as an asset. This is partly due to the infrastructure of that era, which was confined to servers and localized computers.

So, the entire eco-system lacked not just the resources, but also a definitive framework that enables Data to be treated as an asset. To be treated as an asset, Data must be managed like a Product (not just an IT product). Typically, a product should offer the following basic aspects:

- Ownership
- Self-service
- Collaboration
- Self-sufficiency

As the above diagram shows, the Data Mesh framework allows Data to be managed in a way that serves as a product.
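
The four aspects listed above can be sketched as a small data structure. This is a minimal illustration under stated assumptions, not a Data Mesh implementation: the class `DataProduct`, its fields and the team names are all hypothetical and chosen only to map one field or method to each aspect.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """Hypothetical descriptor for data managed as a product."""
    name: str
    owner: str                      # Ownership: one accountable team
    schema: dict[str, type]         # Self-sufficiency: carries its own schema
    description: str = ""
    contributors: set[str] = field(default_factory=set)  # Collaboration
    _rows: list[dict] = field(default_factory=list, init=False, repr=False)

    def add_contributor(self, team: str) -> None:
        """Collaboration: the owner can let other teams publish."""
        self.contributors.add(team)

    def publish(self, row: dict, author: str) -> None:
        """Accept a row only from permitted teams and only if it fits the schema."""
        if author != self.owner and author not in self.contributors:
            raise PermissionError(f"{author} may not publish to {self.name}")
        if set(row) != set(self.schema):
            raise ValueError(f"expected columns {sorted(self.schema)}")
        for column, expected_type in self.schema.items():
            if not isinstance(row[column], expected_type):
                raise ValueError(f"{column} must be {expected_type.__name__}")
        self._rows.append(dict(row))

    def describe(self) -> dict:
        """Self-service: consumers can discover the product without asking the owner."""
        return {
            "name": self.name,
            "owner": self.owner,
            "description": self.description,
            "columns": {c: t.__name__ for c, t in self.schema.items()},
            "row_count": len(self._rows),
        }

    def read(self) -> list[dict]:
        """Self-service: return copies so consumers cannot alter the product."""
        return [dict(row) for row in self._rows]


orders = DataProduct(
    name="orders",
    owner="sales-team",
    schema={"order_id": int, "amount": float},
    description="One row per confirmed order",
)
orders.add_contributor("finance-team")
orders.publish({"order_id": 1, "amount": 99.5}, author="sales-team")
orders.publish({"order_id": 2, "amount": 10.0}, author="finance-team")
print(orders.describe())
```

The design choice worth noting is that the schema and the publishing rules travel with the data itself, rather than living inside whichever application happens to use it.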

Misnomers

The question one can then ask is whether a fully matured atomic unit from the Data Mesh eco-system can be considered a Product. A whole-hearted answer would be Yes. So, if we call a data unit a product, how should it be built, updated, consumed, and maintained, and once such a data element is developed, what would its structure be? The answers to these questions lie in the definition of a product itself. As we know, a product is something developed to fulfill a need. At a granular level, data here is the backbone that enables de-duplicated analysis and reporting. The reader may then ask what data has to do with AI. Simply put, AI in its native form is an evolution of Data Analysis. Combining all these aspects brings us to a stage where data is used not just for reporting on the past or for simple forecasting, but for building complex AI-based systems that can improve lives by bringing in value.

So, if we make a product out of data itself, what do we call it? In her book “Data Mesh”, Zhamak Dehghani calls it a “Data Product”.

Some practitioners of Data Science, and of Data in general, use the term “Data Product” for what is really a “Digital Product”, which is typically a system built on top of a “Data Product” to solve a real-life problem. Since a “Digital Product” is what users see, it tends to be taken for the “Data Product”. So, why should we distinguish them? Reserving the name “Data Product” for the data itself gives real meaning to treating Data as an asset, and that forces an organization to maintain it as per the Data Mesh framework.
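
The distinction between the two terms can be sketched in a few lines. The names below (`OrdersDataProduct`, `InMemoryOrders`, `revenue_report`, `top_customer`) and the sample rows are hypothetical; the sketch only assumes that a Data Product exposes clean, de-duplicated records and that Digital Products are the user-facing systems built on top of it.

```python
from typing import Protocol


class OrdersDataProduct(Protocol):
    """The only contract a Digital Product needs from the Data Product."""
    def orders(self) -> list[dict]: ...


class InMemoryOrders:
    """Data Product: owns the records and de-duplicates them once, for everyone."""

    def __init__(self, raw_rows: list[dict]) -> None:
        unique = {}
        for row in raw_rows:
            unique[row["order_id"]] = row  # repeated order_id collapses to one row
        self._rows = list(unique.values())

    def orders(self) -> list[dict]:
        return [dict(row) for row in self._rows]


def revenue_report(product: OrdersDataProduct) -> float:
    """Digital Product 1: a report that users see."""
    return sum(order["amount"] for order in product.orders())


def top_customer(product: OrdersDataProduct) -> str:
    """Digital Product 2: a different system built on the same Data Product."""
    totals: dict[str, float] = {}
    for order in product.orders():
        totals[order["customer"]] = totals.get(order["customer"], 0.0) + order["amount"]
    return max(totals, key=totals.get)


raw = [
    {"order_id": 1, "customer": "asha", "amount": 40.0},
    {"order_id": 1, "customer": "asha", "amount": 40.0},  # duplicate record
    {"order_id": 2, "customer": "ben", "amount": 25.0},
    {"order_id": 3, "customer": "asha", "amount": 10.0},
]
data_product = InMemoryOrders(raw)
print(revenue_report(data_product))  # 75.0
print(top_customer(data_product))    # asha
```

Both Digital Products depend only on the small `orders()` contract, so either one can be changed or replaced without touching the data, and the data can be maintained without touching them.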

Forcing an eco-system to keep Data as the centerpiece decouples many aspects of Product development and makes implementing a micro-service-based architecture a lot easier.

A few long-standing proponents of IT systems may then feel left out of the Data-driven trend because they have their own IT applications, which they call a Product. While an application can become a Product, it must possess the self-encompassing features of a Product to be one!

Conclusion

While the intent of this article is to introduce the ambiguities surrounding modern Data systems, it also introduces the Data Mesh concept and highlights the due treatment that the data eco-system requires. All of these are required to achieve one thing at a high level: to bring in Value, that is, to improve something!

References:

  • Data Mesh by Zhamak Dehghani
  • 100 Baggers by Christopher Mayer
