Everything Everywhere All at Once

Daniel Kwan and Daniel Scheinert (the Daniels) will, I hope, forgive me for misappropriating the title of their film, which won seven Academy Awards in 2023, but the analogy with what we now expect from a data management solution struck me so suddenly that I was unable to resist.

Of course, I am not talking about parallel universes, at least not in the usual sense[1], but about the fact that the ever greater ubiquity and heterogeneity of data must be matched by a single access point, a unicum, where everything must be knowable, searchable, understandable and usable (all at once), regardless of what each piece of data represents (everything) and of where it is physically located (everywhere).

The world every company confronts, observed digitally through data, is fluid and dynamic, subject to even sudden change. It is therefore essential to match an arbitrary distribution of data, kept as it is in different systems and places, with a single conceptualization of it: one that is easily accessible and allows those who must deal with this world, and make decisions about it through data, to act consciously, with a correct understanding of how that world is represented. This is what bridges the gap between looking and seeing, that is, between a physical act (collecting data) and a logical one (transforming data into information). Left unfilled, that gap would make data collection an entirely ephemeral act, devoid of value: a commitment of resources never repaid by the generation of value.

However, it is important to distinguish between the need to centralize the representation of the data and the acknowledgment of the geographical and technological distribution of the data that embody that representation. In other words, a request, perhaps an obligation, to centralize the intensional component can and must be accompanied by total freedom for the extensional one. Expecting an analogous centralization of the latter is unreasonable: it would consume a non-trivial amount of economic and temporal resources, it would be an enemy of the environment because of the energy consumed, and above all it would add nothing to what must be the primary objective of those who consume the data, namely the simplicity of passing from an idea, an objective or a task, to its realization through the data.

This new way of using data may seem counterintuitive, but I suspect only because we have long been used to operating differently. It is based on the logical-physical (or intensional-extensional) separation of data; it is no coincidence that we speak of Logical Data Management. It begins with the acknowledgment that understanding and use are two distinct moments. The first is satisfied by easy access to the intensional component, the meaning of the data; the second, which follows, depends on the first and on what has been chosen. Any preventive action aimed at physically collecting the data in one place would, as already said, be a useless commitment: it could collect data that no one will ever use, it would introduce a temporal latency that could reduce the effectiveness of what must be derived from the data and, finally, duplicating the data would create potential new points of attack for those who want to make illicit use of it.

There is a famous saying that reminds us not to "put off until tomorrow what you can do today"; here, in the spirit of logical architectures for data management, it is instead a matter of postponing: of not doing immediately what has not yet been requested, collecting the data in advance, but of doing it only once the data has been chosen and it is time to proceed with its effective use. Naturally, not everything is postponed. What must be done today, again according to a logical approach, is to collect the meaning of the data, gathering it in a single point, giving it a clear representation, facilitating access to it, and enriching it with all the elements that allow easy exploration.

To make a further analogy, we can think of logical architectures for data management as a library that physically possesses no books but centralizes only the catalogue. From that catalogue the reader chooses the publications they need; then, thanks to a very efficient logistics network, the books are retrieved from the various sources (other libraries, intermediate warehouses, publishers) and delivered. The reader notices nothing and loses none of their prerogatives of being able to read any work, thanks precisely to the complete decoupling between the logical and physical components.
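The library analogy can be sketched in a few lines of code. What follows is a minimal, illustrative sketch, not the API of any real product: a central catalogue holds only metadata (the intensional component), while the data itself (the extensional component) stays in its source systems and is fetched lazily, only when a consumer actually requests it. All names here (`LogicalCatalog`, `register`, `browse`, `fetch`) are hypothetical.

```python
# Sketch of the "library without books": metadata is centralized,
# data is retrieved from its source only on demand. Illustrative only.
from typing import Any, Callable, Dict, List, Tuple


class LogicalCatalog:
    def __init__(self) -> None:
        # dataset name -> (description, loader that knows how to reach the source)
        self._entries: Dict[str, Tuple[str, Callable[[], List[Any]]]] = {}

    def register(self, name: str, description: str,
                 loader: Callable[[], List[Any]]) -> None:
        """Publish only the meaning of the data; nothing is copied."""
        self._entries[name] = (description, loader)

    def browse(self) -> Dict[str, str]:
        """Exploring the catalogue touches no source system."""
        return {name: desc for name, (desc, _) in self._entries.items()}

    def fetch(self, name: str) -> List[Any]:
        """Only now, on request, is the source actually queried."""
        _, loader = self._entries[name]
        return loader()


# Two hypothetical sources, in different "places" and technologies.
catalog = LogicalCatalog()
catalog.register("sales", "Daily sales from the ERP database",
                 loader=lambda: [{"order": 1, "amount": 120.0}])
catalog.register("clicks", "Web clickstream from the data lake",
                 loader=lambda: [{"page": "/home", "hits": 42}])

print(catalog.browse())        # understanding: metadata only
print(catalog.fetch("sales"))  # use: data moves only when chosen
```

Note that `browse` never invokes a loader: understanding (the catalogue) and use (the delivery) remain two distinct moments, exactly as in the analogy.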

This further analogy should make other benefits evident: such a library needs no space to house the publications, has no problem of outdated editions and, above all, does not have to purchase in advance publications that no one will ever request. All these elements, the first in particular, are also consistent with environmental objectives, increasingly at the center of attention, given that each additional data collection point consumes energy and, consequently, emits CO2.

In conclusion, the time has come to acknowledge that any data management solution must take the Data Consumer, the one who will use the data, as its primary reference, just as every car today is built with its driver in mind and not the mechanic who may eventually have to repair it. This does not mean ignoring the latter's needs, but only recognizing that they are in a certain sense secondary: not in importance, but in the sense that the design must start from the requirements of the Data Consumer and only then, with those addressed, ensure they are achievable and manageable.

#dataintegration #datavirtualization #datapipe #dataops


[1] Even if, stretching the analogy a bit, we can think, for example, of the Data Domains of the Data Mesh paradigm as universes in which data move, interact and combine independently of one another, into what becomes an element of inter-universe communication: Data Products.
