Data Mesh, Data as a Product, and Active Metadata
Welcome to this week's edition of the ✨ Metadata Weekly ✨ newsletter.
Every week I bring you my recommended reads and share my (meta?) thoughts on everything metadata! ✨ If you’re new here, subscribe to the newsletter and get the latest from the world of metadata and the modern data stack.
✨ Spotlight: Data Product Shipping Standards in a Data Mesh
Discoverable. Understandable. Trustworthy. These are just a few of the key ideas of a data mesh infrastructure. Go through all of them and you’ll quickly find a common element that’s key to achieving each one — metadata.
Metadata allows you to shift from siloed context to embedded context (domains), generalized experiences to personalized experiences (data products), minimum automation to truly autonomous (self-serve infrastructure), and top-down governance to democratized governance (federated computational governance).
But what’s the first step to take this from theory to practice?
A fundamental principle of the data mesh is the concept of “data as a product”. Zhamak Dehghani defined several core tenets that make data products “products” — for example, making them addressable and trustworthy.
I’ve been thinking a lot about what it takes to put these tenets into practice.
For example, for “Understandable”, there’s the concept of data product shipping standards — metadata that should be attached to every data product, either programmatically or manually. I’ve seen a lot of success using the 5W1H framework (who, what, when, where, why, and how) to define what makes a data product understandable. As an organization, you can choose which elements of the framework matter most and focus on those — see the sketch below.
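To make that concrete, here’s a minimal sketch of what a 5W1H shipping standard could look like in code. The field names and the `is_shippable` check are my illustrative assumptions, not a formal spec; every organization will slice the framework differently.

```python
# A minimal sketch of a 5W1H shipping standard as a Python dataclass.
# Field names and the is_shippable check are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class DataProductMetadata:
    owner: str              # Who: the owning team or domain contact
    description: str        # What: a plain-language description
    purpose: str            # Why: the business purpose it serves
    address: str            # Where: the addressable location (schema, API, ...)
    update_cadence: str     # When: freshness expectations, e.g. "daily"
    lineage_summary: str    # How: how the data is produced (pipelines, sources)
    tags: list[str] = field(default_factory=list)

def is_shippable(meta: DataProductMetadata) -> bool:
    """A data product ships only when every 5W1H field is filled in."""
    required = (meta.owner, meta.description, meta.purpose,
                meta.address, meta.update_cadence, meta.lineage_summary)
    return all(value.strip() for value in required)
```

A check like this can run in CI, so an incomplete data product never ships in the first place.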
The next step is automating the addition of this metadata wherever possible. This is what makes infrastructure truly self-serve, a crucial part of the data mesh paradigm. For example, to make data products understandable, you can bring in context from across your data stack: parsing SQL query logs lets you automatically rank the popularity of each data product at the column level, and context from data pipelines can be used to generate column descriptions based on a data product’s source.
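As an illustration, here’s a rough sketch of column-level popularity ranking using the open-source sqlglot parser. The `query_log` list is a stand-in for whatever query history your warehouse exposes, and the dialect choice is just an example — adapt both to your stack.

```python
# A rough sketch of ranking column-level popularity from SQL query logs
# using sqlglot. The query_log list stands in for your warehouse's
# query history; the dialect is an example, not a requirement.
from collections import Counter

import sqlglot
from sqlglot import exp

query_log = [
    "SELECT customer_id, revenue FROM sales.orders",
    "SELECT customer_id FROM sales.orders WHERE revenue > 100",
]

popularity: Counter[str] = Counter()
for sql in query_log:
    tree = sqlglot.parse_one(sql, read="snowflake")  # pick your dialect
    for column in tree.find_all(exp.Column):
        # Key on table.column; unqualified columns keep just their name.
        key = f"{column.table}.{column.name}" if column.table else column.name
        popularity[key] += 1

for column_name, count in popularity.most_common():
    print(column_name, count)
```

From here, you could write the counts back into your catalog as active metadata, so popularity shows up right next to each column for every data consumer.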
This step may require new tooling, but the data mesh is ultimately a cultural and mindset shift. That’s why the last important step is to incorporate human-driven standards and rituals into your shipping process.
At the end of the day, you’re asking your engineers and developers to start thinking about their roles differently, and that’s not easy. It’s about cultural change, not just tech. Set your data team’s values (e.g. reusability), create rituals to help everyone achieve them (e.g. a Documentation Hour), and you’ll eventually see a real shift in people’s mindset and productivity.
I recently wrote a detailed blog about the metadata foundation that a data mesh needs.
📚 More from My Reading List
The recent developments in the modern data stack would have been a dream for me as a data leader just half a decade ago. The next decade is going to bring more innovation than ever before!
I enjoyed reading this article from Mario Hayashi on his predictions for CDPs (Customer Data Platforms) in 2030.
“Today, a lot of personalisation, if present at all, is based on rules-based logic that presents pre-written content when a certain event is triggered. The holy grail of personalisation is to extract insights in real-time and present the most relevant copy and interface to the customer. In the future, we’ll see part machine-generated copy and multivariate designs appear on customers’ screens as the CDP works out which stage of the customer journey they’re at.”
I’m also rooting for more personalization in data products. This isn’t just about relevant copy or interfaces, but about creating more contextualized experiences for every data user. Personalization will help drive the adoption of data products within organizations, and I believe that active metadata will be a driving force behind it.
I also read another great post from Eric Weber on Making Data Actionable. It’s a strong case for always taking a “user-first” lens while building data products. A lot of adoption comes down to reinforcing ideas, problem-solving with the end user, and, most importantly, understanding what it is that people love (or don’t love) about your product by actually speaking with them. Think about design, talk to users, get feedback often, and keep things simple — fundamental but super important principles.
“The problem is not that the methods, tools and outcomes are not useful in general. We can talk all day about the cool things we build. The problem is that usefulness is in the eye of the person consuming the results. If they don’t think the product of our data science work is actionable, no level of “cool” will matter to the business. So the problem I see is that data needs to be actionable. Not just plausibly actionable but easily actionable.”
Some other articles I enjoyed reading:
I’ve also added some more resources to my data stack reading list. If you haven’t checked out the list yet, you can find and bookmark it here.
🎧 Data Mesh Playlist
Here’s a curated playlist for you to learn more about the data mesh. 🌪
If you’re new here, subscribe to this newsletter on Substack. I'll see you next week with some interesting stuff around the modern data stack. 👋
Liked reading this edition of the newsletter? Do share it with your friends (or colleagues) or on social.