Forrester changed the way they think about data catalogs. Here’s what you need to know.

Prukalpa ⚡

Co-Founder at Atlan – Home for Data Teams | Forbes30 & Fortune40 lists | TED Speaker

Published Oct 7, 2022

As we predicted at the beginning of this year, metadata is hot in 2022 — and it’s only getting hotter. But this isn’t the old-school idea of metadata we all know and hate.

The data industry is in the middle of a fundamental shift in how we think about metadata. Now, in the latest sign of this shift, Forrester scrapped its Wave report on “Machine Learning Data Catalogs” to make way for one on “Enterprise Data Catalogs for DataOps”.

Here’s what you need to know about where this change came from, why it happened, and what it means for modern metadata.

✨ Spotlight: What are Enterprise Data Catalogs for DataOps, and why should you care?

One of the biggest challenges with Data Catalog 2.0 was adoption — no matter how it was set up, companies found that people rarely used their expensive data catalog. For a while, the data world thought that machine learning was the solution. That’s why, until recently, Forrester’s reports focused on evaluating machine learning data catalogs.

However, in early 2022, Forrester dropped “machine learning” from the name of its Now Tech report. It explained that even as ML-based systems became ubiquitous, the problems they were meant to solve persisted. Although machine learning gave data architects a clearer picture of the data within their organization, it didn’t fully address modern challenges around data management and provisioning.

The key change: “Data engineers need a data catalog that does more than generate a wiki about data and metadata”. Instead, data teams need a catalog built to enable DataOps, which requires in-depth information about, and control over, their data to “build data-driven applications and address data flow and performance”.

So what actually is an enterprise data catalog for DataOps (EDC)? According to Forrester, “[enterprise] data catalogs create data transparency and enable data engineers to implement DataOps activities that develop, coordinate, and orchestrate the provisioning of data policies and controls and manage the data and analytics product portfolio.”

There are three key ideas that distinguish EDCs from the earlier Machine Learning Data Catalogs.

Handles the diversity and granularity of modern data and metadata

Today, a company’s data isn’t just simple tables and charts. It’s a wide range of data products and associated assets, such as databases, pipelines, services, policies, code, and models, each with its own metadata. EDCs are built for this complex portfolio of data and metadata.

Rather than just storing a “wiki” of this data, EDCs act as a “system of record” to automatically capture and manage all of a company’s data through the data product lifecycle. This includes syncing context and enabling delivery across data engineers, data scientists, and application developers.

Provides deep transparency into data flow and delivery

A key idea in DataOps is CI/CD, a software engineering practice that improves collaboration, productivity, and speed through continuous integration and continuous delivery. For data, implementing CI/CD relies on understanding exactly how data is moved and transformed across the company.
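To make the CI/CD idea a little more concrete, here is a minimal Python sketch of one check a data CI pipeline might run: compare a table’s schema before and after a change, then fail the build on changes likely to break consumers. The `schema_changes` and `ci_gate` names, the dict-based schema representation, and the rule that only removed or retyped columns block a deploy are all illustrative assumptions, not features of any particular catalog or CI tool.

```python
from typing import Dict, List, Tuple

# Hypothetical schema representation: column name -> type name.
Schema = Dict[str, str]


def schema_changes(old: Schema, new: Schema) -> List[Tuple[str, str]]:
    """Return (kind, column) pairs; kind is "removed", "added", or "retyped"."""
    changes: List[Tuple[str, str]] = []
    for col in sorted(old.keys() - new.keys()):
        changes.append(("removed", col))
    for col in sorted(new.keys() - old.keys()):
        changes.append(("added", col))
    for col in sorted(old.keys() & new.keys()):
        if old[col] != new[col]:
            changes.append(("retyped", col))
    return changes


# Assumption: removals and type changes block the deploy; additions only warn.
BREAKING_KINDS = {"removed", "retyped"}


def ci_gate(old: Schema, new: Schema) -> int:
    """Print every change and return a CI-style exit code (1 = breaking change found)."""
    changes = schema_changes(old, new)
    for kind, col in changes:
        level = "ERROR" if kind in BREAKING_KINDS else "WARN"
        print(f"{level}: {kind} column {col}")
    return 1 if any(kind in BREAKING_KINDS for kind, _ in changes) else 0


if __name__ == "__main__":
    before = {"id": "int", "amount": "float", "note": "text"}
    after = {"id": "int", "amount": "decimal", "region": "text"}
    print("exit code:", ci_gate(before, after))
```

Whether an added column counts as breaking depends on how consumers read the table (a positional `SELECT *` reader can break on additions too), so moving `"added"` into `BREAKING_KINDS` is a one-line policy change.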

EDCs provide granular data visibility and governance with features like column-level lineage, impact analysis, root cause analysis, and data policy compliance. These should be programmatic, rather than manual, with automated flags, alerts, and/or suggestions to help users keep on top of complex, fast-moving data flows.
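At their core, impact analysis and root cause analysis are graph traversals over lineage. The sketch below assumes column-level lineage is already available as a plain Python dict mapping each column to the columns derived directly from it; the column names and the `reachable` and `invert` helpers are hypothetical, and a real catalog would populate this graph automatically (for example, by parsing SQL or reading pipeline metadata) rather than by hand.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical column-level lineage: each "table.column" key maps to the
# columns derived directly from it.
LINEAGE: Dict[str, List[str]] = {
    "raw.orders.amount": ["staging.orders.amount_usd"],
    "staging.orders.amount_usd": ["marts.revenue.daily_total", "marts.finance.q_total"],
    "marts.revenue.daily_total": ["bi.dashboard.revenue_chart"],
    "marts.finance.q_total": [],
    "bi.dashboard.revenue_chart": [],
}


def reachable(graph: Dict[str, List[str]], start: str) -> List[str]:
    """Breadth-first walk returning every node reachable from `start`."""
    seen: Set[str] = {start}
    order: List[str] = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, []):
            if child not in seen:  # skip nodes already reached via another path or a cycle
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


def invert(graph: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Reverse every edge so that a walk goes upstream instead of downstream."""
    parents: Dict[str, List[str]] = {node: [] for node in graph}
    for node, children in graph.items():
        for child in children:
            parents.setdefault(child, []).append(node)
    return parents


if __name__ == "__main__":
    # Impact analysis: what is affected if raw.orders.amount changes?
    print(reachable(LINEAGE, "raw.orders.amount"))
    # Root cause analysis: which upstream columns feed the dashboard chart?
    print(reachable(invert(LINEAGE), "bi.dashboard.revenue_chart"))
```

The same traversal serves both use cases; only the edge direction changes. An automated alert could run the downstream walk whenever a monitored column changes and notify the owners of every asset it returns.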

The future of metadata is active ⚡️

All of these shifts, from Forrester championing data catalogs for DataOps to Gartner scrapping its Magic Quadrant for Metadata Management Solutions, point to the importance of active metadata. We first wrote about this idea in January 2021, and we’ve seen it explode since then.

From DataOps to the data mesh, modern data concepts are fundamentally based on being able to collect, store, and analyze metadata. However, data catalogs lagged behind for years, acting as static, siloed systems in a world of fast-moving, interconnected data. Now that metadata is approaching the scale of “big data” and has become critical for a range of modern use cases, the standard way of storing it is no longer enough. As Forrester said, we need more than a wiki for our data.

The solution is “active metadata”, a key component of modern data catalogs. Instead of just collecting metadata from the rest of the data stack and bringing it back into a passive data catalog, active metadata makes two-way movement of metadata possible. It sends enriched metadata and unified context back into every tool in the data stack and enables powerful programmatic use cases through automation.

Here are a few examples of what active metadata looks like in action:

Purge stale or unused assets: Use active metadata to periodically calculate when each data asset was last used and how many people used it, and then flag or purge neglected assets.
Allocate compute resources dynamically: Imagine that 90% of users log in to a BI tool during the last week of a financial quarter — automatically scale up compute resources just before that week and scale them down again afterward.
Enrich user experience in BI tools: Instead of making business users switch between a BI tool and data catalog, push important metadata (like business terms, descriptions, owners, and lineage) directly into the BI tool.
Notify downstream consumers: Check data pipelines for issues when a data store changes and notify downstream data users about potential breaking changes (e.g. the addition or removal of a column).
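The first example, purging stale or unused assets, can be sketched in a few lines of Python. This assumes usage events (asset, user, timestamp) have already been extracted from something like a warehouse query log; the `UsageEvent` record, the `flag_stale` helper, and the 90-day and two-user thresholds are hypothetical choices for illustration, not defaults from any product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class UsageEvent:
    """One hypothetical usage record, e.g. parsed from a query log."""
    asset: str
    user: str
    at: datetime


def usage_summary(events: Iterable[UsageEvent]) -> Dict[str, Tuple[datetime, int]]:
    """Map each asset to (last time it was used, number of distinct users)."""
    last_used: Dict[str, datetime] = {}
    users: Dict[str, Set[str]] = {}
    for e in events:
        if e.asset not in last_used or e.at > last_used[e.asset]:
            last_used[e.asset] = e.at
        users.setdefault(e.asset, set()).add(e.user)
    return {a: (last_used[a], len(users[a])) for a in last_used}


def flag_stale(
    assets: Iterable[str],
    events: Iterable[UsageEvent],
    now: datetime,
    max_idle: timedelta = timedelta(days=90),  # assumed threshold
    min_users: int = 2,  # assumed threshold
) -> List[str]:
    """Return assets that were never used, idle longer than max_idle, or used by too few people."""
    summary = usage_summary(events)
    stale: List[str] = []
    for asset in sorted(assets):
        if asset not in summary:
            stale.append(asset)  # no usage at all in the log window
            continue
        last, n_users = summary[asset]
        if now - last > max_idle or n_users < min_users:
            stale.append(asset)
    return stale


if __name__ == "__main__":
    now = datetime(2022, 10, 7)
    log = [
        UsageEvent("marts.revenue", "ana", datetime(2022, 10, 1)),
        UsageEvent("marts.revenue", "ben", datetime(2022, 9, 20)),
        UsageEvent("marts.churn", "cho", datetime(2022, 10, 5)),
        UsageEvent("staging.tmp_orders", "ana", datetime(2022, 5, 1)),
    ]
    tables = {"marts.revenue", "marts.churn", "staging.tmp_orders", "raw.legacy_events"}
    print(flag_stale(tables, log, now))
    # prints ['marts.churn', 'raw.legacy_events', 'staging.tmp_orders']
```

The sketch only flags; whether to purge automatically or first notify each asset’s owner is a policy decision. The `min_users` rule in particular would need tuning so that a table one analyst relies on daily isn’t flagged by mistake.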

Learn more about active metadata here. ➡️

📚 More from my reading list

From Business Problem To Data Science Experiment by Vin Vashishta
What is Data Engineering Part 1 and Part 2 by Gergely Orosz
What We Are Missing in Data CI/CD Pipelines? by Ivan
Why Does Self-Service BI Fail and What Could Enterprises Do to Turn the Tide? by Anh Tran
What Open Source Can Do For Your Data Career by Mehdi Ouazza

I’ve also added some more resources to my data stack reading list. If you haven’t checked out the list yet, you can find and bookmark it here.

See you next week!

P.S. Liked reading this edition of the newsletter? We'd love it if you could take a moment and share it with your friends on social.

Metadata Weekly


