Is this a new era for dbt Labs?

I still remember the first time I came across dbt. It was sometime in mid-2019 and, like most data consulting companies at the time, we mostly built code-based ETL pipelines on lower-level cloud infrastructure, the kind you could find described in most startup engineering blogs. Code-based ETL before dbt required experienced data engineers. As a relatively small consultancy in a very hot data engineering market, we found it hard to scale our team without a different approach. Also, most of our team did not have an engineering background at the time. Our CTO was adamant about applying proper software engineering best practices to data, drawing on his experience with big-data pipelines in his previous job. After reading some of Tristan Handy's blog posts about Analytics Engineering and dbt, it became clear to us that dbt could be the missing piece that would enable our analysts to work like engineers or, to put it simply, to become analytics engineers.

In hindsight, the genius of the early versions of dbt (it was just called dbt back then; today it would be called dbt-core for clarity) was not the complexity of its code or features, but rather its simplicity. Most legacy ETL tools that cater to non-engineering professionals, such as Informatica or Pentaho, were clunky, full of distracting features and, worse, covered almost none of the software engineering (SWE) best practices that are a must for modern data work. On the other hand, working with modern data platforms such as Snowflake and Databricks requires much deeper technical knowledge than a typical data analyst has, which made it a data-engineer-only realm. That meant that for most companies, despite being able to build data pipelines orders of magnitude faster than the previous technology allowed, there was a real constraint on scaling the data org, since so few professionals could work on it. Worse, many data engineers dislike talking to business users or even writing SQL queries at all, so the data organization inevitably stayed far from the lines of business, where the business value of data lives.

Although dbt was still in its early stages, within a few months we built an entirely new analytics engineering practice on top of it, made up of professionals without a software engineering background but with very good analytical skills. To accelerate that movement, we developed our own analytics engineering course, open to the public, which has since trained more than 1,000 analytics engineers who work for Indicium, for our customers, or at other companies. To date, we are among the top certified dbt partners worldwide. There is no doubt dbt is a big deal for any modern data team.

But what about dbt Cloud? For many early adopters like us, dbt-core was already good enough for our work. Also, many features launched with the first versions of dbt Cloud had already been developed by our platform teams or the open-source community. Until recently, there was little value for us in moving to dbt Cloud. Don't get me wrong: dbt Cloud needs many of those features to be a good tool in itself. The problem for dbt Labs was that, as companies adopting dbt left Plato's cave of modern data stack ignorance, they found so many ways to improve their data platform practices with dbt that most platform teams became advanced dbt users, which, IMO, was not the main user persona of dbt Cloud. But then, who is?

I believe that there are three main personas for dbt Cloud: a) companies that are born into the modern data stack and don't have, or don't want to keep, a large data team; b) enterprises that want to scale their dbt Core implementation into the lines of business (LOBs) and want a tool that lets them implement data management and data governance best practices while keeping complexity low for less technical LOB analytics teams; and c) companies that are relatively late in adopting a cloud data warehouse and are only now migrating away from legacy data tech such as Talend and Informatica. Until now, dbt Cloud wasn't always compelling enough for some of these personas to adopt and implement. So why do I think that will change?

I think the new announcements from dbt Labs at this year's Coalesce all point in the right direction. First, dbt Labs is acknowledging that it has to do more than just data transformation if it is to be the single data tool for smaller organizations and other companies without a dedicated data platform team. Features like orchestration, data cataloging, and even data ingestion are all necessary. Today, each of them requires a separate tool, and those tools can be hard to combine and expensive. The vision of dbt becoming a data control plane is good and goes hand in hand with the consolidation trend we at Indicium have seen in the modern data stack space in the past few years.

Arguably, the biggest announcement of Coalesce was the One dbt strategy. First, there is real value in a hybrid approach of dbt Core and dbt Cloud, with the former run by platform or CoE-style teams and the latter focused on less technical LOB teams. A first-class experience for this hybrid approach in dbt Cloud is a must-have for many of our enterprise customers. Second, while dbt power users had already built most of dbt Cloud's advanced features internally, this is not the case for hybrid cloud and data mesh architectures. No single tool or platform can deal with this increasingly common enterprise practice, even within the same cloud provider (e.g., Databricks + Snowflake platforms). With Iceberg becoming the de facto standard for modern data storage, there is a real opportunity for dbt to become the missing piece between those data platforms, allowing teams to develop their tools without losing governance and DataOps best practices. Finally, although the debate between code-based and no-code/low-code development for data transformation is a long-standing one, a low-code option is a must-have for less technically minded engineers and a common requirement for enterprises. Having this feature inside dbt Cloud, integrated with the dbt development lifecycle, is a good move by dbt Labs.

I'm confident that dbt is the most ubiquitous tool of the modern data platform. More than just a tool, dbt allowed companies to close the gap between business and data with the rise of the Analytics Engineering role. Ironically for dbt Labs, its dbt Cloud product suffered from the qualities of its original product. While there were always companies where dbt Cloud was the best fit, a large part of the market found it hard to identify where dbt Core was lacking. With the new strategy and release announcements, dbt Cloud is solving real technical problems and serving business needs that dbt Core cannot, and I can see more and more use cases where dbt Cloud provides a compelling advantage over running dbt Core.

By Daniel Avancini