dbt's Latest and Greatest: A Recap from Coalesce 2024 (and a Few Hats)

Data Analysts, Data Engineers, and even Business Users can rejoice in this Blue Orange Digital article by Sebastian Freiman. Fresh from Coalesce 2024 in Vegas (hat collection significantly expanded), here's a breakdown of dbt's hottest new features and upcoming beta releases!

Headlining:
🟠 Unit Tests for Quality Assurance: Like a code whisperer for your data models, unit tests ensure everything works as expected before materialization (a small config sketch follows below).
🟠 Microbatching for Efficiency: Tired of processing all your data at once? Microbatching tackles this by focusing on recent data, saving you storage and processing power.
🟠 Snapshot Improvements (Finally!): Snapshots are getting a refresh with YAML configuration and more control over column naming.

Coming Soon (Beta):
🟠 Apache Iceberg Support: Multi-warehouse just got easier with seamless integration for a unified data experience.
🟠 Analytics Development Lifecycle (ADLC): A structured approach to data development for smoother workflows.
🟠 dbt One & Mesh: Simplifying large-scale data modeling with a focus on collaboration.
🟠 dbt Copilot & Control Plane: Supercharge your development with AI assistance and centralized governance.
🟠 Cost Optimization: Keep your data costs in check with built-in optimization tools.

And there's more! This is just a taste of what dbt has in store. Stay tuned for a deeper dive into these features and how they can revolutionize your data game. https://lnkd.in/eXQij2jQ

#dbt #Coalesce2024 #dataengineering #datamodeling
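To make the first two features a little more concrete, here is a minimal sketch of what a unit test (dbt 1.8+) and a microbatch incremental config (newer releases) can look like in a properties file. The model and column names (dim_customers, stg_customers, fct_events, event_ts) are invented for illustration and are not from the article.

```yaml
# models/marts/schema.yml (illustrative names throughout)
unit_tests:
  - name: unit_test_email_validation
    model: dim_customers                 # model under test
    given:
      - input: ref('stg_customers')      # mocked upstream rows
        rows:
          - {customer_id: 1, email: "jane@example.com"}
          - {customer_id: 2, email: "not-an-email"}
    expect:
      rows:
        - {customer_id: 1, is_valid_email: true}
        - {customer_id: 2, is_valid_email: false}

models:
  - name: fct_events
    config:
      materialized: incremental
      incremental_strategy: microbatch   # process data one event_time batch at a time
      event_time: event_ts
      batch_size: day
      lookback: 3                        # reprocess the last 3 days on each run
```

The unit test runs against mocked inputs before anything is materialized, while the microbatch config tells dbt to split incremental runs into per-day batches filtered on the event_time column.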
-
Data contracts don't just apply to source data

This is one of the things that dbt Labs does really well: the v1.5 release introduced "contracts" and access groups to facilitate this

Both have quite a lot of detail, but to summarise (a minimal YAML sketch follows after this post):
• Contracts fail a model's update if the update doesn't conform to the defined schema
• Access groups define which parts of your pipelines can reference each other, similar to the "public", "protected", and "private" concepts in software engineering

I think it's clear why you'd want to enforce a contract, and there have been solutions to this pre-dating dbt (like the write-audit-publish pattern)

But why should you care about access?

Defining what your consumers "can" and "can't" look at is another aspect of your data contract

Anything you "allow" your consumers to use is something that you're committing to maintain in a consistent manner

If your consumers are "allowed" to use any of the objects that you maintain, then refactoring becomes a pain. You either:
• Make changes now and risk breaking things downstream
• Engage with your downstream consumers and give them a grace period between warning of and then making your changes

Alternatively, if you agree the scope of "public" objects with your consumers beforehand, you can freely update any of your "private"/"protected" objects whenever and however you want, as long as the "public" objects remain consistent

Defining your access is beneficial for everyone:
• It's beneficial for you because you know which objects you can freely change without breaking things downstream (good for your developer velocity)
• It's beneficial for your consumers because they'll have a consistent and stable set of objects to consume from (good for keeping consumers happy)

Whether you're using dbt or not, being clear about your data contracts throughout your entire pipeline will keep everyone happy

#dbt #analyticsengineering #datacontracts
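As a rough illustration of how contracts and access look in dbt (v1.5+), here is a hypothetical schema.yml sketch; the model names, columns, and group are invented for the example, not taken from the post.

```yaml
# models/marts/schema.yml (illustrative example)
models:
  - name: fct_orders
    access: public               # downstream consumers may ref() this model
    config:
      contract:
        enforced: true           # the build fails if the model's output drifts from this schema
    columns:
      - name: order_id
        data_type: integer
        constraints:
          - type: not_null
      - name: order_total
        data_type: numeric

  - name: int_orders__enriched
    access: private              # only models in the same group may ref() this
    group: finance               # assumes a 'finance' group (with an owner) is declared elsewhere
```

With this in place, int_orders__enriched can be refactored freely, while fct_orders is the stable, contracted surface that consumers are allowed to build on.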
-
Last week, our engineering team built a custom dbt alerting tool that enriches alerts with actionable context. But how does it work, and how could you build something similar?

Here are 4 of the most interesting challenges we tackled, along with how we solved them:

𝟭. 𝗦𝘂𝗽𝗽𝗼𝗿𝘁𝗶𝗻𝗴 𝘁𝗼𝗽-𝗹𝗲𝘃𝗲𝗹 𝗮𝗹𝗲𝗿𝘁 𝗿𝘂𝗹𝗲𝘀 𝗳𝗼𝗿 𝗮𝗻𝘆 𝗱𝗯𝘁 𝗽𝗿𝗼𝗷𝗲𝗰𝘁
Solution: Use a specifically named exposure that stores custom configuration data (a rough sketch of this idea follows below). This supports a wide range of dbt setups, avoids users having to update the 𝚖𝚎𝚝𝚊 property of each model, and does not require access to underlying git repos.

𝟮. 𝗗𝗲𝗳𝗶𝗻𝗶𝗻𝗴 𝗰𝗼𝗺𝗺𝗼𝗻 𝗮𝗹𝗲𝗿𝘁 𝗳𝗶𝗹𝘁𝗲𝗿𝗶𝗻𝗴 𝗿𝘂𝗹𝗲𝘀
The most expressive yet intuitive filters for us are database path matching, tag-based matching, and owner-based matching. These cover the three (mostly) orthogonal directions of data structure, domain knowledge, and responsibility.

𝟯. 𝗛𝗮𝗻𝗱𝗹𝗶𝗻𝗴 𝗺𝗼𝗱𝗲𝗹-𝘁𝗲𝘀𝘁 𝗿𝗲𝗹𝗮𝘁𝗶𝗼𝗻𝘀𝗵𝗶𝗽𝘀
Upon each test failure, we evaluate rules against both the test itself and the parent models that the test is applied to, by traversing the many-to-one relationship on each failure.

𝟰. 𝗠𝗮𝗸𝗶𝗻𝗴 𝘁𝗲𝘀𝘁 𝗳𝗮𝗶𝗹𝘂𝗿𝗲𝘀 𝗲𝗮𝘀𝘆 𝘁𝗼 𝗮𝗰𝗰𝗲𝘀𝘀
We created a macro to download failing rows captured by dbt via 𝚜𝚝𝚘𝚛𝚎_𝚏𝚊𝚒𝚕𝚞𝚛𝚎𝚜 to a Snowflake stage and attach a link directly to dbt run results.

You can learn more about the thought process and details at the link in the comments. And of course you can use this tool for free too: our setup is not unique, so we figured that if it's useful for us, then it'll hopefully be useful for other folks.

Would love to hear your feedback, especially if you've customized dbt alerts yourself! #dataengineering #dbt #analytics
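The post doesn't show its actual configuration format, but as a rough illustration of the "specifically named exposure" idea, a hypothetical exposure carrying alert rules in its meta block could look something like this. The exposure name, rule keys, and channel are all invented for the sketch and are not the tool's real schema.

```yaml
# models/alerting.yml (hypothetical sketch)
exposures:
  - name: dbt_alerting_config        # the "specifically named" exposure the tool would look for
    type: application
    owner:
      name: Data Platform Team
      email: data-platform@example.com
    meta:
      alert_rules:
        - name: finance-critical
          match:
            tags: ["finance"]                               # tag-based matching
            database_path: "analytics.marts.finance.*"      # database path matching
            owner: "finance-team"                           # owner-based matching
          channel: "#finance-alerts"
```

Because exposures live in the project's YAML and appear in the manifest, a tool can read this configuration from dbt artifacts alone, without touching each model's meta or the underlying git repo.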
-
Streamlining Data Transformation with dbt and VSCode Extensions

In the realm of data transformation, efficiency and precision are paramount. Fortunately, we have at our disposal powerful tools like dbt and VSCode extensions that empower us to streamline and enhance our workflows. The image provided showcases a glimpse into my development environment, where I leverage the synergy of dbt and VSCode extensions to transform data with remarkable efficiency.

Key Tools in Action:

dbt: This open-source data build tool serves as the foundation for my data transformation process. It enables me to write SQL-based transformations, which are then compiled into modular data models.

VSCode Extensions: My VSCode setup is enriched with a suite of extensions that streamline and enhance my dbt development experience. These extensions include:

dbt Core Power User: This extension provides essential features for dbt development, such as code completion, linting, and debugging.
vscode-dbt: This extension offers syntax highlighting, snippets, and other enhancements for dbt code.
dbt-shortcuts: This extension provides shortcuts and commands to expedite dbt tasks within VSCode.
Turntable for dbt Core™: This extension enables interactive exploration of data models and query results.
dbt Osmosis Power User: This extension facilitates seamless integration between dbt and other tools, such as Git and documentation generators.
vscode-dbt-language: This extension enhances the dbt coding experience with language support features.
dbt-bigquery-preview [UNMAINTAINED]: This extension provides a preview of dbt queries in BigQuery directly within VSCode.
JetBrains Icons Enhanced with DBT: This extension enhances the appearance of dbt-related icons within VSCode.

Impact on Data Transformation:

The combination of dbt and VSCode extensions has revolutionized my data transformation workflow, leading to significant improvements in efficiency, accuracy, and maintainability.

Enhanced Efficiency: The extensions streamline repetitive tasks, automate code generation, and provide interactive exploration capabilities, saving me valuable time and effort.
Improved Accuracy: The linting and debugging features ensure that my code is error-free, producing reliable and trustworthy data transformations.
Boosted Maintainability: The modularity and documentation capabilities of dbt, coupled with the organization and code completion features of the extensions, make my codebase more maintainable and easier to collaborate on.

#DataTransformation #dbt #VSCodeExtensions #DataWorkflow #DataEfficiency #DataAccuracy #DataMaintainability #DataScience #DataEngineering #DataAnalytics
-
Today I’m really excited to announce one of our biggest product updates yet: Orchestra supports dbt Core™

Orchestra now supports running your dbt Core code. We’ve spoken to hundreds of data teams who get that running your transformations from your control plane or orchestrator makes sense - after all, they are just push-down SQL queries.

Another problem we’ve seen many data teams implementing self-serve analytics struggle with is sprawl and a lack of guardrails due to decentralisation 🥵 This is because there is no coherent metadata or Data Product framework - analysts “get given” dbt, and chaos ensues!

By leveraging Orchestra with dbt, data teams can finally add a Governance layer to the Data Products that are important and should be prioritised, instead of facing a never-ending wall of failing models and tests that are impossible to manage.

Finally - pricing. Yes, pricing. One of our prospects said earlier, “We do not need a development environment for dbt - we need a deployment environment, and pricing should reflect that”. This is our value proposition to you. As an early adopter of dbt myself many 🌚 ago, I have always used a local code editor and local CLI, but how to run and monitor dbt was always a total pain.

Orchestra is the easiest way to get your dbt code up and running in production. Pricing is per minute your dbt projects run. This scales as you run more.

That’s all for now! Data community of LinkedIn, please check us out! I know the vast majority of you have a dbt project somewhere, so I'd love to get your thoughts and feedback.

Read the announcement below 📣

Hugo

#dataengineering #dbt #analyticsengineering #analytics
-
Last week, I completed an intro course about dbt Labs. I summarized the key points of this intro in a concept map. Check it out!

In the world of data, efficiency, collaboration, and accuracy are paramount. Enter dbt Labs, a revolutionary tool that's reshaping data workflows for professionals everywhere. Here’s a quick dive into why dbt is a must-have for your data toolkit:

1. Streamlined Data Transformation
dbt simplifies the transformation process with modular SQL code, reducing complexity and boosting efficiency. Spend less time debugging and more time deriving insights.

2. Enhanced Collaboration
Integrating seamlessly with version control systems like Git, dbt enables smooth teamwork. Multiple team members can work simultaneously, track changes, and maintain transparency.

3. Clear Data Lineage and Documentation
dbt automatically generates comprehensive documentation and data lineage graphs, providing clarity on data flows and transformations, ensuring data integrity and ease of troubleshooting.

4. Rigorous Testing for Quality Assurance
Implement tests alongside your data models to catch issues early, ensuring high-quality and reliable data for your business decisions.

5. Scalability and Performance
dbt scales with your data needs, leveraging modern cloud data warehouses to maintain performance, regardless of data size.

6. Modularity and Reusability
Promote consistency and speed up development with reusable data models. dbt’s modular approach ensures standardization across projects and teams.

7. Empowering Analytics Engineering
Bridge the gap between data engineering and analysis. dbt empowers analysts to handle data transformations, making your team more agile and responsive.

Elevate Your Data Game
dbt Labs is not just a tool, it's a catalyst for transforming your data operations. Ready to unlock the full potential of your data? Dive into dbt and watch your data workflows soar! 🚀 (A small sketch of modular models and tests follows below.)

#dataengineering #moderndatastack #moderndataarchitecting #dbtlabs #dataarchitect
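To make "modular SQL" and "tests alongside your models" concrete, here is a minimal, hypothetical dbt model plus its schema tests; the staging model and column names are invented for the example.

```sql
-- models/marts/fct_orders.sql (illustrative example)
-- ref() makes this model depend on a reusable staging model,
-- so dbt builds the dependency graph and lineage automatically.
select
    o.order_id,
    o.customer_id,
    o.order_total
from {{ ref('stg_orders') }} as o
where o.status != 'cancelled'
```

```yaml
# models/marts/schema.yml (illustrative example)
models:
  - name: fct_orders
    columns:
      - name: order_id
        data_tests:        # dbt 1.8+ key; older versions use `tests:`
          - unique
          - not_null
```

Running dbt build then materializes the model and executes the tests in dependency order, which is where the "catch issues early" benefit comes from.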
-
4 𝙔𝙚𝙖𝙧𝙨 𝙞𝙣 𝘿𝙖𝙩𝙖 𝙀𝙣𝙜𝙞𝙣𝙚𝙚𝙧𝙞𝙣𝙜: 𝙈𝙮 𝙀𝙭𝙥𝙚𝙧𝙞𝙚𝙣𝙘𝙚 𝙬𝙞𝙩𝙝 𝘿𝘽𝙏 (𝘿𝙖𝙩𝙖 𝘽𝙪𝙞𝙡𝙙 𝙏𝙤𝙤𝙡)

After 4 years as a Data Engineer, I've come to rely on a variety of tools to help me solve complex data problems. One tool that has truly stood out is dbt (Data Build Tool), which has transformed the way we transform, model, and manage data in modern data pipelines.

For those unfamiliar, dbt is an open-source tool that allows data engineers and analysts to transform raw data into a usable format using SQL. It’s a game-changer when it comes to building clean, reliable, and scalable data models. Here’s why I’ve found dbt so valuable in my work:

SQL-Centric Approach: dbt allows data engineers to write SQL queries to perform complex data transformations. There’s no need to rely on custom scripts: just SQL and a solid framework. This makes it easier for analysts and engineers to collaborate seamlessly.

Version Control: With dbt, you can apply software engineering best practices to data transformations, including version control (Git), testing, and modularity. This leads to more maintainable, error-free code, and faster collaboration between teams.

Modularity & Reusability: dbt encourages you to break down your transformations into modular pieces, making the code easier to understand and reuse across projects. This means faster iterations and more flexible data workflows.

Data Lineage & Documentation: One of dbt's greatest strengths is its ability to generate documentation and data lineage automatically. You can easily track how data flows through your models, providing visibility and ensuring that stakeholders understand where the data comes from and how it’s transformed. (A quick sketch of the commands involved is below.)

dbt isn’t just a tool, it’s a framework that has helped me streamline data processes, improve collaboration with data analysts, and build trust in data models. It’s essential for any team looking to create scalable, maintainable, and transparent data transformations.

I’m excited to continue learning and growing as dbt evolves. If you’re in data engineering or analytics and haven’t explored dbt yet, I highly recommend giving it a try!

#DataEngineering #DBT #DataBuildTool #DataTransformation #SQL #Analytics #TechInnovation #DataLineage
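For anyone who hasn't tried the documentation and lineage side, this is roughly the workflow using the standard dbt CLI; the port number is just an example, nothing here is specific to the author's setup.

```sh
# Compile the project's docs site (catalog, model descriptions, lineage graph)
dbt docs generate

# Serve the generated site locally and browse the DAG / lineage view
dbt docs serve --port 8080
```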
-
I love it when someone puts words around a concept or framework that hasn't been clearly articulated thus far. Naming something explicitly helps sharpen the way we work and the conversations we have with each other.

The Analytics Development Lifecycle (ADLC) is, for me, one of those concepts, although it borrows explicitly from software engineering. dbt in its original incarnation applied software engineering best practices to a very specific type of work: transforming data. dbt in its current (and future!) incarnations caters to the ADLC much more holistically.

Regardless of the specific data stack you might be working with, though, the ADLC implies certain standards, in terms of both (i) what we should aim for organisationally and (ii) what we should expect from our data tooling.
The Analytics Development Lifecycle (ADLC) | dbt Labs
getdbt.com
-
Over the past few months, I've been diving deep into dbt and discovering how software engineering best practices like unit testing, model contracts, and data quality tests are revolutionizing the way we handle data. If you're looking to build more reliable and scalable data pipelines, this is a must-read. Check out my latest article on how dbt is transforming data engineering! #dbt #DataQuality #UnitTesting #ModelContracts #DataTransformation #dbtlabs
Where is the Data World Going?
medium.com
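As a concrete example of the "data quality tests" the article refers to, here is a minimal, hypothetical singular test in dbt; the model and column names are invented for illustration.

```sql
-- tests/assert_no_negative_order_totals.sql (illustrative singular data test)
-- dbt runs this query during `dbt test`; any returned rows count as a failure.
select
    order_id,
    order_total
from {{ ref('fct_orders') }}
where order_total < 0
```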
-
🌟 The Rise of dbt: A Game-Changer for Data Engineering 🌟

Speaking with lots of data leaders, an interesting pattern has developed: a lot of companies and data teams are integrating dbt into their tech stack, so let's dive into how this can be a game-changer for data engineering.

SQL-First Approach: 🧑💻 dbt lets you transform data using SQL, which means that data analysts and engineers alike can jump in and contribute. No more silos – everyone can play a part in building and refining data models.

Modular and Reusable Code: 🧩 With dbt, you can create small, reusable SQL pieces. This not only speeds up development but also makes maintaining and scaling your data pipelines a breeze.

Built-in Version Control: 🔄 By integrating smoothly with Git, dbt ensures every change is tracked and reviewed. This means better collaboration and more robust data governance.

Automated Data Testing: ✅ Ensuring data quality is key, and dbt’s automated testing helps catch issues early. This means more reliable data and fewer surprises down the line.

Comprehensive Documentation: 📚 dbt automatically generates detailed documentation for your data models. This makes it easier for everyone on the team to understand and work with the data.

Thriving Community: 🌐 One of the best things about dbt is its vibrant community. There's a wealth of plugins, integrations, and shared knowledge, which means you’re never alone in solving a problem.

Why This Matters 🤔:
For data teams, dbt represents a shift towards more agile, efficient, and collaborative ways of working. It helps us build robust data pipelines faster and ensures the data we rely on is accurate and well-documented.

As we continue to navigate the complexities of the data landscape, tools like dbt are invaluable. They empower us to meet the growing demands for timely, reliable, and scalable data solutions.

I’m genuinely excited about the potential dbt brings to our field and can’t wait to see how it continues to evolve. If you’re using dbt or thinking about it, I’d love to hear your thoughts and experiences!

#data #dataengineering #dbt #datatransformation #gamechanger
-
I'd say the most anticipated feature of dbt 1.8 is Unit Tests; however, my favorite feature just released is the ability to "dry run", i.e. run a model with zero rows processed, using the `--empty` flag. This is fantastic for lightweight CI/CD pipelines that don't add unnecessary load to your data warehouse (quick example below). Here's a great article detailing all the features of 1.8. Which feature is your favorite? https://lnkd.in/eHbxgSCm
dbt 1.8 it is just wow
medium.astrafy.io
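For anyone curious how this fits into a lightweight CI job, here is a rough sketch using standard dbt CLI flags; the selector and state path are assumptions for the example, not from the post.

```sh
# Build only models changed in this PR, with zero rows processed (--empty),
# deferring unchanged upstream models to the production manifest.
dbt build --empty \
  --select state:modified+ \
  --defer --state ./prod-artifacts
```

The `--empty` flag compiles and executes each model against an empty input set, so the SQL is validated end to end without scanning or writing real data.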