A quick glimpse from our CEO, Dmitry Petrov, into the ETL and data governance aspects of DataChain and our SaaS for unstructured data processing:
✅ Every dataset is immutable and versioned, with fingerprints for all data objects so results can be reproduced;
✅ All dependencies are tracked and saved: code, datasets, and raw data sources;
✅ ETL can run on demand or on a schedule to produce new versions of the datasets.
Interested in learning more? Contact us here: https://datachain.ai/
The open-source version is available to try here: https://lnkd.in/emFvJD84
#unstructured #dvc #datachain #machinelearning
iterative.ai’s Post
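The fingerprinting-for-reproducibility idea in the post above can be sketched in a few lines. This is not DataChain's actual implementation, just a minimal stdlib illustration of the concept: each object is content-addressed by its hash, and the dataset version is derived from the sorted manifest of those fingerprints, so any change to any object yields a new, immutable version id.

```python
import hashlib
import json


def fingerprint(obj_bytes: bytes) -> str:
    """Content-address a data object by its SHA-256 digest."""
    return hashlib.sha256(obj_bytes).hexdigest()


def dataset_version(objects: dict) -> str:
    """Derive a dataset version id from the sorted (path, fingerprint)
    manifest, so changing any single object changes the version."""
    manifest = sorted((path, fingerprint(data)) for path, data in objects.items())
    return hashlib.sha256(json.dumps(manifest).encode()).hexdigest()[:12]


v1 = dataset_version({"a.jpg": b"cat", "b.jpg": b"dog"})
v2 = dataset_version({"a.jpg": b"cat", "b.jpg": b"dog!"})
assert v1 != v2  # editing one object produces a new dataset version
```

Because the version is a pure function of the contents, re-running the same ETL on the same inputs reproduces the same version id.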
🚀 Say hello to smarter data management with Index Lifecycle Management (ILM) in Elasticsearch!
💡 ILM automates the complex process of managing your indices through successive stages (hot, warm, cold, and delete), tailored to your specific needs.
🌐 This game-changer not only boosts performance and supports compliance but also optimizes storage resources, cutting down on costs.
📊 Ready to revolutionize your data management?
#Elasticsearch #DataManagement #TechInnovation #ILM #StorageOptimization #BigData #MachineLearning #CloudComputing #DataScience
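An ILM policy covering the four stages mentioned above looks roughly like this. The structure follows Elasticsearch's documented policy format (applied via `PUT _ilm/policy/<name>`), but the thresholds below are illustrative examples, not tuning recommendations:

```python
import json

# A hot -> warm -> cold -> delete lifecycle in Elasticsearch's ILM
# policy format. Ages and sizes are placeholders for illustration.
policy = {
    "policy": {
        "phases": {
            "hot": {
                "actions": {"rollover": {"max_primary_shard_size": "50gb", "max_age": "7d"}}
            },
            "warm": {
                "min_age": "7d",
                "actions": {"forcemerge": {"max_num_segments": 1}},
            },
            "cold": {
                "min_age": "30d",
                "actions": {"set_priority": {"priority": 0}},
            },
            "delete": {
                "min_age": "90d",
                "actions": {"delete": {}},
            },
        }
    }
}
print(json.dumps(policy, indent=2))
```

Once attached to an index template, indices move through these phases automatically; no cron jobs or manual curation needed.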
Revolutionize Your Data Workflow with Delta Live Tables! 🚀
Are you ready to streamline your ETL processes? Delta Live Tables (DLT) is a game-changer in the world of data transformation. As a declarative ETL framework from Databricks, DLT simplifies both streaming and batch ETL, making it cost-effective and efficient.
Key Benefits: 🔍
☑ Automated Task Orchestration and Monitoring: Declare your transformations, and DLT handles the rest; orchestration, cluster management, and error handling are all automated.
☑ Enhanced Data Quality: With built-in data quality expectations and error handling, you can trust your data processes.
☑ Simplified Infrastructure Management: DLT automatically handles relationships among datasets and adjusts production infrastructure to ensure timely and accurate data delivery.
👨‍💻 Demo Included! The article walks through a practical demo using Parquet files, from setting up compute clusters and SQL warehouses to creating Delta Live Tables and configuring pipelines. This hands-on approach helps you see the potential of DLT in action!
🔗 https://lnkd.in/eZs64bka
Dive into the full article by Andrés Zoppelletto to explore how Delta Live Tables can transform your data operations. Let's harness the power of advanced ETL together and make data handling seamless!
#DataTransformation #ETL #Databricks #DeltaLiveTables #DataIntelligence #TechInnovation
Datalere brings unparalleled expertise to Databricks, empowering your organization with advanced data governance, warehousing, ETL, and orchestration capabilities. Our seasoned team ensures you get the most out of Databricks, optimizing your data infrastructure for superior performance and scalability. Learn how we can help: https://bit.ly/49rbMeg #Databricks #DataPlatform #DataOptimization #DataManagement
❓ Struggling with #SelfServiceAnalytics due to data silos, bad data, and resource-intensive ETL processes?
That's why we partnered with Dremio, the only Open Unified Lakehouse Platform, to help solve #DataQuality issues for self-service analytics. 💡➕🐬
Lightup connects to Dremio, enabling #DataAnalysts and #engineers to access data, run checks, and analyze it across sources, regardless of where it's stored in the cloud or on-premises, including #NoSQL databases and #CSV files. The best part? No data movement or replication is required, thanks to our pushdown architecture.
Learn more about how Lightup and Dremio work #BetterTogether, providing reliable data for analytics.
🔗 Link in comments.
#DataQuality #Lightup #Partnership #Integration #DataQualityforDremio
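The "pushdown" claim above is the key architectural idea: the quality check is compiled to SQL and executed by the engine that already holds the data, so only a small aggregate travels back. Here is a minimal sketch of that idea using stdlib `sqlite3` as a stand-in for Dremio (the table and check are invented for illustration):

```python
import sqlite3

# Pushdown-style data-quality check: the check runs as SQL inside the
# engine holding the data, and only one aggregate row comes back.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, email TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "a@x.com"), (2, None), (3, "c@x.com")],
)


def null_rate(conn, table, column):
    """Fraction of NULLs in `column`, computed entirely in the engine."""
    sql = f"SELECT AVG(CASE WHEN {column} IS NULL THEN 1.0 ELSE 0.0 END) FROM {table}"
    return conn.execute(sql).fetchone()[0]


rate = null_rate(conn, "orders", "email")
assert abs(rate - 1 / 3) < 1e-9  # one of three emails is missing
```

Contrast this with pulling all three rows out and counting NULLs client-side; at warehouse scale, that difference is why no data movement or replication is needed.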
A typical data flow with #dbt, which helps you transform data in #BigQuery, #Snowflake, #Databricks, #Redshift, and other warehouses (see the dbt documentation for the full list of supported data platforms).
Ready to take your data analytics to the next level?
Amazon Redshift can analyze huge volumes of data at near-limitless scale, but tapping that processing power takes equal care in how data is consistently loaded into the platform. That's where WhereScape comes in.
Automate your data migration to Redshift and streamline ETL processes with our built-in best practices. Your data is growing in volume and complexity faster than ever, and we're here to help you scale beyond your wildest expectations.
#automation #bigdata #WhereScape #dataautomation
🔍 Want to see it in action? Book a demo now! https://buff.ly/4ccfauQ
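For context on why loading "requires care": the documented bulk path into Redshift is the `COPY` command from S3, which parallelizes across slices, rather than per-row `INSERT`s. A small helper that builds such a statement might look like this (the table, bucket, and IAM role below are hypothetical placeholders):

```python
def redshift_copy(table: str, s3_uri: str, iam_role: str, fmt: str = "PARQUET") -> str:
    """Build a Redshift COPY statement for bulk-loading from S3.
    COPY parallelizes the load; row-by-row INSERTs do not scale."""
    return (
        f"COPY {table}\n"
        f"FROM '{s3_uri}'\n"
        f"IAM_ROLE '{iam_role}'\n"
        f"FORMAT AS {fmt};"
    )


stmt = redshift_copy(
    "analytics.events",
    "s3://my-bucket/events/2024/",
    "arn:aws:iam::123456789012:role/redshift-load",
)
print(stmt)
```

Tools like WhereScape generate and orchestrate statements of this kind for you, along with staging, error handling, and scheduling.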
📣 In case you missed it: find out about the simplest way to implement data observability!
In my latest chat with Ryan Yackel, CMO at Databand.ai, and Eric Jones, Senior Solution Architect at IBM, I was introduced to Databand's new public API. Watch the recording now: https://bit.ly/3Y0rudr
Learn about:
✅ Modern & Legacy Data Observability Support
✅ Highlights for Control-M, ADF, & Others
✅ Various Use Cases with the New Databand Public API, Like Incident Management and Data Governance
#dataengineer #dataengineering #datascience #api #openapi #dataobservability
🚀 Towards Actionable Metadata
What if metadata could do more than describe your data? What if it could act: trigger workflows, prevent pipeline failures, and keep your data ecosystem consistent and secure?
In my latest blog, I dive into:
- The evolution of metadata from static documentation to an actionable asset.
- How Apache Iceberg and #REST #Catalogs are redefining metadata.
💡 Ready to rethink metadata as a driver of automation and transparency? Read the full blog here: https://lnkd.in/erajZkm9
I'd love to hear your thoughts: how is your team making metadata actionable? Let's start the conversation! 👇
#Lakehouse #DataMesh #DataContracts #ApacheIceberg #Lakekeeper #Metadata