DATA Pill #014 - Future-Aware Data Engineering & Post-Deployment Data Science
Hi everyone 👋,
Today we have one clickbait,
one “put the cat amongst the pigeons” kinda article 🐦,
one podcast that has the potential to go viral, and more.
Let’s take a look ;)
ARTICLES
Keeping track of shipments minute by minute: How Mercado Libre uses real-time analytics for on-time delivery | 12 min read | Data Analytics | Pablo Fernández Osorio | Mercado Libre | Google Cloud Blog
Mercado Libre shares the continuous intelligence framework that, amid increased demand, enables it to deliver 79% of its shipments in less than 48 hours.
Data used to support decision-making in key processes:
Airflow's Problem | 7 min read | Airflow | Stephen Bailey | Data People Etc.
Let’s put the cat amongst the pigeons ;) Why the author doesn’t like Airflow, and why, in the era of data mesh, we should look for an alternative.
Google Introduces Zero-ETL Approach to Analytics on Bigtable Data Using BigQuery | 7 min read | Cloud | Steef-Jan Wiggers | InfoQ Blog
Previously, customers had to use ETL tools such as Dataflow or self-developed Python tools to copy data from Bigtable into BigQuery; however, now they can query data directly with BigQuery SQL.
NEWS
Python models | 10 min read | Databricks Blog
An update on an upcoming dbt feature: Python models.
A dbt Python model is a function that reads in dbt sources or models, applies a series of transformations and returns a transformed dataset. DataFrame operations define the starting points, the end state and each step along the way. This is similar to the role of CTEs in dbt SQL models.
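For readers who haven’t seen one yet, here is a minimal sketch of what such a model file might look like. It follows the `def model(dbt, session)` entry point with `dbt.config()` and `dbt.ref()` described in dbt’s Python models documentation, and assumes a Databricks/PySpark adapter where `dbt.ref()` returns a PySpark DataFrame. The model name `stg_orders` and its columns are hypothetical.

```python
# models/completed_orders.py  (hypothetical model; upstream "stg_orders" is hypothetical too)
def model(dbt, session):
    # Equivalent of {{ config(materialized="table") }} in a SQL model
    dbt.config(materialized="table")

    # Starting point: read an upstream model, like {{ ref("stg_orders") }} in SQL
    orders = dbt.ref("stg_orders")

    # Each DataFrame step plays the role a CTE would play in a SQL model
    completed = orders.filter(orders.status == "completed")
    slim = completed.select("order_id", "customer_id", "amount")

    # End state: the returned DataFrame is what dbt materializes
    return slim
```

The file would be run with `dbt run` like any SQL model. `session` (the Spark session on Databricks) goes unused here because the upstream data is read through `dbt.ref()`.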
TUTORIALS
Iceberg Tables: Powering Open Standards with Snowflake Innovations | 7 min read | Data Lake | James Malone | Snowflake
Large data sets commonly raise three challenges: control, cost, and interoperability. Iceberg Tables combine unique Snowflake capabilities with the Apache Iceberg and Apache Parquet open source projects to address them, and this article explains how.
PODCAST
Future-Aware Data Engineer | 42 min | Data Engineering | 💪 Paweł Leszczyński | GetInData
Will this go viral? It’s already been widely commented on and shared. …
It tells the story of past and present inventions, such as Mark Zuckerberg’s Facebook versus the Wright brothers’ airplane. What is the Dunning-Kruger effect, and what does it have in common with Wikipedia? Why did Jacek Kuroń not have to pay his phone bills? We look at these inventions through the lens of Yuval Noah Harari, Daniel Kahneman, and Slavoj Zizek. Sounds like the perfect trio of authors for the ideal data-related holiday podcast.
Post-Deployment Data Science | 33 min | ML | Hakim Elakhrass | DataCamp
Many machine learning practitioners dedicate most of their attention to creating and deploying models that solve business problems. But what happens after deployment? And how should data teams go about monitoring models in production?
Takeaway: Data scientists need to cultivate a thorough understanding of a model’s potential business impacts, as well as the technical metrics of the model.
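One common building block of production monitoring is a drift check that compares a feature’s live distribution with its training-time distribution. Below is a minimal sketch of the Population Stability Index (PSI) in plain Python. It is one widely used drift metric, not necessarily what the episode recommends, and the equal-width binning, 10 bins and epsilon floor are our own assumptions.

```python
import math
from typing import Sequence


def psi(expected: Sequence[float], actual: Sequence[float], bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index of `actual` against the reference sample `expected`."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant reference feature

    def proportions(values: Sequence[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            # Equal-width bins over the reference range; out-of-range values
            # are clamped into the first or last bin.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # Floor empty bins at eps so the log term stays finite
        return [max(c / len(values), eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))


reference = list(range(100))         # e.g. a feature at training time
live = [v + 50 for v in range(100)]  # the same feature, shifted in production
print(round(psi(reference, reference), 3))  # 0.0
print(psi(reference, live) > 0.25)          # True
```

A commonly quoted rule of thumb treats PSI below 0.1 as stable and above 0.25 as a significant shift, but thresholds are best tuned per feature. Input drift alone doesn’t prove that model performance dropped, which is why the takeaway stresses linking technical metrics to business impact.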
DataTube
WHOOPS, THE NUMBERS ARE WRONG! SCALING DATA QUALITY @ NETFLIX | 0.5 h | Michelle Ufford | Netflix | DataWorks Summit
We just found out that there is a named design pattern for handling data quality in pipeline DAGs: “Write-Audit-Publish”.
It’s like “blue-green deployment but for data”. I know, it’s obvious, but hey, it’s good to have names for simple things ;)
The original name shows up in this Netflix presentation.
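To make the pattern concrete, here is a minimal Write-Audit-Publish sketch using Python’s built-in sqlite3 module. The table names (`orders`, `orders_staging`), the columns and the audit rules are hypothetical; in a real warehouse the publish step might be a table swap or a metadata commit instead of the plain INSERT ... SELECT used here.

```python
import sqlite3


def write_audit_publish(conn: sqlite3.Connection, rows: list[tuple[int, float]]) -> bool:
    """Load a batch into `orders` only if it passes the audits (hypothetical schema)."""
    # WRITE: land the new batch in a staging table that consumers never read
    conn.execute("DROP TABLE IF EXISTS orders_staging")
    conn.execute("CREATE TABLE orders_staging (order_id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO orders_staging VALUES (?, ?)", rows)
    conn.commit()  # keep staged data so a failed batch can be inspected

    # AUDIT: run data quality checks against the staging table only
    (bad_rows,) = conn.execute(
        "SELECT COUNT(*) FROM orders_staging "
        "WHERE order_id IS NULL OR amount IS NULL OR amount < 0"
    ).fetchone()
    (row_count,) = conn.execute("SELECT COUNT(*) FROM orders_staging").fetchone()
    if bad_rows > 0 or row_count == 0:
        return False  # nothing reaches production; the batch stays in staging

    # PUBLISH: promote the audited batch to the table consumers query
    conn.execute("INSERT INTO orders SELECT order_id, amount FROM orders_staging")
    conn.commit()
    return True


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, amount REAL)")
print(write_audit_publish(conn, [(1, 10.0), (2, 5.5)]))  # True
print(write_audit_publish(conn, [(3, -1.0)]))            # False: negative amount
```

The “blue-green” flavour comes from the fact that consumers only ever see `orders`, so a bad batch is caught while it is still invisible to them.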
You’re probably curious about how people apply this pattern in tools like dbt.
We only found one video and some slides; you will find them by clicking the MORE LINKS button ⬇
If you know of some interesting sources on this subject, please leave a comment ;)
CONFS AND MEETUPS
How to simplify data and AI governance | 16 August | Online | Databricks & Milliman
Speakers: Paul Roome, Liran Bareket, Dan McCurley
---
That’s it for today! Please don't hesitate to forward this on.
See you next week 👋
Adam Kawa from GetInData