DATA Pill #049 - 91% of ML Models degrade in time, MLflow 2.3 and Secrets of Deep Reinforcement Learning

Hi,


After this week, one thing needs to be said:

ML and AI are doing well and are still among the year's hottest topics.

So, dig into the newest dose of knowledge.


ARTICLES

91% of ML Models Degrade in Time | 10 min | ML | Santiago Víquez | nannyML

The study by Vela et al. showed that the performance of ML models doesn't remain static, even when they achieve high accuracy at deployment time. Different ML models also age at different rates, even when trained on the same datasets. Another relevant remark is that not all temporal drifts cause performance degradation. The choice of model and its stability therefore becomes one of the most critical factors in dealing with temporal performance degradation.
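To make the idea of temporal degradation concrete, here is a minimal sketch of how it could be tracked. This is not the method used in the study: the function names, the 30-day window, the 1.5x tolerance, and the synthetic data are all assumptions made for illustration. It groups prediction errors into time windows after deployment and flags the windows where the error has grown well beyond the first window's error.

```python
from statistics import mean


def windowed_mae(records, window_days):
    """Group (day, y_true, y_pred) records into consecutive windows
    and return the mean absolute error of each window, in time order."""
    buckets = {}
    for day, y_true, y_pred in records:
        buckets.setdefault(day // window_days, []).append(abs(y_true - y_pred))
    return [mean(buckets[k]) for k in sorted(buckets)]


def degraded_windows(maes, tolerance=1.5):
    """Return indices of windows whose MAE exceeds tolerance times
    the MAE of the first (baseline) window."""
    baseline = maes[0]
    return [i for i, m in enumerate(maes) if m > tolerance * baseline]


# Synthetic, made-up data: the error grows slowly with days since deployment.
records = [(day, 10.0, 10.0 + 1.0 + 0.05 * day) for day in range(90)]

maes = windowed_mae(records, window_days=30)
print(maes)                    # one MAE value per 30-day window
print(degraded_windows(maes))  # windows that drifted past the tolerance
```

In a real setup the ground truth often arrives late or not at all, so a check like this would run on whatever labelled data is available, and the tolerance would depend on the business cost of errors.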


Use dbt and Duckdb instead of Spark in data pipelines | 7 min | Data Engineering | Niels Claeys | datamindedbe Blog

Niels presents several reasons to consider using dbt and DuckDB instead of Spark. He also highlights some limitations and challenges of using dbt and DuckDB.

The article provides a comprehensive overview of dbt and DuckDB and how they can be used in data pipelines. It encourages readers to explore these tools as alternatives to Spark.


Fivetran Puts the Customer Last | 10 min | Data Engineering | Lauren Balik | Personal Blog

Lauren strikes back, this time with a conspiracy theory about Modern Data Stack vendors and what Fivetran's long-awaited S3 connector has to do with it. As usual, the narration style may be provocative, but it is still good food for thought. If you start looking at your cloud spend, human capital, and products as a portfolio of investments that generate returns, you will develop habits that lead you away from these Modern Data Stack games.


The road to running Apache Flink applications on AWS KDA | 6 min | Cloud | Duc Anh Khu | Deliveroo Engineering blog

In this article, you will read about the road to running Apache Flink applications on AWS KDA. Why did the Deliveroo team choose AWS KDA, and what lessons have they learned? Dive into the text to find out, and to learn about their plans for the future.



In MORE LINKS you will read about How Databricks Performed ETL on One Billion Records For Under $1 and how to save 80% of GCP costs.

{ MORE LINKS }



DATA LIBRARY

Artificial Intelligence Index Report 2023 | takes time to dig in | AI | Stanford University Human-Centered Artificial Intelligence

The sixth edition of the AI Index Report is here, featuring more original data than any previous version. A few takeaways for you:

  • Industry races ahead of academia.
  • The world’s best new scientist… AI?
  • AI is both helping and harming the environment.
  • The number of incidents concerning the misuse of AI is rapidly rising.



TUTORIAL

Managing Multiple BigQuery Projects With One dbt Cloud Project | 9 min | GCP | Lucas Ortiz | Xebia Blog

This one provides a step-by-step guide to setting up a BigQuery connection in a dbt Cloud project, enabling the BigQuery API, and creating a service account for the project. It concludes with a workflow for managing and executing dbt projects across multiple BigQuery projects in dbt Cloud.


In MORE LINKS you will read about the introduction of MLflow 2.3, enhanced with native LLM support and new features.

{ MORE LINKS }



DATA ODDITIES

You Can Try Auto-GPT, the Next Generation of ChatGPT, Right Now | 4 min | AI | Jake Peterson | Lifehacker

Auto-GPT is a complex system relying on multiple components. It connects to the internet to retrieve specific information and data (something ChatGPT’s free version cannot do), features long-term and short-term memory management, uses GPT-4 for OpenAI’s most advanced text generation, and GPT-3.5 for file storage and summarization.



NEWS

Releasing Ververica Cloud - A Fully Managed Cloud Native Service | 3 min | Cloud | Vladimir Jandreski | Ververica Blog

Ververica has announced the beta release of Ververica Cloud. It is a fully-managed service for deploying, operating, and monitoring Apache Flink applications, including stream processing and real-time analytics. Ververica Cloud offers several benefits, including:

  • Simplified deployment and management of Apache Flink clusters 
  • Efficient resource utilization and automatic scaling 
  • Integration with popular data sources and sinks 
  • Powerful monitoring and alerting capabilities 


In MORE LINKS you will find news from AWS and Databricks.

{ MORE LINKS }




PODCAST

Data and analytics for an audience engagement platform | 45 min | host: Adam Kawa guest: Ludwig Holmstrom | Radio DaTa Podcast

Ludwig works as a Product Analytics Director at Mentimeter. Before joining Mentimeter, he worked with data & analytics for over a decade at various companies such as Kry, Spotify, and Google.

Discussed subjects:

  • What is an audience engagement platform 
  • Analytics use-cases at Mentimeter e.g. real-time visualization, customer journey
  • Autonomous teams at Mentimeter
  • Analytics stack at Mentimeter e.g. AWS, Redshift, Looker
  • KPIs and dashboards e.g. Pirate Metrics (AARRR), Viral loop, LTV (Customer lifetime value)
  • Unique aspects of working with data at Mentimeter


Secrets of Deep Reinforcement Learning | 2 h 47 min | host: Tim Scarfe guest: Minqi Jiang | Machine Learning Street Talk

Dr. Tim Scarfe interviews Minqi Jiang on the impact of deep reinforcement learning on technology, startups, and research. Minqi shares his experiences in balancing serendipity and planning, explains the role of objectives and Goodhart's Law in decision-making, and discusses the differences between RL and supervised learning.

They also explore the possibilities of open-endedness and the intelligence explosion, as well as limitations of RL and interpretability concerns with software 2.0.




CONFS, EVENTS AND MEETUPS

Snowflake Summit 2023 | 26-29th June | Las Vegas

Attend Snowflake Summit 2023 to learn how to access, build, and monetize data, tools, models, and applications in ways that were previously unimaginable. Enable seamless alignment and collaboration across these crucial functions in the Data Cloud to transform nearly every aspect of your organization. 

At the Summit, you’ll hear all about the latest innovations coming to the Data Cloud, and learn from hundreds of technical, data, and business experts about what’s possible for you and your organization in a world of data collaboration.

________________________


Have any interesting content to share in the DATA Pill newsletter?

➡ Join us on GitHub

➡ Dig into previous editions of DATA Pill



Adam from GetInData | Part of Xebia
