💊 DATA Pill #118 - Real-time Streaming, Data Lakehouse, Flink & Kubernetes


Hi,

Today, streaming takes the lead role.

Other stars of the program: Flink, Databricks, Kubernetes, Airflow and Data Lakehouse.

Let’s start!



ARTICLES

AWS Lambda vs. Cloudflare Workers Detailed Comparison | 7 min | Data Engineering | Kiryl Anoshka | Fively Blog

This article compares AWS Lambda and Cloudflare Workers, focusing on their theoretical capabilities and practical differences across key categories such as performance, runtime, and pricing. It also includes insights on which platform excels and a cold start comparison to highlight their distinctions, particularly for smaller tasks.

Flink® on Kubernetes | 15 min | Streaming | Ran Zhang | Airbnb Tech Blog

The evolution of the Flink architecture at Airbnb, comparing their prior Hadoop YARN platform with the current Kubernetes-based architecture.


In MORE LINKS: Machine Learning in Content Moderation at Etsy and Transforming Sports Data with Databricks

{ MORE LINKS }



TUTORIALS

How we built RudderStack’s real-time personalization engine | 9 min | Real-time personalization | Mackenzie Hastings, Matt Kelliher-Gibson, Chandler Van De Water, Eric Dodds | Rudderstack Blog

Creating real-time personalized website and app experiences. From identity resolution to tracking success, this tutorial will walk you through how to build a dynamic, user-focused experience that drives engagement and conversions.


Making WAF ML models go brrr: saving decades of processing time | 23 min | ML | Alex Bocharov | The Cloudflare Blog

This one covers the performance optimizations for Cloudflare's WAF ML product, showcasing code examples, benchmarks, and the impressive latency reductions achieved.

In MORE LINKS you will read about:

  • Flink with metadata catalog
  • Crazy Challenge: Run Llama 405B on an 8GB VRAM GPU

{ MORE LINKS }



DATA LIBRARY 

Accelerate ETL, data warehousing, BI and AI | ebook | Databricks

  • Building applications with traditional AI and generative AI
  • Databricks Data Intelligence Platform



DATA TUBE

Realtime Streaming with Data Lakehouse - End to End Data Engineering Project | 1h | Streaming | CodeWithYu

How to design, implement and maintain secure, scalable and cost-effective lakehouse architectures leveraging Apache Spark, Apache Kafka, Apache Flink, Delta Lake, AWS, and open-source tools.



CONFS, EVENTS AND MEETUPS

Airflow Summit 2024 | San Francisco | 10-12 September

This conference needs no introduction. On the agenda:

  • Mastering LLM Batch Pipelines: Handling Rate Limits, Asynchronous APIs, and Cloud Scalability
  • OpenLineage: From Operators to Hooks by Maciej Obuchowski - our community member 👏
  • How we use Airflow at Booking to orchestrate Big Data workflows


________________________

Have any interesting content to share in the DATA Pill newsletter?

➡ Join us on GitHub

➡ Dig into previous editions of DATA Pill


Adam from GetInData | Part of Xebia
