💊 DATA Pill #118 - Real-time Streaming, Data Lakehouse, Flink & Kubernetes
Hi,
Today, streaming takes the lead role.
Other stars of the program: Flink, Databricks, Kubernetes, Airflow and Data Lakehouse.
Let’s start!
ARTICLES
AWS Lambda vs. Cloudflare Workers Detailed Comparison | 7 min | Data Engineering | Kiryl Anoshka | Fively Blog
This article compares AWS Lambda and Cloudflare Workers, focusing on their theoretical capabilities and practical differences across key categories such as performance, runtime, and pricing. It also includes insights on which platform excels and a cold start comparison to highlight their distinctions, particularly for smaller tasks.
Flink® on Kubernetes | 15 min | Streaming | Ran Zhang | Airbnb Tech Blog
The evolution of the Flink architecture at Airbnb, comparing their prior Hadoop YARN platform with the current Kubernetes-based architecture.
In MORE LINKS: Machine Learning in Content Moderation at Etsy and Transforming Sports Data with Databricks
TUTORIALS
How we built RudderStack’s real-time personalization engine | 9 min | Real-time personalization | Mackenzie Hastings, Matt Kelliher-Gibson, Chandler Van De Water, Eric Dodds | Rudderstack Blog
Creating real-time personalized website and app experiences. From identity resolution to tracking success, this tutorial will walk you through how to build a dynamic, user-focused experience that drives engagement and conversions.
Making WAF ML models go brrr: saving decades of processing time | 23 min | ML | Alex Bocharov | The Cloudflare Blog
This one covers the performance optimizations for Cloudflare's WAF ML product, showcasing code examples, benchmarks, and the latency reductions achieved.
In MORE LINKS you will read about:
DATA LIBRARY
Accelerate ETL, data warehousing, BI and AI | ebook | Databricks
DATA TUBE
Realtime Streaming with Data Lakehouse - End to End Data Engineering Project | 1h | Streaming | CodeWithYu
How to design, implement and maintain secure, scalable and cost-effective lakehouse architectures leveraging Apache Spark, Apache Kafka, Apache Flink, Delta Lake, AWS, and open-source tools.
CONFS EVENTS AND MEETUPS
Airflow Summit 2024 | San Francisco | 10-12 September
This conference needs no introduction. On the agenda:
________________________
Have any interesting content to share in the DATA Pill newsletter?
➡ Join us on GitHub
➡ Dig into previous editions of DataPill
Adam from GetInData | Part of Xebia