DATA Pill #131 - Embeddings are underrated, The advent of the Open Data Lake

Hi,

This week’s DATA Pill brings you the latest on data architecture upgrades, dynamic BI solutions, and the Apache Kafka 3.9 release. Check out articles on embeddings for tech docs, Netflix’s partner management overhaul, Demandbase’s switch from ClickHouse, and much more.

ARTICLES

The advent of the Open Data Lake | 7 min | Data Engineering | Julien Le Dem | The Sympathetic Ink Blog

Julien Le Dem maps out the shift from Hadoop to the Open Data Lake, showing how cloud-native architecture eliminates data silos and enhances scalability.

Demandbase Ditches Denormalization By Switching off ClickHouse | 4 min | Data Engineering | StarRocks Engineering

Demandbase moved from ClickHouse to CelerData Cloud, cutting storage costs and simplifying data pipelines to handle real-time updates at scale.

TUTORIALS

Embeddings are underrated | 6 min | ML | Kayce Basques | Technical Writing Blog

Embeddings bring new power to technical docs, enabling content connections without complex models. Learn how these vectors organize data at a massive scale.

In MORE LINKS you will read:

  • Streamlining Contract Management in Revenue Infrastructure
  • Rethinking Data Layers: When Medallion Architecture Isn’t Enough
  • BI-as-Code and the New Era of GenBI

{ MORE LINKS }

NEWS

Introducing Apache Kafka® 3.9 | 5 min | Data Streaming | Confluent Blog

Kafka 3.9 wraps up the 3.x series with flexible KRaft quorum management, streamlined ZooKeeper migration, and production-ready tiered storage.

TOOL

IdentityRAG 

IdentityRAG combines identity resolution with retrieval-augmented generation to provide accurate, unified views of customer data, which is ideal for comprehensive LLM responses.

PODCAST

An Opinionated Look At End-to-end Code Only Analytical Workflows | 56 min | Data Analytics | Tobias Macey, Burak Karakan | Data Engineering Podcast

Burak Karakan explains the benefits of fully code-driven analytics workflows, making integrations faster and more cohesive across the data stack.

CONFS, EVENTS AND MEETUPS

Big Data Technology Warsaw 2025 - CFP | 24th November

The Big Data Technology Warsaw Summit returns on April 9-10, 2025! Submit your speaking proposal and join over 500 professionals as they dive into the latest in data engineering and big data technology.

_______________________

Have any interesting content to share in the DATA Pill newsletter?

➡ Join us on GitHub

➡ Dig into previous editions of DATA Pill

Adam from GetInData | Part of Xebia
