DATA Pill #056 - Fine Tuning vs. Prompt Engineering LLM, Kedro-Snowflake plugin, and more…

Adam Kawa

CEO at GetInData, ex-Spotify | Data & AI for banks, telecoms, retail & more.

Published Jun 12, 2023

Hi,

Your ultimate source for all things data has landed!

In this power-packed edition, I've curated a treasure trove of insights and innovations to fuel your data-driven journey.

The Kedro-Snowflake plugin, LLMs, unlocking the power of JunoDB, and much more are waiting for you to enjoy.

ARTICLES

From Image Classification to Multitask Modeling: Building Etsy’s Search by Image Feature | 7 min | Data Engineering | Eden Dolev, Alaa Awad | Etsy Blog

This image-based discovery tool is now available on Etsy’s mobile apps. Read the story of how the Etsy team took a proof-of-concept hackathon project and turned it into a production feature that helps make the millions of unique and special items on Etsy more discoverable for buyers.

Dependency Management at Scale | 5 min | Data Engineering | Adrian Comisel | Yelp Engineering Blog

Keeping project dependencies up to date is crucial, and at Yelp that job falls to Yokyo Drift. It actively scans all repositories in use at Yelp, submits pull requests that upgrade any outdated dependencies, and tracks and monitors the progress of these upgrades. Let’s take a quick look at this solution.

Imagine you want to develop a personalized assistant, powered by a large language model (LLM), that generates financial report summaries. Doing so while guaranteeing the utmost privacy for your organization is a challenge.

In his latest blog post, Michal delves into three essential aspects:

The obstacles that must be surmounted to achieve this goal.
The approach you can adopt to construct your very own LLM-based assistant.
Detailed instructions on how to implement this solution on the Google Cloud Platform.

Fine Tuning vs. Prompt Engineering Large Language Models | 9 min | LLM | Niels Bantilan | MLOps Community Blog

Let's dive into this blog post, where Niels describes prompt engineering and fine-tuning in more detail, gives a practical sense of how they are different, and provides you with a few heuristics that will help you begin your fine-tuning journey.

In MORE LINKS you will read about unlocking the power of JunoDB and why modern data platforms don’t do ETL anymore.

{ MORE LINKS }

TUTORIAL

Marcin, Marek and Michał unveil their newest Kedro-Snowflake plugin. Thanks to this, you can streamline your ML pipelines in Kedro and effortlessly execute them in a scalable Snowflake environment, and all it takes is three simple steps.

NEWS

Announcing NVIDIA DGX GH200: The First 100 Terabyte GPU Memory System | 4 min | AI | Pradyumna Desale | Nvidia Developer

During COMPUTEX 2023, NVIDIA made an exciting revelation by introducing the NVIDIA DGX GH200. This groundbreaking development in GPU-accelerated computing is set to revolutionize handling massive AI workloads. Apart from highlighting the critical elements of the NVIDIA DGX GH200's architecture, this announcement also explores the capabilities of NVIDIA Base Command, which facilitates swift deployment, expedites user onboarding and streamlines system management processes.

PODCAST

Data Strategy: Key Principles and Best Practices | 56 min | Data Engineering | Guest: Boyan Angelov | DataTalks.Club Podcast

In this episode, you will discover how organizations leverage data to make informed decisions, drive innovation and gain a competitive edge. Tune in to uncover critical strategies for building a robust data foundation, optimizing data governance and unlocking the true potential of your data assets.

CONFS EVENTS AND MEETUPS

LLMs in Production | 15-16th June | Online

Join 50 speakers from Stripe, Meta, Canva, Databricks, Anthropic, Cohere, Redis, Langchain, Chroma, Humanloop and many more.

It is a two-day conference of talks with some of our favorite people at the forefront of using LLMs in the wild, plus an in-person workshop in San Francisco, hosted by Anyscale, on how to build and deploy LLM-based apps.

________________________

Have any interesting content to share in the DATA Pill newsletter?

➡ Join us on GitHub

➡ Dig into previous editions of DATA Pill

Adam from GetInData | Part of Xebia

Richard Cotton

Senior Data Evangelist at DataCamp | DataFramed podcast host | Course creator | Author | Spends all day chatting about data & AI

It wasn't me speaking to Boyan. Wrong podcast host!
