DATA Pill #056 - Fine Tuning vs. Prompt Engineering LLM, Kedro-Snowflake plugin, and more…
Hi,
Your ultimate source for all things data has landed!
In this power-packed edition, I've curated a treasure trove of insights and innovations to fuel your data-driven journey.
The Kedro-Snowflake plugin, LLMs, unlocking the power of JunoDB, and much more are waiting for you to enjoy.
ARTICLES
From Image Classification to Multitask Modeling: Building Etsy’s Search by Image Feature | 7 min | Data Engineering | Eden Dolev, Alaa Awad | Etsy Blog
This image-based discovery tool on Etsy’s mobile apps is available now. Read the story of how the Etsy team took a proof-of-concept hackathon project and turned it into a production feature that helps make the millions of unique and special items on Etsy more discoverable for buyers.
Dependency Management at Scale | 5 min | Data Engineering | Adrian Comisel | Yelp Engineering Blog
Keeping project dependencies up to date is crucial, and that is the job of Yokyo Drift. It actively scans all repositories in use at Yelp, submits pull requests that upgrade any outdated dependencies, and tracks and monitors the progress of these upgrades. Let’s take a quick look at this solution.
Run your first, private Large Language Model (LLM) on Google Cloud Platform | 16 min | LLM | Michał Bryś | GetInData | Part of Xebia Blog
Imagine you want to develop a personalized assistant, powered by a large language model (LLM), for generating financial report summaries, but guaranteeing the utmost privacy for your organization is a challenge.
In his latest blog post, Michał delves into three essential aspects:
Fine Tuning vs. Prompt Engineering Large Language Models | 9 min | LLM | Niels Bantilan | MLOps Community Blog
Let's dive into this blog post, where Niels describes prompt engineering and fine-tuning in more detail, gives a practical sense of how they are different, and provides you with a few heuristics that will help you begin your fine-tuning journey.
In MORE LINKS you will read about unlocking the Power of JunoDB, and why Modern Data Platforms don’t do ETL anymore
TUTORIAL
From 0 to MLOps with ❄️ Snowflake Data Cloud in 3 steps with the Kedro-Snowflake plugin | 8 min | MLOps | Marcin Zabłocki, Marek Wiewiórka, Michał Bryś | GetInData | Part of Xebia Blog
Marcin, Marek and Michał unveil their newest Kedro-Snowflake plugin. Thanks to this, you can streamline your ML pipelines in Kedro and effortlessly execute them in a scalable Snowflake environment, and all it takes is three simple steps.
In MORE LINKS you will read about: Join a streaming data source with CDC data for real-time serverless data analytics using AWS Glue, AWS DMS, and Amazon DynamoDB
NEWS
Announcing NVIDIA DGX GH200: The First 100 Terabyte GPU Memory System | 4 min | AI | Pradyumna Desale | Nvidia Developer
During COMPUTEX 2023, NVIDIA made an exciting revelation by introducing the NVIDIA DGX GH200. This groundbreaking development in GPU-accelerated computing is set to revolutionize handling massive AI workloads. Apart from highlighting the critical elements of the NVIDIA DGX GH200's architecture, this announcement also explores the capabilities of NVIDIA Base Command, which facilitates swift deployment, expedites user onboarding and streamlines system management processes.
PODCAST
Data Strategy: Key Principles and Best Practices | 56 min | Data Engineering | Guest: Boyan Angelov | DataTalks.Club Podcast
In this episode, you will discover how organizations leverage data to make informed decisions, drive innovation and gain a competitive edge. Tune in to this episode to uncover critical strategies for building a robust data foundation, optimizing data governance and unlocking the true potential of your data assets.
CONFS, EVENTS AND MEETUPS
LLMs in Production | 15-16th June | Online
Join 50 Speakers from Stripe, Meta, Canva, Databricks, Anthropic, Cohere, Redis, Langchain, Chroma, Humanloop and so many more.
It is a two-day conference of talks with some of our favorite people at the forefront of using LLMs in the wild, plus an in-person workshop in San Francisco, hosted by Anyscale, on how to build and deploy LLM-based apps.
________________________
Have any interesting content to share in the DATA Pill newsletter?
➡ Join us on GitHub
➡ Dig into previous editions of DATA Pill
Adam from GetInData | Part of Xebia