Retrieval-Augmented Fine-Tuning (RAFT): Combining RAG with Fine-Tuning! 🔥

Retrieval-Augmented Fine-Tuning (RAFT): Combining RAG with Fine-Tuning! 🔥

Welcome to the AI in 5 newsletter with Clarifai!

Every week we bring you new models, tools, and tips to build production-ready AI!

Here is the summary of what we will be covering this week: 👇

  • New model: Prompt-Guard-86M
  • Blog: RAFT combines the benefits of RAG and Fine-Tuning
  • Workflow: Audio Sentiment Analysis
  • Blog: Multimodal Evaluation Benchmarks
  • Tip of the week: Get your first Visual Search App in ~1 minute!

Prompt-Guard-86M Model 🔥

LLM-powered applications can be vulnerable to prompt attacks, where malicious prompts are designed to manipulate the model's behavior against the developer's intentions.

There are two different prompt attacks:

Prompt Injections: These are inputs that take advantage of combining untrusted data from third parties or users into a model's context, causing the model to follow unintended instructions.

Jailbreaks: These are malicious instructions intended to override a model's built-in safety and security features.

Prompt-Guard-86M is a multilingual classifier model designed to detect and prevent these prompt injections and jailbreak attacks in LLM-powered applications.

The model is now available on the Clarifai Platform, We have some pre built examples to get you started.

Try it out here: Prompt-Guard-86M

Retrieval Augmented Fine-Tuning ⚡

What is RAFT?

Retrieval Augmented Generation (RAG) adds extra knowledge from outside sources to the prompts, and Fine-tuning gives the model more data to learn from.

Each method has pros and cons, and the choice between them often depends on the project's needs.

RAFT combines the benefits of RAG and Fine-tuning by improving the model's understanding and use of domain-specific knowledge while maintaining accuracy. This ensures that the LLM generates more accurate and contextually relevant answers.

Learn more about RAFT, its performance, and results with the Llama 3.1 8B model in the blog here: RAFT

Audio Sentiment Analysis Workflow 💥

Get the sentiment of the audio file with the ASR-Sentiment workflow.

The Audio Speech Recognition Sentiment (ASR-Sentiment) workflow takes audio as input, converts it to text using an ASR model, and then runs sentiment analysis on the text. 

The sentiment analysis model works best for English. If you need to analyze audio in other languages, you can create a custom workflow by adding a translation model between the ASR and sentiment analysis models. 

This ensures the sentiment analysis model receives the text in English. 

Login to the platform and try out the workflow here: ASR-Sentiment workflow

Multimodal LLM Evaluation benchmarks

Multimodal models can process text and images as inputs, and in some cases, they also handle other modalities like video and speech. 

The blog below provides an overview of ten key multimodal datasets and benchmarks that can be used to evaluate the performance of such models, particularly those focused on Visual Question Answering (VQA).

Examples include TextVQA, DocVQA, OCRBench, MathVista, and more. Read on here: Multimodal Evaluation benchmarks

Tip of the week: 📌

Get your first Visual Search App in ~1 minute!

Visual search helps you compare images based on their visual similarity and getting started with your first visual search app is fast and simple.

Set up your Clarifai account ➡ Create an app ➡ Upload your images ➡ Search for similar faces.

Check out the guide here.

Want to learn more from Clarifai? “Subscribe” to make sure you don’t miss the latest news, tutorials, educational materials, and tips. Thanks for reading!

To view or add a comment, sign in

Insights from the community

Others also viewed

Explore topics