This Week in AI #1: Scaling Q&A, Efficiency Trade-offs, and Increasing Credibility
Hey!
Welcome to the very first edition of This Week in AI: RAG Edition.
Every week in this newsletter, we'll cover recent developments in the world of Gen AI, with a focus on RAG. We'll look into research papers that help us get better at using Gen AI to extract accurate answers from a large corpus, and at applying it to customer support, sales, and enterprise search in general.
We'll examine each paper in detail: what problem it's solving, how the methodology works, and what improvements it delivers.
We're excited to share the latest insights with you!
Let’s kick off this inaugural edition with a look at three fascinating research papers: Pistis-RAG, RankRAG, and RECLAIM.
So let's start!
1. Pistis-RAG: A Scalable Cascading Framework Towards Content-Centric Retrieval-Augmented Generation
In Greek mythology, Pistis symbolized good faith, trust, and reliability. Drawing inspiration from these principles, Pistis-RAG is a scalable multi-stage framework designed to address the challenges of large-scale retrieval-augmented generation (RAG) systems.
What problem are they trying to solve?
Scaling a RAG system usually means compromising on either performance or efficiency. This study makes strides toward enhancing the quality and relevance of generated responses by optimizing the retrieval and ranking stages at scale, tackling issues such as prompt order sensitivity and retrieval efficiency in large-scale deployments.
How are they solving the problem?
The paper introduces a comprehensive framework called Pistis-RAG, which integrates several key components to address these challenges:
1. Matching Service:
2. Ranking Service:
3. Experimental Setup:
- BGE-M3: Retrieves the top 10 candidate few-shots.
- BGE-reranker-large: Pre-ranks the candidate few-shots to select the top five.
- Llama-2-13B-chat: Generates the final text conditioned on the selected few-shot prompts.
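To make that flow concrete, here's a minimal sketch of the retrieve → pre-rank → generate pipeline. The embed, rerank_score, and generate helpers below are stand-ins we wrote for BGE-M3, the BGE reranker, and Llama-2-13B-chat; this illustrates the setup, it is not the paper's code.

```python
# Minimal sketch of the retrieve -> pre-rank -> generate few-shot pipeline.
# The three model calls are placeholders: swap in BGE-M3 embeddings, the BGE
# cross-encoder reranker, and Llama-2-13B-chat in a real system.
from typing import List
import numpy as np

def embed(texts: List[str]) -> np.ndarray:
    """Placeholder for BGE-M3 dense embeddings."""
    rng = np.random.default_rng(0)
    return rng.normal(size=(len(texts), 8))

def rerank_score(query: str, candidate: str) -> float:
    """Placeholder for the cross-encoder reranker score."""
    return float(len(set(query.split()) & set(candidate.split())))

def generate(prompt: str) -> str:
    """Placeholder for Llama-2-13B-chat generation."""
    return f"<answer conditioned on {prompt.count('###')} few-shot examples>"

def answer(query: str, corpus: List[str], k_retrieve: int = 10, k_rank: int = 5) -> str:
    # Stage 1: dense retrieval of the top-10 candidate few-shots.
    q_vec, doc_vecs = embed([query])[0], embed(corpus)
    sims = doc_vecs @ q_vec / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_vec))
    candidates = [corpus[i] for i in np.argsort(-sims)[:k_retrieve]]
    # Stage 2: cross-encoder pre-ranking down to the top five.
    top = sorted(candidates, key=lambda c: rerank_score(query, c), reverse=True)[:k_rank]
    # Stage 3: build the few-shot prompt and generate.
    prompt = "\n###\n".join(top) + f"\n###\nQuestion: {query}\nAnswer:"
    return generate(prompt)
```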
What are the results?
The experiments demonstrate the effectiveness of the Pistis-RAG framework; the key performance metrics are precision, recall, and F1-score.
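As a quick refresher on how those three metrics relate (the numbers below are purely illustrative, not figures from the paper):

```python
# F1 is the harmonic mean of precision and recall.
def f1_score(precision: float, recall: float) -> float:
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

print(f1_score(0.80, 0.70))  # ~0.747, illustrative numbers only
```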
What's next?
Pistis-RAG demonstrates significant improvements in the performance of RAG systems, especially in handling prompt order sensitivity and improving information retrieval efficiency. Enterprises can use the framework to deliver more coherent and relevant responses, reducing user frustration and enhancing overall satisfaction.
2. RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs
In this work, we propose a novel instruction fine-tuning framework RankRAG, which instruction-tunes a single LLM for the dual purpose of context ranking and answer generation in RAG.
What problem are they trying to solve?
There's always a trade-off between efficiency and performance in retrieval-augmented generation (RAG) systems. Even accepting this, there's still room for optimization.
The researchers address key limitations in the RAG pipelines used by large language models (LLMs), looking for ways to minimize performance losses while improving efficiency in training and inference, particularly in the context selection and answer generation stages.
How are they solving the problem?
Previous RAG approaches often suffered from two main issues: inefficient context selection and the need for separate retrieval and ranking models. RankRAG addresses both within a single framework.
RankRAG is a novel instruction fine-tuning framework designed to integrate context ranking and answer generation within a single LLM.
RankRAG uses a clever trick of casting all tasks into a standardized (question, context, answer) format. This allows the model to transfer knowledge between ranking and generation tasks effectively.
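To illustrate that unified format (the field names and the relevance-as-answer framing below are our own reading, not necessarily the paper's exact prompts), a ranking instance and a QA instance can both be expressed as the same triple:

```python
# Illustrative only: both task types cast into the same (question, context, answer)
# shape, so one instruction-tuned LLM can learn ranking and generation together.
ranking_example = {
    "question": "Does the passage answer the question: Who wrote 'Dune'?",
    "context": "Dune is a 1965 science fiction novel by Frank Herbert.",
    "answer": "True",  # relevance judgement expressed as a short answer
}
generation_example = {
    "question": "Who wrote 'Dune'?",
    "context": "Dune is a 1965 science fiction novel by Frank Herbert.",
    "answer": "Frank Herbert",
}
```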
During inference, RankRAG first reranks a set of retrieved contexts, then generates the final answer using only the most relevant ones. This approach acts like "attention sinks" for RAG, helping the model maintain focus on the most important information without losing overall context. The results show that even with a relatively small amount of ranking data (just 1% of a standard dataset), RankRAG can significantly outperform dedicated ranking models and larger language models on various knowledge-intensive tasks.
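Here's a rough sketch of that rerank-then-generate loop, with a placeholder llm() call standing in for the single instruction-tuned model; the prompts are our own, not the authors' implementation:

```python
# Sketch of RankRAG-style inference: one LLM scores each retrieved context for
# relevance, then generates the answer from only the top-k contexts.
from typing import List

def llm(prompt: str) -> str:
    """Placeholder for the single instruction-tuned model."""
    return "True" if "relevant" in prompt else "stub answer"

def relevance_score(question: str, context: str) -> float:
    # Hypothetical ranking prompt: ask the same model whether the context is relevant.
    verdict = llm(f"Is this context relevant to the question?\n"
                  f"Question: {question}\nContext: {context}\nAnswer True or False:")
    return 1.0 if verdict.strip().startswith("True") else 0.0

def rank_then_generate(question: str, retrieved: List[str], k: int = 5) -> str:
    # Stage 1: rerank the retrieved contexts with the LLM itself.
    ranked = sorted(retrieved, key=lambda c: relevance_score(question, c), reverse=True)
    # Stage 2: answer from only the k most relevant contexts.
    context_block = "\n\n".join(ranked[:k])
    return llm(f"Context:\n{context_block}\n\nQuestion: {question}\nAnswer:")
```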
Experimental setup:
What are the results?
3. Ground Every Sentence: Improving Retrieval-Augmented LLMs with Interleaved Reference-Claim Generation
We propose a method called RECLAIM, which alternately generates citations and answer sentences, to enable large models to generate answers with citations.
What problem are they trying to solve?
You type a question, and the LLM generates an answer, complete with bold claims. How do you know it's the truth?
This research paper aims to shed light on the two main issues with Gen AI QnA systems: answers that can't be verified against their sources, and citations that are too coarse to check individual claims.
How are they solving the problem?
The paper introduces a method called RECLAIM (Reference-Claim Interleaving Model) to solve these problems. The approach involves several key steps:
Step 1: Task Formalization: The task is defined as generating an output consisting of fine-grained references and claims. Given a query q and several reference passages D, the model generates an output O that alternates between references (r1, r2, ..., rn) and claims (c1, c2, ..., cn). Each reference substantiates a corresponding claim, and together they form a complete, coherent answer.
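Concretely, the interleaved output might look something like this (the contents are invented for illustration; only the shape matters):

```python
# Illustrative shape of the interleaved output O = (r1, c1, r2, c2, ...):
# each claim is preceded by the reference span that substantiates it.
output = [
    {"reference": "[1] The Eiffel Tower was completed in 1889 ...",
     "claim": "The Eiffel Tower was finished in 1889."},
    {"reference": "[2] It was the tallest man-made structure until 1930 ...",
     "claim": "It remained the world's tallest structure for about four decades."},
]
final_answer = " ".join(pair["claim"] for pair in output)  # the claims alone read as the answer
```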
Step 2: Training Dataset Construction:
The initial dataset was built from WebGLM-QA, with 43,579 samples of rich references and detailed answers. It was further segmented to identify relevant citations, and NLI methods were used to ensure attribution quality.
The data was then filtered to remove mismatched citations, leaving a refined dataset of 9,433 samples. This ensured text consistency and high-quality attributions.
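Here's a minimal sketch of that attribution filter, assuming an entailment scorer and a 0.9 cutoff of our own choosing; the paper's exact NLI model and threshold aren't spelled out here, so entails() is a placeholder:

```python
# Sketch of the attribution filter: keep a (reference, claim) pair only if an
# NLI model says the reference entails the claim.
def entails(premise: str, hypothesis: str) -> float:
    """Placeholder NLI entailment probability."""
    return 1.0 if hypothesis.lower() in premise.lower() else 0.0

def filter_pairs(pairs, threshold=0.9):
    return [(ref, claim) for ref, claim in pairs if entails(ref, claim) >= threshold]

pairs = [("Paris is the capital of France.", "Paris is the capital of France."),
         ("Paris is the capital of France.", "Paris has two million residents.")]
print(filter_pairs(pairs))  # keeps only the entailed (reference, claim) pair
```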
Step 3: Model Training:
Two models, ReferModel and ClaimModel, are trained separately:
For the ReferModel, the researchers performed full fine-tuning with a learning rate of 2e-5 over 3 epochs. For the ClaimModel, they employed LoRA tuning (Hu et al., 2021) with a learning rate of 5e-5 over 5 epochs.
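Those hyperparameters map naturally onto a Hugging Face setup. Here's a sketch assuming both models are standard causal LMs; the output paths, LoRA rank, and other settings below are our assumptions, not values from the paper:

```python
# Hyperparameters from the paper mapped onto Hugging Face TrainingArguments and
# a PEFT LoRA config; base checkpoints, batch sizes, and LoRA rank are assumed.
from transformers import TrainingArguments
from peft import LoraConfig

refer_args = TrainingArguments(   # ReferModel: full fine-tuning
    output_dir="refer_model",
    learning_rate=2e-5,
    num_train_epochs=3,
)

claim_args = TrainingArguments(   # ClaimModel: LoRA tuning
    output_dir="claim_model",
    learning_rate=5e-5,
    num_train_epochs=5,
)
claim_lora = LoraConfig(          # illustrative LoRA settings
    r=16,
    lora_alpha=32,
    task_type="CAUSAL_LM",
)
```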
Step 4: Evaluation and Comparison:
The approach is evaluated against existing methods like ALCE. Metrics such as fluency, correctness, and citation accuracy are used to measure performance.
What are the results?
The RECLAIM methodology shows improvements in various metrics.
Overall, RECLAIM w/IG achieved the best performance in fluency and citation quality, while the claim-only method had the highest correctness score of 37.8 but at the cost of increased response length and reduced fluency.
Other interesting reads:
Enjoying the content?
Make sure to subscribe for your weekly dose of AI insights, and don’t forget to share in the comments if there's a particular aspect of any research paper you’d like us to explore further.
What’s the biggest question you have about RAG and Gen AI? Let us know in the comments. We read each one and may just feature yours in a future newsletter!
Happy reading, and see you next week!
By Alltius