📢 Top RAG Papers of the Week (December Week 1, 2024)

[1] Impact of OCR on RAG

This paper introduces OHRBench, a benchmark for understanding the impact of OCR on RAG systems. OHRBench includes 350 carefully selected unstructured PDF documents from six real-world RAG application domains. The authors systematically evaluate the impact of two types of OCR noise, Semantic Noise and Formatting Noise, and demonstrate the vulnerability of RAG systems to both. [Tweet] and [Paper]
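
The benchmark's core idea, perturbing clean text with these two noise types and measuring downstream RAG degradation, is easy to emulate. Below is a toy sketch; the perturbation functions are illustrative stand-ins, not OHRBench's actual ones.

```python
import random

# Toy stand-ins for the two OCR noise types: formatting noise injects stray
# markup artifacts, semantic noise mimics character-level misrecognitions.
def formatting_noise(text: str, p: float = 0.05) -> str:
    artifacts = ["**", "|", "##", "\\n"]  # markup debris OCR tools often emit
    words = text.split()
    return " ".join(w + random.choice(artifacts) if random.random() < p else w
                    for w in words)

def semantic_noise(text: str, p: float = 0.02) -> str:
    swaps = {"0": "O", "O": "0", "1": "l", "l": "1", "5": "S"}  # common confusions
    return "".join(swaps.get(c, c) if random.random() < p else c for c in text)
```

Feeding the perturbed corpus back into an existing RAG pipeline and comparing answer quality against the clean corpus gives a rough, OHRBench-style robustness probe.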


[2] Auto-RAG

This paper introduces Auto-RAG, an autonomous RAG framework for LLMs. Auto-RAG engages in multi-turn dialogues with the retriever, systematically planning retrievals and refining queries to acquire valuable knowledge. Auto-RAG achieves outstanding performance across six benchmarks thanks to its ability to autonomously interact with the retriever and to effectively leverage the decision-making abilities of LLMs. [Tweet] and [Paper]
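
As a rough illustration of the multi-turn idea, here is a minimal sketch of an autonomous retrieval loop in which the LLM decides at each turn whether to refine the query and retrieve again or to answer; `llm.generate` and `retriever.search` are hypothetical interfaces, not the paper's implementation.

```python
# Minimal Auto-RAG-style loop: retrieve, let the LLM plan, repeat or answer.
def auto_rag(question: str, llm, retriever, max_turns: int = 5) -> str:
    context: list[str] = []
    query = question
    for _ in range(max_turns):
        context.extend(retriever.search(query, k=3))   # gather evidence
        decision = llm.generate(
            f"Question: {question}\nEvidence so far:\n" + "\n".join(context) +
            "\nIf the evidence is sufficient, reply 'ANSWER: <answer>'. "
            "Otherwise reply 'QUERY: <refined search query>'."
        )
        if decision.startswith("ANSWER:"):
            return decision.removeprefix("ANSWER:").strip()
        query = decision.removeprefix("QUERY:").strip()  # plan next retrieval
    # Fall back to answering with whatever evidence was gathered.
    return llm.generate(f"Question: {question}\nEvidence:\n" + "\n".join(context))
```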


[3] Analyzing OpenAPI Chunking for RAG

This paper investigates OpenAPI chunking for RAG and addresses the question, “Can LLM agents be employed to reduce token count further and improve retrieval performance?”. To this end, the authors propose a Discovery Agent that only receives a summary of the most relevant endpoints and retrieves details on demand. Results show that (i) LLM-based and format-specific chunking outperform naïve chunking methods, and (ii) relying on an agent further enhances these results, as the agent splits the task into multiple fine-grained subtasks. [Tweet] and [Paper]
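
The “summaries first, details on demand” pattern is straightforward to sketch against a standard OpenAPI 3.x spec; the helpers below are illustrative, not the paper's agent.

```python
HTTP_METHODS = {"get", "post", "put", "patch", "delete"}

def endpoint_summaries(spec: dict) -> str:
    # Compact, low-token overview: one line per endpoint.
    lines = []
    for path, ops in spec.get("paths", {}).items():
        for method, op in ops.items():
            if method in HTTP_METHODS:
                lines.append(f"{method.upper()} {path}: {op.get('summary', '')}")
    return "\n".join(lines)

def endpoint_details(spec: dict, method: str, path: str) -> dict:
    # Full operation object (parameters, schemas), fetched only on demand.
    return spec["paths"][path][method.lower()]
```

An agent prompt would include only `endpoint_summaries(spec)`, with the agent calling `endpoint_details` for the few endpoints it actually selects, keeping token count low.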



[4] Know Your RAG

This paper shows that (i) using public question and answer (Q&A) datasets to assess retrieval performance can lead to suboptimal system design, and (ii) common tools for RAG dataset generation can lead to unbalanced data. The authors (i) propose solutions to these issues based on the characterization of RAG datasets through labels and through label-targeted data generation, and (ii) show that fine-tuned small LLMs can efficiently generate Q&A datasets. [Tweet] and [Paper]
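
The balancing idea can be sketched simply: characterize each Q&A item by a label (e.g., question type) and resample so no label dominates. The labels and data here are hypothetical; the paper's taxonomy and generator differ in detail.

```python
import random

def rebalance(items: list[dict], label_key: str = "label") -> list[dict]:
    # Group items by their characterization label.
    by_label: dict[str, list[dict]] = {}
    for item in items:
        by_label.setdefault(item[label_key], []).append(item)
    n = min(len(v) for v in by_label.values())    # size of the rarest label
    balanced = []
    for group in by_label.values():
        balanced.extend(random.sample(group, n))  # downsample each label to n
    return balanced
```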


[5] Understanding Retrieval Accuracy and Prompt Quality in RAG Systems

This paper presents a study to understand retrieval accuracy and prompt quality in RAG systems by conducting experiments on three code datasets, three QA datasets, and two LLMs. The authors focus on four design factors: retrieval document type, retrieval recall, document selection, and prompt techniques. Based on the results, they present nine actionable guidelines for detecting defects and optimizing the performance of RAG systems. [Tweet] and [Paper]
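
One of the four factors, retrieval recall, has a simple operational definition: recall@k is the fraction of gold documents found among the top-k retrieved. A hypothetical helper, not from the paper's codebase:

```python
def recall_at_k(retrieved: list[str], gold: set[str], k: int) -> float:
    # Fraction of gold documents present in the top-k retrieved results.
    return len(set(retrieved[:k]) & gold) / len(gold) if gold else 0.0
```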


[6] MBA-RAG

The paper introduces MBA-RAG, a reinforcement learning-based framework that dynamically selects the most suitable retrieval strategy based on query complexity. The approach leverages a multi-armed bandit algorithm, which treats each retrieval method as a distinct “arm” and adapts the selection process by balancing exploration and exploitation. MBA-RAG achieves new state-of-the-art results on multiple single-hop and multi-hop datasets while reducing retrieval costs. [Tweet] and [Paper]
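
To illustrate the bandit framing, here is an epsilon-greedy sketch over three hypothetical arms; the paper's actual algorithm and reward design (which also accounts for retrieval cost) may differ in detail.

```python
import random

# Each "arm" is a retrieval strategy; reward could mix answer quality and cost.
ARMS = ["no_retrieval", "single_step", "multi_step"]
counts = {a: 0 for a in ARMS}
values = {a: 0.0 for a in ARMS}   # running mean reward per arm

def select_arm(epsilon: float = 0.1) -> str:
    if random.random() < epsilon:      # explore: try a random strategy
        return random.choice(ARMS)
    return max(ARMS, key=values.get)   # exploit: best strategy so far

def update(arm: str, reward: float) -> None:
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean
```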

Do subscribe to the newsletter so that you won't miss interesting updates related to Generative AI, LLMs, Agents and RAG.

Kalyan KS, Research Scientist (NLP) at Akmmus AI Labs

