Top LLM Papers of the Week (August Week 3, 2024)

Kalyan KS

Published Aug 19, 2024

[1] RAGChecker

RAGChecker is a fine-grained evaluation framework for Retrieval-Augmented Generation (RAG) systems. The framework includes diagnostic metrics for both the retrieval and generation modules. Evaluations show that RAGChecker correlates better with human judgments than other evaluation metrics do. [Paper] and [Tweet]
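To make "fine-grained" concrete, here is a simplified sketch that scores an answer claim by claim instead of as a whole. It is an illustration only, not RAGChecker's implementation. It assumes the claims have already been extracted from the response and from the ground-truth answer, and it stubs the entailment check (normally an LLM or NLI model) with normalized exact matching. The function names `normalize`, `is_entailed`, and `claim_metrics` are hypothetical.

```python
# Simplified claim-level scoring (illustrative only, not RAGChecker's code).
# Assumption: claims are already extracted; entailment is stubbed with
# normalized exact matching instead of an LLM or NLI model.


def normalize(claim: str) -> str:
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(claim.lower().split())


def is_entailed(claim: str, reference_claims: list[str]) -> bool:
    """Hypothetical stand-in for an entailment checker."""
    return normalize(claim) in {normalize(c) for c in reference_claims}


def claim_metrics(response_claims: list[str], gold_claims: list[str]) -> dict[str, float]:
    # Precision: share of response claims supported by the ground-truth answer.
    correct = sum(is_entailed(c, gold_claims) for c in response_claims)
    precision = correct / len(response_claims) if response_claims else 0.0
    # Recall: share of ground-truth claims that the response covers.
    covered = sum(is_entailed(c, response_claims) for c in gold_claims)
    recall = covered / len(gold_claims) if gold_claims else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Toy example: one supported claim and one unsupported claim.
scores = claim_metrics(
    ["Paris is the capital of France", "Paris has 10 million residents"],
    ["paris is the capital of France", "The Seine flows through Paris"],
)
print(scores)  # {'precision': 0.5, 'recall': 0.5, 'f1': 0.5}
```

Because each claim is judged separately, a low score points to *which* statements were unsupported or missing, which is the kind of diagnostic signal a single answer-level score cannot give.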

[2] The AI Scientist

This paper introduces “The AI Scientist”, a comprehensive framework for fully automated scientific discovery. The framework enables LLMs to conduct independent research, implement ideas, and write full research papers at a cost of less than $15 per paper. [Paper] and [Tweet]

[3] HybridRAG

This paper introduces HybridRAG, a novel RAG approach that combines GraphRAG with VectorRAG to improve question-answering systems for financial documents. Experimental results show that HybridRAG outperforms both GraphRAG and VectorRAG used individually. [Paper] and [Tweet]
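As a rough illustration of the hybrid idea, the sketch below merges context from a vector retriever (text chunks) with context from a graph retriever (for example, serialized knowledge-graph triples) before it reaches the LLM. It assumes each retriever is a plain function from a question to a ranked list of strings. The function names, the "vector hits first, then graph hits" ordering, the deduplication, and the snippet budget are choices made for this example, not details taken from the paper; the retrievers and their outputs are toy stubs.

```python
from typing import Callable

# Hypothetical retriever interface: question -> ranked list of text snippets.
Retriever = Callable[[str], list[str]]


def hybrid_context(
    question: str,
    vector_retrieve: Retriever,
    graph_retrieve: Retriever,
    max_snippets: int = 6,
) -> str:
    """Concatenate vector and graph context, dropping exact duplicates."""
    merged: list[str] = []
    seen: set[str] = set()
    # Vector hits first, then graph hits; keep the first occurrence of each snippet.
    for snippet in vector_retrieve(question) + graph_retrieve(question):
        if snippet not in seen:
            seen.add(snippet)
            merged.append(snippet)
    # Cap the context so it fits the prompt budget.
    return "\n".join(merged[:max_snippets])


def build_prompt(question: str, context: str) -> str:
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Toy retrievers standing in for a vector store and a knowledge graph.
def vector_stub(q: str) -> list[str]:
    return ["Q2 revenue rose 12%.", "Operating margin was flat."]


def graph_stub(q: str) -> list[str]:
    return ["(ExampleCorp, HAS_CEO, Jane Doe)", "Q2 revenue rose 12%."]


ctx = hybrid_context("How did ExampleCorp do in Q2?", vector_stub, graph_stub)
print(build_prompt("How did ExampleCorp do in Q2?", ctx))
```

The point of combining the two sources is that vector search surfaces relevant passages of text, while the graph contributes explicit entity relationships that chunked text may split apart.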

[4] 10,000+ Word Generation from Long Context LLMs

Current long-context LLMs can handle extensive inputs but struggle to produce lengthy outputs. This paper introduces AgentWrite, an agent-based pipeline, and uses it to create LongWriter-6k, a dataset of 6,000 supervised fine-tuning examples with output lengths ranging from 2,000 to 32,000 words. [Paper] and [Tweet]

[5] Integrating Web Search and Knowledge Graphs into RAG

This paper introduces WeKnow-RAG, a new RAG method that integrates Knowledge Graphs and Web search into the Retrieval-Augmented Generation pipeline. WeKnow-RAG also lets the LLM evaluate the trustworthiness of its own generated answers, further improving the system's reliability. [Paper] and [Tweet]

[6] Evaluation Benchmark for LLM Tool Use

Recent advances in LLMs have increased interest in their tool-use capabilities for solving real-world problems. This paper introduces ToolSandbox, an evaluation benchmark for LLM tool-use capabilities. ToolSandbox uses a dynamic evaluation approach that assesses both intermediate and final milestones over an arbitrary trajectory, which offers a more comprehensive evaluation of LLM performance. [Paper] and [Tweet]
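Milestone-based evaluation over an arbitrary trajectory can be sketched roughly as follows. This is a deliberately simplified version that may be much simpler than what the benchmark actually uses: milestones here are an ordered list of predicates over tool-call events, and a milestone counts as reached if some later event satisfies it, so extra or exploratory tool calls in between are not penalized. The event format, tool names, and function names are hypothetical.

```python
from typing import Any, Callable

# Hypothetical event format: {"tool": <name>, "args": {...}}
Event = dict[str, Any]
Milestone = Callable[[Event], bool]


def milestone_score(trajectory: list[Event], milestones: list[Milestone]) -> float:
    """Fraction of milestones reached in order along an arbitrary trajectory."""
    if not milestones:
        return 1.0
    reached = 0
    for event in trajectory:
        # Advance only when the next pending milestone is satisfied;
        # unrelated intermediate calls are simply skipped.
        if reached < len(milestones) and milestones[reached](event):
            reached += 1
    return reached / len(milestones)


milestones: list[Milestone] = [
    lambda e: e["tool"] == "search_contacts",
    lambda e: e["tool"] == "send_message" and e["args"].get("to") == "Alex",
]

full_run = [
    {"tool": "search_contacts", "args": {"name": "Alex"}},
    {"tool": "get_weather", "args": {}},  # detour that is not penalized
    {"tool": "send_message", "args": {"to": "Alex", "text": "Hi"}},
]
partial_run = full_run[:2]  # stops before sending the message

print(milestone_score(full_run, milestones))     # 1.0
print(milestone_score(partial_run, milestones))  # 0.5
```

Scoring intermediate milestones gives partial credit to a model that made progress but did not finish, instead of a single pass/fail on the final state.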

[7] NL2SQL with Large Language Models

This paper presents a comprehensive survey of NL2SQL with Large Language Models, covering the entire lifecycle from four aspects: model, data, evaluation, and error analysis. The authors also provide a rule of thumb for developing NL2SQL solutions. [Paper] and [Tweet]

[8] Model Merging in LLMs

Model merging combines multiple models into a single model without requiring raw training data or expensive computation, which makes it an efficient technique. This survey paper presents a new taxonomy of model merging techniques and discusses applications of model merging in LLMs, multimodal large language models, and more than 10 machine learning subfields. [Paper] and [Tweet]
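The simplest member of the model-merging family is weighted parameter averaging: models that share an architecture are combined by averaging their weights, with no training data and no gradient steps. The sketch below is a minimal, framework-free illustration that represents each model as a dictionary from parameter names to flat lists of floats (a stand-in for real tensors). The function name and this representation are assumptions for the example, and the survey covers many more sophisticated methods.

```python
from typing import Optional

# Each "model" is a dict: parameter name -> flat list of floats (stand-in for tensors).
StateDict = dict[str, list[float]]


def merge_weighted_average(
    state_dicts: list[StateDict], weights: Optional[list[float]] = None
) -> StateDict:
    """Merge same-architecture models by a weighted average of their parameters."""
    if not state_dicts:
        raise ValueError("need at least one model")
    n = len(state_dicts)
    if weights is None:
        weights = [1.0 / n] * n  # uniform averaging by default
    if len(weights) != n or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("need one weight per model, summing to 1")
    names = state_dicts[0].keys()
    if any(sd.keys() != names for sd in state_dicts):
        raise ValueError("all models must have the same parameter names")

    merged: StateDict = {}
    for name in names:
        params = [sd[name] for sd in state_dicts]
        if any(len(p) != len(params[0]) for p in params):
            raise ValueError(f"shape mismatch for {name}")
        # Element-wise weighted sum across models.
        merged[name] = [
            sum(w * p[i] for w, p in zip(weights, params))
            for i in range(len(params[0]))
        ]
    return merged


model_a = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
model_b = {"layer.weight": [3.0, 4.0], "layer.bias": [1.0]}

print(merge_weighted_average([model_a, model_b]))
# {'layer.weight': [2.0, 3.0], 'layer.bias': [0.5]}
print(merge_weighted_average([model_a, model_b], weights=[0.75, 0.25]))
# {'layer.weight': [1.5, 2.5], 'layer.bias': [0.25]}
```

The whole merge is a single pass of arithmetic over the parameters, which is why merging is cheap compared with retraining or multi-task fine-tuning.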

[9] A Suite of Clinical LLMs

Med42-v2 introduces a suite of clinical-domain LLMs based on the Llama3 architecture. Unlike general-purpose LLMs, which often avoid answering clinical queries, Med42-v2 LLMs are trained to answer them. Med42-v2 LLMs outperform GPT-4 and the original Llama3 models across various medical benchmarks. [Paper] and [Tweet]

[10] Chain of Condition Prompt

This paper introduces Chain of Condition, a new prompting technique that addresses conditional question answering by explicitly identifying conditions, constructing their logical relationships, verifying whether each condition is satisfied, and solving the resulting logical expressions. Chain of Condition outperforms existing prompting baselines on benchmark datasets, establishing a new state of the art. With advanced models such as GPT-3.5-Turbo or GPT-4, it also surpasses supervised baselines in few-shot settings. [Paper] and [Tweet]
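In the paper, the LLM carries out these steps through prompting. The sketch below only illustrates the last step, solving the logical expression, once conditions have been identified, combined with AND/OR, and checked against the user's situation. It assumes a hypothetical nested-tuple encoding of the expression and uses three-valued logic, so a condition that could not be verified yields "unknown" (`None`) rather than a wrong yes or no. The condition names, the encoding, and the `evaluate` function are invented for illustration.

```python
from typing import Optional, Union

# Hypothetical encoding: a condition name, or (operator, [sub-expressions]).
Expr = Union[str, tuple[str, list]]


def evaluate(expr: Expr, facts: dict[str, bool]) -> Optional[bool]:
    """Three-valued evaluation: True, False, or None if a needed fact is unknown."""
    if isinstance(expr, str):
        return facts.get(expr)  # None when the condition was not verified
    op, parts = expr
    values = [evaluate(p, facts) for p in parts]
    if op == "and":
        if any(v is False for v in values):
            return False  # one failed condition settles an AND
        return True if all(v is True for v in values) else None
    if op == "or":
        if any(v is True for v in values):
            return True  # one satisfied condition settles an OR
        return False if all(v is False for v in values) else None
    raise ValueError(f"unknown operator: {op}")


# "Eligible if a resident AND (over 65 OR has a disability)."
rule = ("and", ["is_resident", ("or", ["over_65", "has_disability"])])

print(evaluate(rule, {"is_resident": True, "over_65": False, "has_disability": True}))  # True
print(evaluate(rule, {"is_resident": False}))  # False
print(evaluate(rule, {"is_resident": True, "over_65": False}))  # None: disability status unknown
```

Returning `None` lets an answer state which condition still needs to be confirmed instead of guessing, which is the core difficulty in conditional question answering.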

If you like this, do subscribe to the newsletter so that you won't miss any of the interesting LLM papers.