Top LLM Papers of the Week (October Week 1, 2024)

[1] Comprehensive evaluation of OpenAI's o1

This paper evaluates the performance of OpenAI's o1 LLM across a diverse array of complex reasoning tasks spanning multiple domains, including computer science, mathematics, natural sciences, medicine, linguistics, and social sciences. o1 achieves human-level or superior performance in areas ranging from coding challenges to scientific reasoning and from language processing to creative problem-solving. For example, o1 achieves 100% accuracy on high school-level mathematical reasoning tasks, providing detailed step-by-step solutions. [Tweet] and [Paper]


[2] Toolkit to prepare your data for LLM application development

This paper introduces Data-Prep-Kit (DPK), an easy-to-use toolkit for preparing data for LLM application development. DPK lets users prepare data on a local machine or effortlessly scale out to a cluster with thousands of CPU cores. Importantly, DPK is open-source, extensible, and scale-flexible. [Tweet] and [Paper]
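
For readers new to such toolkits, here is a minimal plain-Python sketch of the kind of preprocessing steps (exact deduplication, short-document filtering, fixed-size chunking) that a toolkit like DPK automates and scales out. It does not use DPK's actual API; the helper name and thresholds are illustrative assumptions only.

```python
import hashlib

def prepare_documents(docs, min_chars=200, chunk_size=1000):
    """Toy preprocessing pass: exact deduplication, short-document filtering,
    and fixed-size chunking -- the kind of steps a data-prep toolkit scales out."""
    seen, chunks = set(), []
    for doc in docs:
        digest = hashlib.sha256(doc.encode("utf-8")).hexdigest()
        if digest in seen or len(doc) < min_chars:
            continue                      # drop duplicates and very short docs
        seen.add(digest)
        chunks.extend(doc[i:i + chunk_size] for i in range(0, len(doc), chunk_size))
    return chunks

corpus = ["some raw document text ... " * 50, "another raw document ... " * 50]
print(len(prepare_documents(corpus)))     # number of chunks ready for indexing
```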


[3] RAGProbe

This paper presents RAGProbe, a novel automated approach to evaluating RAG applications. RAGProbe addresses limitations of RAGAS by generating variations of question-answer pairs to trigger failures in RAG pipelines. RAGProbe outperforms existing state-of-the-art methods, increasing the failure rate by 51% on average per dataset. [Tweet] and [Paper]
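
As a rough illustration of the idea (not RAGProbe's actual prompts or code), the sketch below rephrases a seed question several ways and records when a RAG pipeline's answer stops matching the reference; rag_pipeline and answers_match are hypothetical callables you would supply.

```python
def probe_rag_pipeline(rag_pipeline, seed_question, reference_answer, answers_match):
    """Toy failure probe: rephrase a seed question several ways and collect the
    cases where the RAG pipeline's answer no longer matches the reference."""
    variations = [
        seed_question,
        f"In one sentence, {seed_question.lower()}",
        f"{seed_question} Explain your reasoning.",
        f"{seed_question} Answer with only the key fact.",
    ]
    failures = []
    for q in variations:
        answer = rag_pipeline(q)                       # your RAG app under test
        if not answers_match(answer, reference_answer):
            failures.append((q, answer))
    return failures

# failure rate for one seed question = len(failures) / len(variations)
```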


[4] Auto-Demo Prompting

Batch prompting is a common technique in large language models (LLMs) for processing multiple inputs simultaneously, but performance degrades as batch size increases. This paper introduces Auto-Demo prompting, which improves batch prompting by using the question-output pairs from earlier questions within a batch as demonstrations for subsequent answer inference. Importantly, Auto-Demo prompting mitigates this performance degradation and occasionally outperforms single prompts. [Tweet] and [Paper]
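
A minimal sketch of the idea, simulated here with sequential calls rather than a single batched generation (so not the paper's exact setup): each answered question is appended as a demonstration for the questions that follow it. llm_call is a hypothetical completion function.

```python
def auto_demo_batch(llm_call, questions, instruction="Answer the question."):
    """Sketch of the Auto-Demo idea: every question answered so far is reused
    as a few-shot demonstration for the remaining questions in the batch."""
    demos, answers = [], []
    for q in questions:
        prompt = instruction + "\n\n"
        for dq, da in demos:                  # earlier Q/A pairs act as demos
            prompt += f"Q: {dq}\nA: {da}\n\n"
        prompt += f"Q: {q}\nA:"
        a = llm_call(prompt)                  # your LLM completion function
        demos.append((q, a))
        answers.append(a)
    return answers
```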


[5] Finance Massive Text Embedding Benchmark (FinMTEB)

This paper investigates the need for domain-specific embedding models and introduces the Finance Massive Text Embedding Benchmark (FinMTEB), a counterpart to MTEB consisting of financial domain-specific text datasets. The authors observe that seven SOTA embedding models exhibit a significant performance drop on FinMTEB compared to their performance on MTEB. [Tweet] and [Paper]
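
To make the evaluation setting concrete, here is a toy retrieval metric of the kind MTEB-style benchmarks aggregate; the embeddings, dimensions, and relevance labels below are random placeholders, not FinMTEB data.

```python
import numpy as np

def recall_at_k(query_emb, doc_emb, relevant_idx, k=10):
    """Cosine-similarity retrieval with recall@k: did the relevant document
    for each query land among its k nearest neighbours?"""
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    d = doc_emb / np.linalg.norm(doc_emb, axis=1, keepdims=True)
    ranks = np.argsort(-q @ d.T, axis=1)[:, :k]        # top-k doc indices per query
    hits = [rel in row for rel, row in zip(relevant_idx, ranks)]
    return float(np.mean(hits))

# embeddings from a general vs. a finance-tuned model would be compared here
queries, docs = np.random.randn(5, 384), np.random.randn(100, 384)
print(recall_at_k(queries, docs, relevant_idx=[3, 17, 42, 8, 99]))
```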


[6] Energy-efficient LLMs with Additions

This paper presents L-Mul, a linear-complexity multiplication algorithm that approximates floating-point multiplication with integer addition operations. The new algorithm requires significantly fewer compute resources than 8-bit floating-point multiplication while achieving higher precision. Moreover, evaluation results on popular benchmarks show that directly applying L-Mul to the attention mechanism is almost lossless. [Tweet] and [Paper]
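
Based on the idea stated above (replacing the mantissa product with a cheap additive term), here is a rough Python sketch of the approximation; the offset parameter l_m and the decomposition via frexp are illustrative assumptions, not the paper's exact bit-level implementation.

```python
import math

def l_mul_approx(x: float, y: float, l_m: int = 4) -> float:
    """Approximate x * y in the spirit of L-Mul: add exponents, add mantissas,
    and replace the mantissa product with a constant offset 2**(-l_m)."""
    if x == 0.0 or y == 0.0:
        return 0.0
    sign = math.copysign(1.0, x) * math.copysign(1.0, y)
    fx, ex = math.frexp(abs(x))                 # abs(x) = fx * 2**ex, 0.5 <= fx < 1
    fy, ey = math.frexp(abs(y))
    mx, my = 2.0 * fx - 1.0, 2.0 * fy - 1.0     # fractional mantissas in [0, 1)
    ex, ey = ex - 1, ey - 1                     # adjust exponents for the rescaling
    # Exact mantissa product: (1+mx)*(1+my) = 1 + mx + my + mx*my.
    # L-Mul drops the mx*my multiplication and adds a constant instead.
    approx_mantissa = 1.0 + mx + my + 2.0 ** (-l_m)
    return sign * approx_mantissa * 2.0 ** (ex + ey)

print(l_mul_approx(3.7, 1.2), 3.7 * 1.2)        # approximation vs exact product
```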


[7] Open-RAG

This paper introduces Open-RAG to enhance reasoning capabilities in RAG with open-source LLMs. Open-RAG transforms an arbitrary dense LLM into a parameter-efficient sparse mixture-of-experts (MoE) model capable of handling complex reasoning tasks, including both single- and multi-hop queries. Experimental results show that the Llama2-7B-based Open-RAG outperforms state-of-the-art LLMs and RAG models such as ChatGPT, Self-RAG, and Command R+ on various knowledge-intensive tasks. [Tweet] and [Paper]
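
For readers unfamiliar with sparse MoE layers, the sketch below shows a minimal top-k routed feed-forward block in PyTorch; it illustrates the general MoE mechanism only, not Open-RAG's actual architecture or training recipe.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoEFeedForward(nn.Module):
    """Minimal sparse MoE feed-forward block: a router picks the top-k experts
    per token and only those experts are evaluated."""
    def __init__(self, d_model, d_hidden, num_experts=4, top_k=1):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        ])
        self.top_k = top_k

    def forward(self, x):                                    # x: (tokens, d_model)
        weights, indices = self.router(x).topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)                 # gate weights over top-k
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = indices[:, k] == e                    # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out

x = torch.randn(10, 64)                                      # 10 tokens, d_model = 64
print(SparseMoEFeedForward(64, 256)(x).shape)                # torch.Size([10, 64])
```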


[8] LLM-based Open-Source Tool for Fact Verification

This paper presents Loki, an LLM-based open-source tool for fact verification. Loki decomposes fact-checking into a five-step pipeline: breaking long texts into individual claims, assessing their check-worthiness, generating queries, retrieving evidence, and verifying the claims. Loki is optimized for latency, robustness, and cost efficiency at a commercially usable level. [Tweet] and [Paper]
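
A skeleton of such a five-step pipeline might look like the sketch below; the stage prompts, the llm callable, and the retriever interface are illustrative assumptions, not Loki's actual API.

```python
def fact_check(text, llm, retriever):
    """Five-step fact-verification skeleton: decompose claims, filter by
    check-worthiness, generate a query, retrieve evidence, verify."""
    claims = llm(f"Split the following text into atomic factual claims, one per line:\n{text}").splitlines()
    results = []
    for claim in filter(None, map(str.strip, claims)):
        worth = llm(f"Is this claim worth fact-checking? Answer yes or no.\nClaim: {claim}")
        if "yes" not in worth.lower():
            continue                                   # skip opinions / trivial statements
        query = llm(f"Write a web search query to verify: {claim}")
        evidence = retriever(query)                    # returns a list of passages
        verdict = llm(
            "Given the evidence, label the claim as supported, refuted, or not enough info.\n"
            f"Claim: {claim}\nEvidence: {' '.join(evidence)}"
        )
        results.append({"claim": claim, "verdict": verdict.strip()})
    return results
```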


[9] LLM-based Text-to-SQL Systems (Survey)

This paper presents a comprehensive survey of LLM-based Text-to-SQL systems, covering everything from early rule-based models to advanced LLM-based approaches, including in-context learning methods, fine-tuning methods, benchmarks, evaluation methods, and evaluation metrics. The paper also highlights key challenges such as computational efficiency, model robustness, and data privacy. [Tweet] and [Paper]
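
As a concrete example of the in-context learning approach the survey covers, here is a minimal text-to-SQL prompt builder; the schema, examples, and formatting are illustrative, not taken from any surveyed system.

```python
def text_to_sql_prompt(schema, question, examples):
    """Minimal in-context-learning prompt for text-to-SQL: schema description,
    a few demonstration pairs, then the target question."""
    prompt = f"Database schema:\n{schema}\n\n"
    for ex_q, ex_sql in examples:
        prompt += f"Question: {ex_q}\nSQL: {ex_sql}\n\n"
    prompt += f"Question: {question}\nSQL:"
    return prompt

schema = "employees(id, name, dept_id, salary); departments(id, name)"
examples = [("How many employees are there?", "SELECT COUNT(*) FROM employees;")]
print(text_to_sql_prompt(schema, "What is the average salary per department?", examples))
```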


[10] Specialised LLMs for Astronomy

This paper introduces the AstroMLab 2 LLMs, which are specialised for astronomy. AstroMLab 2 includes AstroLLaMA-3-8B and AstroLLaMA-2-70B, which are developed based on the AstroLLaMA series of LLMs. [Tweet] and [Paper]

Do subscribe to the newsletter so that you won't miss interesting LLM and RAG papers.

Kalyan KS, Research Scientist (NLP) at Akmmus AI Labs.

