🥇Top ML Papers of the Week

DAIR.AI

Democratizing Artificial Intelligence Research, Education, and Technologies

Published Apr 28, 2024

Welcome to the Top ML Papers of the Week (April 22 - April 28).

1). Phi-3 - a new 3.8B parameter language model called phi-3-mini trained on 3.3 trillion tokens and is reported to rival Mixtral 8x7B and GPT-3.5; has a default context length of 4K but also includes a version that is extended to 128K (phi-mini-128K); combines heavily filtered web data and synthetic data to train the 3.8B models; it also reports results on 7B and 14B models trained on 4.8T tokens (phi-3-small and phi-3-medium). (paper | tweet)

2). OpenELM - a new open language model that employs a layer-wise scaling strategy to efficiently allocate parameters and leading to better efficiency and accuracy; comes with different sizes such as 270M, 450M, 1.1B, and 3B; achieves a 2.36% improvement in accuracy compared to OLMo while requiring 2× fewer pre-training tokens. (paper | tweet)

3). Arctic - an open-source LLM (Apache 2.0 license.) that uses a unique Dense-MoE Hybrid transformer architecture; performs on par with Llama3 70B in enterprise metrics like coding (HumanEval+ & MBPP+), SQL (Spider) and instruction following (IFEval); claims to use 17x less compute budget than Llama 3 70B; the training compute is roughly under $2 million (less than 3K GPU weeks). (paper | tweet)

4). Make Your LLM Fully Utilize the Context - presents an approach to overcome the lost-in-the-middle challenge common in LLMs. It applies an explicit "information-intensive" training procedure on Mistral-7B to enable the LLM to fully utilize the context. It leverages a synthetic dataset where the answer requires fine-grained information awareness on a short segment (∼128 tokens) within a synthesized long context (4K−32K tokens), and 2) the integration and reasoning of information from two or more short segments. The resulting model, FILM-7B (Fill-in-the-Middle), shows that it can robustly retrieve information from different positions in its 32K context window. (paper | tweet)

5). FineWeb - a large-scale web dataset containing 15 trillion tokens for training language models; filters and deduplicates CommonCrawl between 2013 and 2024 and the goal is to improve the quality of the data. (paper | tweet)

Sponsor message

DAIR.AI presents a live cohort-based course, Prompt Engineering for LLMs, where you can learn about advanced prompting techniques, RAG, tool use in LLMs, agents, and other approaches that improve the capabilities, performance, and reliability of LLMs. Use promo code MAVENAI20 for a 20% discount. Enroll here.

Recommended by LinkedIn

🔮 Moving beyond RAG

Pascal Biese 9 months ago

Issue #283 - The ML Engineer 🤖

Alejandro Saucedo 7 months ago

🧑🔬 The Next Impact Factor

Pascal Biese 11 months ago

6). AI-powered Gene Editors - achieves precision editing of the human genome with a programmable gene editor design with an AI system powered by an LLM trained on biological diversity at scale. (paper | tweet)

7). AutoCrawler - Combines LLMs with crawlers with the goal of helping crawlers handle diverse and changing web environments more efficiently; the web crawler agent leverages the hierarchical structure of HTML for progressive understanding; employs top-down and step-back operations, and leverages the DOM tree structure, to generate a complete and executable crawler. (paper | tweet)

8). Graph Machine Learning in the Era of LLMs - provides a comprehensive overview of the latest advancements for Graph ML in the era of LLMs; covers the recent developments in Graph ML, how LLM can enhance graph features, and how it can address issues such as OOD and graph heterogeneity. (paper | tweet)

9). Self-Evolution of LLMs - provides a comprehensive survey on self-evolution approaches in LLMs. (paper | tweet)

10). Naturalized Execution Tuning (NExT) - trains an LLM to have the ability to inspect the execution traced of programs and reason about run-time behavior via synthetic chain-of-thought rationales; improves the fix rate of a PaLM 2 model on MBPP and Human by 26.1% and 14.3%; the model also shows that it can generalize to unknown scenarios. (paper | tweet)

Reach out to hello@dair.ai if you would like to promote with us. Our newsletter is read by over 55,000 AI Researchers, Engineers, and Developers.