Interbreeding Camels 🐪🐪 Version 2 - Camels in a Changing Climate
Hello All,
This is Raghul Gopal, an AWS Community Builder (ML & GenAI) and a research enthusiast in AI & AGI. Welcome to the Learn with Me newsletter, where I focus on the latest advances in Generative AI.
Camels in a Changing Climate: Enhancing Language Model Adaptation with TULU 2 🐪🐫🐪🐫
In a previous issue, we covered TULU 1 – How Far Can Camels Go? – which was built on a mixture of instruction-tuned datasets. Today, we have TULU 2, an improved suite of TULU models for advancing the understanding of, and best practices for, adapting pre-trained language models to downstream tasks and user preferences.
In this paper, the researchers release four key artifacts: TULU-V2-mix, an improved collection of high-quality instruction datasets; TULU 2, Llama 2 models fine-tuned on the V2 mixture; TULU 2+DPO, the same models further trained with direct preference optimization (including a 70B variant); and CODE TULU 2, Code Llama models fine-tuned on the V2 mixture.
The capabilities of large language models (LMs) to follow user requests have been progressing rapidly through a wide range of openly available models, datasets, and training methods. Since the release of the first TULU models, there have been a number of significant advances, from improved fine-tuning datasets to increasingly powerful base models and more accessible adaptation methods for combining these components.
Let’s look at what changed in TULU 2. Where TULU 1 was built on Llama 1, TULU 2 uses Llama 2 as its base. Llama 2 keeps the same architecture but is pretrained on significantly more tokens (2 trillion, as opposed to 1 or 1.4 trillion), giving it improved performance. The researchers also experimented with Code Llama, a Llama 2 model further pretrained on code data.
The experiments cover the 7B, 13B, and 70B parameter sizes for Llama 2, and the 7B, 13B, and 34B sizes for Code Llama. The original TULU V1 mix was built from ablations over human-written and GPT-generated datasets, drawing on sources such as FLAN V2, CoT, Dolly, and Open Assistant 1 on the human side, and GPT4-Alpaca, Code-Alpaca, and ShareGPT on the model-generated side.
RLHF training has traditionally been based on PPO (Proximal Policy Optimization), but recent advances introduce offline RL, reward-model data filtering via Rejection Sampling (RS), Reinforced Self-Training (ReST), and the direct integration of preference data. TULU 2 was trained with DPO (Direct Preference Optimization) due to the simplicity of its implementation, following the Zephyr-β approach for DPO training. The researchers also used QLoRA to reduce compute demands without reducing performance.
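To make the DPO objective concrete, here is a minimal sketch of the loss in plain PyTorch. This is not the authors' training code, which follows the Zephyr-β recipe; it only illustrates the preference loss that DPO optimizes, given summed per-sequence log-probabilities from the policy being trained and from a frozen reference model (the function name and argument names are illustrative).

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Direct Preference Optimization loss (sketch).

    Each argument is a 1-D tensor of summed token log-probabilities for the
    chosen (preferred) or rejected completion, under the policy being trained
    or the frozen reference model. beta = 0.1 matches the value reported in
    the paper's DPO hyperparameters.
    """
    # Log-ratios of policy vs. reference for each completion
    chosen_logratios = policy_chosen_logps - ref_chosen_logps
    rejected_logratios = policy_rejected_logps - ref_rejected_logps

    # DPO maximizes the margin between chosen and rejected log-ratios
    logits = beta * (chosen_logratios - rejected_logratios)
    return -F.logsigmoid(logits).mean()
```

In practice, the log-probabilities come from a forward pass over the prompt-plus-completion tokens with the prompt positions masked out; libraries such as Hugging Face TRL handle that bookkeeping.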
Training Hyperparameters
For Instruction Tuning / Supervised Fine-Tuning (a configuration sketch follows the list):
• Precision: BFloat16
• Epochs: 2
• Weight decay: 0
• Warmup ratio: 0.03
• Learning rate: 2e-5 (1e-5 for 70B)
• Max. seq. length: 8,192
• Effective batch size: 128
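As a rough illustration only, and not the authors' released training scripts, the settings above map onto Hugging Face transformers.TrainingArguments roughly as follows. The per-device batch size, gradient accumulation steps, output path, and the linear learning-rate schedule are my assumptions, chosen so the numbers multiply out to the reported effective batch size.

```python
from transformers import TrainingArguments

# Hypothetical mapping of the reported SFT hyperparameters onto
# TrainingArguments. Per-device batch size and gradient accumulation are
# placeholders: together with the number of GPUs they should multiply out
# to the reported effective batch size of 128.
sft_args = TrainingArguments(
    output_dir="tulu-2-sft",            # placeholder output path
    bf16=True,                          # BFloat16 precision
    num_train_epochs=2,
    weight_decay=0.0,
    warmup_ratio=0.03,
    learning_rate=2e-5,                 # 1e-5 for the 70B model
    lr_scheduler_type="linear",         # assumption: linear decay after warmup
    per_device_train_batch_size=1,      # placeholder
    gradient_accumulation_steps=128,    # placeholder: divide by the GPU count
    logging_steps=10,
)
```

The 8,192-token maximum sequence length is not a TrainingArguments field; it is enforced when the instruction data is tokenized and packed.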
For QLoRA training (a configuration sketch follows the list):
• Epochs: 5
• Weight decay: 0
• Warmup ratio: 0.03
• Learning rate: 1e-4
• Max. seq. length: 4,096
• Effective batch size: 128
• LoRA Rank: 64
• LoRA Alpha: 16
• LoRA dropout: 0.1
• Layers wrapped: all attention and feedforward linear layers
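Here is a hedged sketch of how these QLoRA settings could be expressed with Hugging Face peft plus 4-bit quantization via bitsandbytes. The base checkpoint and the target module names (the usual attention and feed-forward projections in Llama-style models) are assumptions, not details taken from the paper.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit quantization of the frozen base model (the "Q" in QLoRA)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",      # placeholder base model
    quantization_config=bnb_config,
    device_map="auto",
)
base = prepare_model_for_kbit_training(base)

# LoRA adapters with the reported rank/alpha/dropout, wrapping all attention
# and feed-forward linear layers (module names assumed for Llama-style models)
lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.1,
    task_type="CAUSAL_LM",
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",   # attention projections
        "gate_proj", "up_proj", "down_proj",      # feed-forward projections
    ],
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()
```

The remaining QLoRA hyperparameters (5 epochs, learning rate 1e-4, 4,096-token sequences) would plug into the same TrainingArguments pattern shown earlier.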
For DPO (a training sketch follows the list):
• Precision: BFloat16
• Epochs: 3
• Weight decay: 0
• Warmup ratio: 0.1
• Learning rate: 5e-7
• Max. seq. length: 8,192
• Effective batch size: 32
• Beta: 0.1
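Finally, a hedged sketch of how these DPO hyperparameters could be passed to Hugging Face TRL's DPOTrainer. This is an assumption about tooling rather than the authors' own pipeline, and TRL's argument names have shifted across versions (for example, beta moved from the trainer into DPOConfig, and tokenizer= became processing_class=), so adjust for the version you have installed. The checkpoint name and preference dataset are placeholders.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

# Placeholders: any SFT checkpoint and any preference dataset with
# prompt / chosen / rejected columns will do for this sketch.
model_name = "my-org/tulu-2-sft-checkpoint"          # hypothetical checkpoint
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
prefs = load_dataset("HuggingFaceH4/ultrafeedback_binarized",
                     split="train_prefs")

dpo_args = DPOConfig(
    output_dir="tulu-2-dpo",            # placeholder output path
    bf16=True,                          # BFloat16 precision
    num_train_epochs=3,
    weight_decay=0.0,
    warmup_ratio=0.1,
    learning_rate=5e-7,
    per_device_train_batch_size=1,      # placeholder
    gradient_accumulation_steps=32,     # placeholder: divide by the GPU count
    max_length=8192,                    # max sequence length
    beta=0.1,
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,                     # TRL keeps a frozen reference copy
    args=dpo_args,
    train_dataset=prefs,
    processing_class=tokenizer,         # tokenizer= in older TRL versions
)
trainer.train()
```

Passing ref_model=None lets TRL hold a frozen copy of the starting policy as the reference model against which the β-weighted log-ratios in the DPO loss are computed.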
Access the paper from this link: https://arxiv.org/abs/2311.10702
That’s it for Week 3. Happy Day, Happy AI.
Follow me here to keep learning about the latest releases in AI and AGI, explained clearly 😊