This AI newsletter is all you need #96
What happened this week in AI by Louie
We are glad to say this was a week for open-source AI and small LLMs, with the release of Llama 3 by Meta and Microsoft's announcement of Phi-3. Llama 3 is a big win for open source and for cheap, fast smaller models, but it has some limitations: the company chose to focus the model on text-only output, the English language, and a shorter context window (8k).
Llama 3 shares largely the same model architecture as Llama 2. The key differences in v3 are a smarter, more aggressive training data filter (including the use of Llama 2 as a data classifier), 7x more data (now a massive 15 trillion tokens), and improved, scaled-up use of human feedback in fine-tuning. The result is a huge jump in capabilities and benchmark scores for small model formats (8B and 70B parameters) and for the best open-source models overall. The speed advantage of these smaller models will be particularly important for agent workflows, where latency per call can stack up. The Llama 3 8B and 70B models can be run at home or fine-tuned to specific use cases. They can also be accessed in the cloud, such as on Together.ai, for $0.2 and $0.9 per million tokens, respectively, compared to GPT-3.5-Turbo and GPT-4-Turbo at an average (assuming 3:1 input vs. output) of $0.75 and $15. Groq also offers Llama 3, with the 70B model at an average cost of $0.64 per million tokens and faster inference speed.
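As a sketch of how the "average" prices above can be blended from separate input and output rates (assuming the 3:1 input-to-output token ratio used in this issue, and the public list prices of $10/$30 for GPT-4-Turbo and $0.5/$1.5 for GPT-3.5-Turbo per million input/output tokens at the time of writing; check providers for current figures):

```python
def blended_price(input_price: float, output_price: float,
                  input_ratio: float = 3.0, output_ratio: float = 1.0) -> float:
    """Blend per-million-token input/output prices by assumed usage ratio."""
    total = input_ratio + output_ratio
    return (input_price * input_ratio + output_price * output_ratio) / total

# Illustrative list prices (USD per million tokens)
print(blended_price(10.0, 30.0))  # GPT-4-Turbo   -> 15.0
print(blended_price(0.5, 1.5))    # GPT-3.5-Turbo -> 0.75
```

The same weighting reproduces the $15 and $0.75 averages quoted above; plugging in any provider's rates gives a comparable blended figure.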
With Llama 3, we think the biggest gains relative to existing models likely come from better training data filtering. Meta also chose to push hard on training data quantity relative to model parameter size. This is a sub-optimal choice for training cost vs. intelligence (very far from Chinchilla-optimal; more intelligence per unit of training compute would have come from extra parameters rather than extra training tokens). However, the choice is geared towards lower inference costs, creating a smarter, smaller model that is cheaper to run.
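To put a rough number on how far Llama 3 sits from Chinchilla-optimal, we can use the common rule-of-thumb reading of the Chinchilla result (roughly 20 training tokens per parameter; a simplification of the original scaling-law fit):

```python
CHINCHILLA_TOKENS_PER_PARAM = 20  # rule-of-thumb approximation

params = 8e9            # Llama 3 8B
tokens_trained = 15e12  # 15 trillion training tokens

optimal_tokens = params * CHINCHILLA_TOKENS_PER_PARAM  # ~160B tokens
overtrain_factor = tokens_trained / optimal_tokens
print(f"~{overtrain_factor:.0f}x the Chinchilla-optimal token count")
```

Training on roughly 94x the compute-optimal token count is inefficient per unit of training compute, but it buys a smaller model at a given capability level, which is exactly the inference-cost trade Meta is making.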
Microsoft’s release of Phi-3 in 3.8B, 7B, and 14B sizes shows even more impressive benchmark scores relative to model size. The models were trained on highly filtered web data and synthetic data (3.3T to 4.8T tokens), going even further down the path of data-quality prioritization. We await more details on the model release, real-world testing, and whether it is fully open source.
Current costs and key KPIs of leading LLMs
Why should you care?
When choosing the best LLM for your application, there are many trade-offs and priorities to weigh. Superior affordability and response speed generally come with smaller models, while intelligence, coding skill, multimodality, and longer context lengths are usually things you pay more for with larger models. We think Llama 3 and Phi-3 will change the game for smaller, faster, cheaper models and will be a great choice for many LLM use cases, particularly since Llama 3 is open-source and flexible: it can be fine-tuned and tailored to specific use cases.
It is incredible how far we have come with LLMs in less than two years! In August 2022, the best model available was davinci-002 from OpenAI at $60 per million tokens, scoring 60% on the MMLU test (16k questions across 57 tasks, with human experts at 89.8%). Now, Llama 3 8B costs an average of $0.2, or 300x cheaper, while scoring 68.4% on MMLU. The most capable models (GPT-4 & Opus) now score 86.8% on MMLU, are multimodal, and have 50-100x larger context lengths. There is now a large number of models that are competitive for certain use cases. We expect this to accelerate LLM innovation and adoption even further.
- Louie Peters — Towards AI Co-founder and CEO
Hottest News
The FineWeb dataset consists of over 15 trillion tokens of cleaned and deduplicated English web data from CommonCrawl between 2013 and 2024. Models trained on FineWeb outperform those trained on RefinedWeb, C4, DolmaV1.6, The Pile, and SlimPajama. It is accessible on Hugging Face.
Meta has launched Llama 3, the newest addition to its Llama series, accessible on Hugging Face. It is available in 8B and 70B versions, each with base and instruction-tuned variants featuring enhanced multilingual tokenization. Llama 3 is designed for easy deployment on platforms like Google Cloud and Amazon SageMaker.
Mistral unveiled Mixtral 8x22B, an efficient sparse Mixture-of-Experts model with 39B active out of 141B total parameters. It specializes in multilingual communication, coding, and mathematics and excels in reasoning and knowledge tasks. The model has a 64K token context window, is compatible with multiple platforms, and is available under the open-source Apache 2.0 license.
Adobe announced that it aims to update Premiere Pro to add plug-ins to emerging third-party AI video generator models, including OpenAI’s Sora, Runway ML’s Gen-2, and Pika 1.0. With this addition, Premiere Pro users would be able to edit and work with live-action video captured on traditional cameras alongside and intermixed with AI footage.
Google has unveiled the Cloud TPU v5p, an AI chip that delivers nearly triple the training speed of its predecessor, the TPU v4, reinforcing its position in AI services and hardware. Additionally, Google introduced the Google Axion CPU, an Arm-based processor that competes with similar offerings from Microsoft and Amazon, boasting a 30% performance improvement and better energy efficiency.
Five 5-minute reads/videos to keep you learning
The article examines the financial considerations of leveraging OpenAI's API versus self-hosting LLMs. It highlights the trade-off between the greater control over data achieved through self-hosting, which comes with higher costs for fine-tuning and maintenance, and the potential cost savings of OpenAI's usage-based pricing model.
Despite everyone’s focus on hardware, AI software is what protects NVIDIA. This blog dives into the role of the CUDA software ecosystem in maintaining NVIDIA’s leading position in AI.
The 2024 AI Index Report from Stanford presents key trends in AI, including technical progress, rising costs of advanced models, and AI-enhanced workforce productivity. It also notes the uptick in AI-focused regulations and investments, particularly in generative AI. This is set against increased public consciousness and concern regarding AI's societal implications.
In this article, the author explores the capabilities of Gemini 1.5 Pro and Google AI Studio. The tutorial provides an overview of Google AI Studio, including its fundamentals, various modes, how to utilize the available multimodal features, and when to use Google AI Studio vs. Gemini.
This article explores why it is difficult to build a moat with AI, especially LLMs, and presents ideas for a potentially successful approach. Success in AI applications increasingly depends on leveraging unique, customer-specific data for training rather than just innovations in models like LLMs. Data engineering is key to creating competitive AI solutions.
Repositories & Tools
1. LLM Transparency Tool is an open-source interactive toolkit for analyzing the internal workings of Transformer-based language models.
2. Llama Factory unifies the fine-tuning of 100+ LLMs.
3. Reader converts any URL to an LLM-friendly input with a simple prefix.
4. Open Agent Studio is a no-code agent editor.
5. AgentRun is a Python library that makes it easy to run Python code safely from LLMs with a single line of code.
Top Papers of The Week
Megalodon, a new model architecture designed for efficient sequence modeling with unlimited context length, addresses the scalability limitations of Transformers due to their quadratic complexity and poor performance with long sequences. Building upon the Mega architecture, it incorporates advancements such as complex exponential moving average (CEMA), timestep normalization, and a normalized attention mechanism.
Microsoft has developed VASA, a framework that can create realistic talking faces with expressive visual affective skills from a single image and audio input, featuring synchronized lip-syncing and dynamic facial expressions for enhanced authenticity.
This paper introduces Mini-Gemini, a simple framework enhancing multi-modality Vision Language Models (VLMs). Mini-Gemini mines the potential of VLMs and simultaneously empowers current frameworks with image understanding, reasoning, and generation. It supports a series of dense and MoE LLMs from 2B to 34B.
This paper introduces RecAI, a practical toolkit designed to augment recommender systems with the advanced capabilities of LLMs. RecAI provides a suite of tools, including Recommender AI Agent, Recommendation-oriented Language Models, Knowledge Plugin, RecExplainer, and Evaluator, to facilitate the integration of LLMs into recommender systems.
Researchers address instability in LLM alignment methods such as RLHF and DPO by proposing Trust Region DPO (TR-DPO), which actively updates the reference policy during training. TR-DPO outperforms DPO by up to 19%, per GPT-4 automatic evaluations.
Quick Links
1. Poe introduces multi-bot chat and plans enterprise tier to dominate the AI chatbot market. With a recent $75 million funding round, Poe is betting big on the potential of a thriving ecosystem around AI-powered chatbots.
2. OpenAI seeks to dismiss Elon Musk's lawsuit, calling contract claims ‘revisionist.’ The company has stated that Musk’s claim that it violated its contractual commitments to create an open-source, nonprofit entity is an attempt to promote his own competing AI firm.
3. After months of leaks, OpenAI has reportedly fired two researchers linked to company secrets going public. According to reports from The Information, the firm has fired researchers Leopold Aschenbrenner and Pavel Izmailov.
Who’s Hiring in AI
Interested in sharing a job opportunity here? Contact sponsors@towardsai.net.
If you are preparing for your next machine learning interview, don’t hesitate to check out our leading interview preparation website, Confetti!
Think a friend would enjoy this too? Share the newsletter and let them join the conversation.