Understanding how language model performance varies with scale is critical to benchmark and algorithm development. Scaling laws are one approach to building this understanding, but the requirement of training models across many different scales has limited their use. We propose an alternative, observational approach that bypasses model training and instead builds scaling laws from ~80 publicly available models. Building a single scaling law from multiple model families is challenging due to large variations in their training compute efficiencies and capabilities. However, we show that these variations are consistent with a simple, generalized scaling law where language model performance is a function of a low-dimensional capability space, and model families only vary in their efficiency in converting training compute to capabilities. Using this approach, we show the surprising predictability of complex scaling phenomena: we show that several emergent phenomena follow a smooth, sigmoidal behavior and are predictable from small models; we show that the agent performance of models such as GPT-4 can be precisely predicted from simpler non-agentic benchmarks; and we show how to predict the impact of post-training interventions like Chain-of-Thought and Self-Consistency as language model capabilities continue to improve. #LanguageModels #ScalingLaws #ModelEfficiency #PerformancePrediction #EmergentPhenomena
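Roughly, the approach can be sketched as follows (a toy illustration on synthetic data; the variable names, sizes, and numbers are mine, not the paper's): reduce many benchmark scores to a low-dimensional capability score, then fit a sigmoid from that score to a harder downstream metric.

```python
# Toy sketch of an observational scaling-law fit (synthetic data, not the paper's code):
# 1) reduce many benchmark scores to a low-dimensional "capability" score via PCA,
# 2) fit a sigmoid mapping that score to a downstream metric of interest.
import numpy as np
from sklearn.decomposition import PCA
from scipy.optimize import curve_fit

rng = np.random.default_rng(0)

# Synthetic stand-in data: rows = models, columns = standard benchmark scores.
n_models, n_benchmarks = 80, 6
latent = rng.normal(size=n_models)                          # hidden "capability"
benchmarks = latent[:, None] + 0.3 * rng.normal(size=(n_models, n_benchmarks))

# Downstream metric (e.g., an agentic benchmark) as a noisy sigmoid of capability.
downstream = 1 / (1 + np.exp(-2.0 * (latent - 0.5))) + 0.05 * rng.normal(size=n_models)

# Step 1: low-dimensional capability space (here, just the first principal component).
capability = PCA(n_components=1).fit_transform(benchmarks).ravel()

# Step 2: sigmoidal fit from capability score to downstream performance.
def sigmoid(x, a, b, lo, hi):
    return lo + (hi - lo) / (1 + np.exp(-a * (x - b)))

params, _ = curve_fit(sigmoid, capability, downstream,
                      p0=[1.0, 0.0, 0.0, 1.0], maxfev=10000)
print("fitted sigmoid parameters:", params)
# Predict downstream performance for a hypothetical stronger model (capability = 2.0).
print("predicted downstream score:", sigmoid(2.0, *params))
```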
-
-
https://lnkd.in/ddZFCnst The paper is about the performance of language models on rare, special cases. It shows that, for current-generation LLMs, handling rare cases with zero-shot learning might require exponentially large datasets. Hence it might not be a good idea to keep building bigger and bigger models.
-
Chunking Done Right

Retrieval-Augmented Generation (RAG) combines the power of retrieval systems with large language models (LLMs) to deliver responses that are both relevant and context-aware. By leveraging external knowledge from databases or documents, LLMs generate smarter and more insightful outputs.

But here's the catch: 🛑 LLMs have context window limits. If your data isn't properly chunked, you risk missing key information or overwhelming the model with irrelevant details.

💡 Why is Chunking Crucial?
✅ Efficiency: Process only the data that matters, reducing computational overhead.
✅ Relevance: Retrieve accurate and contextually aligned outputs.
✅ Preservation of context: Ensure coherent and meaningful responses.

🔑 The goal? Not to chunk for the sake of it, but to organize your data effectively so it's valuable and retrievable when needed. With the right chunking strategy, your RAG system becomes a powerhouse of precision and performance. Let us look at the chunking strategies in later posts. #learnai #RAG #chunking
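As a taste before those posts, here is a minimal sketch of one common strategy, fixed-size chunking with word overlap (the function name and sizes are illustrative, not from this post):

```python
# Illustrative fixed-size chunking with overlap for RAG ingestion (not from this post).
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into word-based chunks of up to `chunk_size` words, overlapping by
    `overlap` words so context spanning a boundary is preserved in both chunks."""
    words = text.split()
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(words), step):
        chunk = words[start:start + chunk_size]
        if chunk:
            chunks.append(" ".join(chunk))
        if start + chunk_size >= len(words):
            break
    return chunks

# Example: chunk a long document before embedding and indexing it.
doc = "word " * 1200
print(len(chunk_text(doc)))  # -> 3 chunks of up to 500 words, with 50-word overlap
```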
-
Earlier this year, tech companies were racing to develop large language models with billions of parameters, trained on trillions of tokens. Before GPT-4o, Gemini Flash marked a shift away from this trend. Andrej Karpathy's tweet explains this shift towards creating smaller, more efficient models trained on highly curated datasets. (https://lnkd.in/gEqZ7Ziy) Language models derive their ability to "think" from the datasets they're trained on. Smaller models can be seen as compressed versions of large models, retaining essential 'reasoning capabilities' without the need for vast amounts of information. This makes small models ideal for applications where cost and latency are critical, such as Retrieval-Augmented Generation (RAG) and agentic architectures. GPT-4o, for example, competes with Gemini Flash, which costs one-fifth of Gemini 1.5 Pro yet performs equally well in language processing, text-to-SQL tasks, and multimodal functions. The next step might be models like Microsoft's Phi, designed for edge-level deployment.
Andrej Karpathy (@karpathy) on X
twitter.com
-
Don't expect an LLM to navigate your computer and do 'everyday tasks' just yet. This paper gives us a way to know when we should start worrying. For now, we humans are still much better at everyday tasks than language models (but for how long?) #llm #vlm
Musing 21: OSWORLD: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
aiscientist.substack.com
-
Retrieval-augmented generation (RAG) lets large language models (LLMs) efficiently retrieve and generate relevant information. However, the quality of the output depends heavily on the accuracy and relevance of the underlying data: RAG-related errors can hinder the effectiveness of your model and lead to inaccurate or generic responses. To keep your model's responses grounded in real-world data, it's essential to identify and address these errors. Learn more about them and how to fix them in the slides below. #RAG #LLM
-
Large Language Models (LLMs) are impressive, but how do you get them to answer your questions perfectly? Here's how, through two popular methods: Prompt Engineering and Fine-Tuning. #promptengineering #finetuning #LLMs #machinehack
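To make the contrast concrete, here is a rough, generic illustration (the examples and record layout are mine, not from the attached post): prompt engineering shapes behaviour at inference time, while fine-tuning bakes it in through training examples.

```python
# Illustrative only: the same task approached two ways (generic structures, no specific vendor API).

# 1) Prompt engineering: steer behaviour at inference time with instructions and few-shot examples.
prompt = (
    "You are a support assistant. Answer in one short sentence.\n"
    "Q: How do I reset my password?\nA: Use the 'Forgot password' link on the login page.\n"
    "Q: How do I change my email address?\nA:"
)

# 2) Fine-tuning: steer behaviour at training time with labelled examples
#    (one record in a common chat-style layout; the exact schema varies by provider).
finetune_record = {
    "messages": [
        {"role": "user", "content": "How do I change my email address?"},
        {"role": "assistant", "content": "Open Settings > Account and edit the email field."},
    ]
}
```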
-
For those who are interested in how LLMs are developed, have a read below. For everyone else, take a high-level look at how the process works. Flow diagram included :) Read “Developing Large Language Models (LLMs): A Step-by-Step Guide from Concept to Deployment“ by Wasim Rajput on Medium: https://lnkd.in/gW9iVwfy
Developing Large Language Models (LLMs): A Step-by-Step Guide from Concept to Deployment
medium.com
-
Can LLMs Handle Unlimited Context? Google researchers introduced a new concept called Infini-attention in their latest paper, enabling LLMs to process inputs of any length. Typical transformers reset their attention memory after each context window to manage new data, losing previous context. For example, in a 500K token document split into 100K token windows, each segment starts fresh without memory from the others. Infini-attention, instead, retains and compresses the attention memory from all previous segments. This means in the same 500K document, each 100K window maintains access to the full document's context. The model compresses and reuses key-value states across all segments, allowing it to pull relevant information from any part of the document. #ai #ml #llm #google #infini #attention
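As a rough, simplified sketch of that compressive-memory idea (toy NumPy code written from the summary above, not the paper's implementation):

```python
# Toy sketch of a compressive memory over segments (simplified; not the paper's code).
import numpy as np

d = 64                                   # key/value dimension
memory = np.zeros((d, d))                # compressed memory of all previous segments
norm = np.zeros((d, 1))                  # running normalization term

def feature_map(x):
    # Positive feature map in the style of linear attention (ELU + 1).
    return np.where(x > 0, x + 1.0, np.exp(x))

def process_segment(K, V):
    """Fold one segment's keys/values into the running compressed memory."""
    global memory, norm
    sigma_K = feature_map(K)                          # (seq, d)
    memory = memory + sigma_K.T @ V                   # accumulate key-value associations
    norm = norm + sigma_K.sum(axis=0, keepdims=True).T

def retrieve(Q):
    """Read from memory: earlier segments stay accessible after their window has passed."""
    sigma_Q = feature_map(Q)                          # (seq, d)
    return (sigma_Q @ memory) / (sigma_Q @ norm + 1e-6)

# Stream a long document segment by segment; each query can still draw on all prior context.
for _ in range(5):                                    # e.g., 5 segments of a 500K-token document
    K, V, Q = (np.random.randn(128, d) for _ in range(3))
    process_segment(K, V)
    out = retrieve(Q)                                 # (128, d) readout conditioned on full history
```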
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention
arxiv.org
-
Explore #AgentGen, which uses LLMs to synthesize diverse environments and planning tasks in a scalable way. The environments are synthesized using a corpus drawn from a variety of domain-specific texts. The planning tasks are subsequently generated (easy to difficult) and conditioned on the synthesized environments. The novelty seems to be in how the planning tasks are generated: a bidirectional evolution method that effectively automates and simplifies the process. Previously, the trajectories used to tune models were generated from manually designed planning tasks. The synthesized data is then used to instruction-tune an LLM, which enhances the planning abilities of the LLM-based agent. Results show that AgentGen improves an LLM's planning ability, e.g., an instruction-tuned Llama-3 8B surpasses GPT-3.5 in overall performance. It even outperforms GPT-4 in certain tasks. Paper: https://lnkd.in/gkgWMZsg
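As I read this summary, the data pipeline looks roughly like the sketch below (all function names and prompts are hypothetical placeholders, not the paper's code):

```python
# Hypothetical outline of an AgentGen-style data pipeline, inferred from the summary above.
def llm(prompt: str) -> str:
    """Placeholder for any text-generation call; swap in a real model client here."""
    return f"<model output for: {prompt[:40]}...>"

def synthesize_environment(domain_text: str) -> str:
    return llm(f"Design a planning environment inspired by this text:\n{domain_text}")

def evolve_task(task: str) -> list[str]:
    # "Bidirectional evolution": derive both an easier and a harder variant of a seed task.
    easier = llm(f"Simplify this planning task:\n{task}")
    harder = llm(f"Add constraints to make this planning task harder:\n{task}")
    return [easier, task, harder]

def build_instruction_data(domain_texts: list[str], seed_tasks: list[str]) -> list[dict]:
    data = []
    for text in domain_texts:
        env = synthesize_environment(text)
        for seed in seed_tasks:
            for task in evolve_task(seed):
                data.append({"environment": env, "task": task})
    return data  # trajectories collected on these tasks would then instruction-tune the agent LLM
```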