LLMs: Where We Are and Where We're Heading

This article is an excerpt from my new book IRREPLACEABLE. Please like, comment, and share if you appreciate the insight.

Thanks, Pascal


AI is evolving rapidly, and it's time to take stock of where we are and what's coming next. Let's focus on the capabilities of AI models, especially the Large Language Models (LLMs) behind chatbots like ChatGPT and Gemini. These models are getting smarter all the time. Understanding why and how can help us predict future developments.

To grasp this, we need to look at how these models are trained. I'll try to explain this without getting too technical, which means simplifying some complex ideas. My tech-savvy readers will hopefully forgive this approach.


The Power of Scale in Large Language Models

The key to understanding LLMs is scale. Put simply, bigger models tend to be smarter. When we say "bigger," we mean models with more parameters – the adjustable values that help the model predict what to say next. These larger models need more training data (measured in tokens, which are often words or parts of words) and more computing power (measured in FLOPs, or Floating Point Operations).

Think of FLOPs as the number of simple math operations a computer does. More FLOPs mean more computational work during AI training. The result? Models that can handle tougher tasks, score higher on tests, and generally seem more intelligent.
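For readers who want to see the arithmetic: a common rule of thumb from the scaling-law literature estimates training compute as roughly 6 × parameters × training tokens. The sketch below applies it to BloombergGPT using the approximate public figures from its paper (~50B parameters, ~570B tokens); treat both the rule and the numbers as ballpark estimates, not exact accounting.

```python
# Back-of-the-envelope training compute: a common heuristic from the
# scaling-law literature is FLOPs ~= 6 * N * D, where N is the parameter
# count and D is the number of training tokens.

def training_flops(params: float, tokens: float) -> float:
    """Approximate training FLOPs for a dense transformer (6*N*D heuristic)."""
    return 6.0 * params * tokens

# BloombergGPT: ~50B parameters, ~570B training tokens (approximate figures).
flops = training_flops(50e9, 569e9)
print(f"BloombergGPT: ~{flops:.1e} FLOPs")  # ~1.7e+23 -- same ballpark as the
                                            # ~2x10^23 (200 zettaFLOPs) above

# A tenfold compute increase, split evenly between model size and data:
flops_10x = training_flops(50e9 * 10**0.5, 569e9 * 10**0.5)
print(f"10x compute:  ~{flops_10x:.1e} FLOPs")  # ~1.7e+24
```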

Here's a real-world example that shows why scale matters. Bloomberg created a specialized AI called BloombergGPT, trained on its vast financial data. It used about 200 zettaFLOPs of computing power (that's a 2 followed by 23 zeros) and was great at tasks like analyzing financial documents. But GPT-4, a model not specifically trained for finance, still outperformed it. Why? GPT-4 is simply much larger – about 100 times bigger – making it more capable across the board.

Building bigger models isn't easy, though. It's not just about gathering more data. You need more computing time, more computer chips, and more energy. To get a significantly better model, you typically need to increase your data and computing power by about ten times. This usually means costs go up by a factor of ten, too.


The Different Waves of LLMs

The story of AI progress is largely about increasing model size, following a generational pattern. Each new generation requires extensive planning and resources to achieve that tenfold increase in data and computing power. We call the most advanced models at any given time "frontier models."


For simplicity, let's break down these waves:

1. Wave 1 Models (2022): Think GPT-3.5, the model behind the original ChatGPT. These models kicked off the Generative AI boom. They use less than 10^25 FLOPs and typically cost under $10M to train. There are many Wave 1 models, including open-source versions.

2. Wave 2 Models (2023-2024): GPT-4 is the poster child here. These models need between 10^25 and 10^26 FLOPs and might cost over $100M to train. We now have several Wave 2 models.

3. Wave 3 Models (2024-2026?): While not here yet, models like GPT-5 and Grok 3 are on the horizon. They'll likely need 10^26 to 10^27 FLOPs and could cost billions to train.

4. Wave 4 Models and Beyond: We might see these in a couple of years, potentially costing over $10B to train. Most experts I've talked to believe scaling benefits will continue at least through Wave 4. Beyond that, we might see capabilities increase up to 1,000 times over Wave 3 by 2030, but it's not certain. This is why there's so much talk about finding the energy and data for future models. (The short sketch after this list turns these rough FLOP ranges into code.)
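To make the taxonomy concrete, here is a toy classifier that maps a training-compute budget to the wave labels above. The FLOP thresholds are the rough ranges from this article, not an industry standard.

```python
# Toy classifier for the "wave" taxonomy above. The thresholds are the
# rough ranges from this article, not official or industry-agreed cutoffs.

WAVES = [
    ("Wave 1", 0.0,  1e25),           # e.g. GPT-3.5-class models
    ("Wave 2", 1e25, 1e26),           # e.g. GPT-4-class models
    ("Wave 3", 1e26, 1e27),           # the anticipated next generation
    ("Wave 4", 1e27, float("inf")),   # beyond
]

def classify(training_flops: float) -> str:
    """Map a training-compute budget (in FLOPs) to a wave label."""
    for name, low, high in WAVES:
        if low <= training_flops < high:
            return name
    return "unknown"

print(classify(2e23))   # Wave 1 (BloombergGPT-scale)
print(classify(2e25))   # Wave 2
print(classify(5e26))   # Wave 3
```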


Today's Leading Models



GPT-4 started the Wave 2 era, but other companies have caught up. We're now nearing the first Wave 3 models. Let's look at the top five Wave 2 models:

1. GPT-4o: This powers ChatGPT and Microsoft Copilot. It's a jack-of-all-trades, handling voice, images, files, code generation, web searches, and more.

2. Claude 3.5 Sonnet: A clever model that excels with text. It can work with images and files too, but doesn't generate images or voice.

3. Gemini 1.5 Pro: Google's flagship model. It has a huge memory, can process various data types including video, and can run code (though it's not always clear when it does so).

4. Grok 2: A surprise contender from Elon Musk's xAI. It taps into X (formerly Twitter) for information and can create images without many restrictions.

5. Llama 3.1 405B: Meta's offering stands out because it's open-source. Anyone can download, use, and even modify it to some extent.




A New Way to Scale: The Reasoning Approach

Recently, OpenAI revealed a game-changer with their o1-preview and o1-mini models. These models scale differently – not during training, but after. It turns out that the computing power used for "reasoning" about a problem (called inference compute) follows its own scaling law.

This "reasoning" process involves the model taking multiple thinking steps before answering. OpenAI's innovation was to make their models go through this process, creating hidden reasoning tokens before giving a final answer. They discovered something fascinating: the longer a model "reasons," the better its answer tends to be.

This development represents a significant shift in how we approach artificial intelligence, one that mirrors human cognitive processes in intriguing ways. To understand its importance, it's helpful to consider psychologist Daniel Kahneman's concept of System 1 and System 2 thinking.

Kahneman describes System 1 as fast, intuitive, and automatic – the kind of thinking we do when we recognize a friend's face or answer simple math like 2+2. System 2, on the other hand, is slower, more deliberative, and logical – the thinking we engage in when solving complex problems or making difficult decisions.

Traditional LLMs, in many ways, have operated primarily in a System 1-like mode. They provide quick, intuitive responses based on patterns in their training data. While impressive, this approach can lead to mistakes or shallow understanding, especially for complex queries.

OpenAI's new approach with o1-preview and o1-mini models introduces a System 2-like reasoning process to AI. By allowing the model to "reason" through multiple steps before answering, it's mimicking the deliberative, logical approach of human System 2 thinking. This is why the quality of answers improves with longer reasoning time – the model is essentially engaging in deeper, more thorough analysis.


Why does this matter? There are several important implications:

• Enhanced Problem-Solving: This approach allows AI to tackle more complex, nuanced problems that require step-by-step reasoning, potentially expanding AI's applicability in fields like scientific research, strategic planning, and complex decision-making.

• Improved Accuracy: By engaging in deeper reasoning, these models are likely to produce more accurate and reliable outputs, reducing the risk of the "hallucinations" or mistakes that can occur with quick, intuitive responses.

• Reduced Biases: With more sophisticated training, larger datasets, and advanced reasoning capabilities, these models can often provide more balanced and accurate responses. This is crucial for expanding AI's applicability in sensitive areas like healthcare diagnostics or legal analysis, where impartiality is paramount.

• Transparency and Explainability: The step-by-step reasoning process could make AI decision-making more transparent and explainable, a crucial factor for building trust in AI systems, especially in sensitive applications.

• Cognitive Alignment: This development brings AI reasoning closer to human cognitive processes, potentially leading to more natural and effective human-AI collaboration.



Towards Agentic LLMs

The advancement in AI reasoning capabilities paves the way for agentic AI – autonomous systems that can perceive, decide, and act to achieve goals. Models like o1-preview, with their multi-step reasoning, are crucial steps towards this future.

Agentic AI goes beyond responding to prompts; it engages with its environment, sets objectives, and works towards them. Such AI could break down complex goals, anticipate obstacles, and adapt strategies. In scientific research, it might formulate hypotheses and design experiments. In business, it could analyze trends and suggest strategic decisions.

The key is the AI's ability to combine rapid pattern recognition (System 1-like) with deliberative analysis (System 2-like). This allows navigation of complex scenarios with unprecedented nuance and adaptability. The result could be AI agents that understand context, evaluate progress, and autonomously adjust their approach.
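In code, the agentic idea is often expressed as a loop in the spirit of the ReAct pattern: the model reasons about the next step, calls a tool, observes the result, and repeats until it decides it's done. Below is a minimal, self-contained sketch; llm is a scripted stand-in for a reasoning model, the two tools are toys, and nothing here reflects any vendor's actual agent API.

```python
# Minimal agent loop (reason -> act -> observe), in the spirit of ReAct.
# `llm` is a scripted stand-in for a reasoning model, not a real API call.

def llm(prompt: str) -> str:
    """Hypothetical model: decides the next action from the history so far."""
    if "Observation" not in prompt:
        return "calculate:2+2"            # first step: use a tool
    return "FINISH:The answer is 4."      # then: give the final answer

TOOLS = {
    "search": lambda q: f"(pretend search results for {q!r})",
    "calculate": lambda expr: str(eval(expr)),  # toy only; never eval untrusted input
}

def run_agent(goal: str, max_steps: int = 10) -> str:
    history = [f"Goal: {goal}"]
    for _ in range(max_steps):
        # Deliberate (System 2-like): plan the next step given everything so far.
        decision = llm("\n".join(history) +
                       "\nNext action as 'tool:input', or 'FINISH:<answer>'")
        if decision.startswith("FINISH:"):
            return decision[len("FINISH:"):]
        tool, _, arg = decision.partition(":")
        observation = TOOLS.get(tool, lambda _a: "unknown tool")(arg)
        history.append(f"Action: {decision}\nObservation: {observation}")
    return "step budget exhausted"

print(run_agent("What is 2+2?"))  # -> The answer is 4.
```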

However, this shift towards agentic AI raises critical ethical and practical questions. How do we ensure alignment with human values and maintain meaningful oversight? These challenges will be crucial as we transition to truly agentic AI, potentially reshaping our relationship with artificial intelligence.



What's Next?

With two scaling laws at play – one for training and one for "reasoning" – AI capabilities are set to soar in the coming years. Even if we hit a ceiling on training larger models (which seems unlikely for now), AI can still tackle harder problems by spending more time "reasoning."

As we keep improving how these models are built and trained, we're approaching a new frontier. Autonomous AI agents might be just around the corner. These systems could handle complex tasks with minimal human oversight, which could have far-reaching effects.


This was an excerpt from the book IRREPLACEABLE. Please like and share if you appreciate the insight. Make sure you order your copy.

And don’t miss out on the IRREPLACEABLE Academy. Join a community of more than 3,000 like-minded people, access interactive courses, and actively train these skills every day for maximum impact.

Thanks, Pascal


#ai #artificialintelligence #futureofwork #skillsofthefuture #tech

