The Rise of Small Language Models (SLMs)

Ed Waters

Senior Business Analyst at National Australia Bank

Published Dec 26, 2024

In the rapidly evolving landscape of artificial intelligence, the mantra “bigger is always better” has long dominated discussions surrounding model development. Each month, we witness the emergence of larger models boasting an increasing number of parameters, with companies investing billions in expansive AI data centers to support them. However, a pivotal shift is on the horizon. At NeurIPS 2024, Ilya Sutskever, co-founder of OpenAI, suggested that “pre-training as we know it will unquestionably end,” signaling a potential end to the era of relentless scaling. This shift invites us to reconsider our approach and explore the promising potential of Small Language Models (SLMs), which typically feature up to 10 billion parameters.

The Rise of Small Language Models

The industry is beginning to embrace SLMs as a viable alternative. Clem Delangue, CEO of Hugging Face, posits that up to 99% of use cases could be effectively addressed using these smaller models. The trend is echoed in Y Combinator's recent requests for startups, which note that large models, while impressive, often come with significant costs and with latency and privacy challenges.

Cost Efficiency: A Key Consideration

The economic implications of large language models (LLMs) are significant and multifaceted. Businesses face not only the high costs associated with hardware and infrastructure but also the environmental impact of maintaining such systems. Subscription prices for LLM-based applications have already begun to rise; for instance, OpenAI recently introduced a $200/month Pro plan, indicating a broader trend in escalating costs across the industry. A case in point is Embodied’s Moxie robot, which utilized the OpenAI API. Despite initial success—children interacting with Moxie sending hundreds of messages daily—the operational costs proved unsustainable, leading to the product's discontinuation and leaving many children without their robotic companion. In contrast, fine-tuning specialized SLMs for specific domains can provide a more economical solution. These models not only require less data and resources but can also operate on modest hardware, including smartphones.

Environmental Impact

The environmental footprint of training large models cannot be overlooked. For example, training GPT-3 consumed as much electricity as an average American household uses in 120 years and emitted approximately 502 tons of CO₂, equivalent to the annual emissions of over a hundred gasoline cars. In stark contrast, deploying a smaller model, such as one with 7B parameters, would require only 5% of the energy consumed by its larger counterparts.
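As a rough sanity check on the car comparison, the arithmetic can be sketched in a few lines of Python. The 502-ton figure comes from the paragraph above. The per-car figure of about 4.6 tons of CO₂ per year is an assumption (a commonly cited estimate for a typical passenger vehicle), and the function name is purely illustrative.

```python
# Back-of-the-envelope check of the "over a hundred gasoline cars" comparison.

TRAINING_EMISSIONS_TONS = 502.0  # figure quoted in the article for GPT-3 training
CAR_TONS_PER_YEAR = 4.6          # ASSUMPTION: approximate annual CO2 of one typical passenger car


def cars_equivalent(emissions_tons: float, car_tons_per_year: float) -> float:
    """Return how many cars' annual emissions match the given total."""
    if car_tons_per_year <= 0:
        raise ValueError("car_tons_per_year must be positive")
    return emissions_tons / car_tons_per_year


if __name__ == "__main__":
    cars = cars_equivalent(TRAINING_EMISSIONS_TONS, CAR_TONS_PER_YEAR)
    print(f"Roughly {cars:.0f} cars' worth of annual emissions")
```

Under that assumption the result lands a little above one hundred cars, which is consistent with the comparison in the text. A different per-car estimate would shift the number accordingly.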

Performance on Specialised Tasks

While cost efficiency is crucial, performance remains paramount. Surprisingly, numerous studies indicate that SLMs can outperform LLMs in specialized tasks. For instance:

Medical Sector: The Diabetica-7B model achieved 87.2% accuracy on diabetes-related tests compared to GPT-4's 79.17%.
Legal Sector: An SLM with just 0.2B parameters reached 77.2% accuracy in contract analysis.
Content Moderation: LLaMA 3.1 8B outperformed GPT-3.5 by significant margins in accuracy and recall across various subreddits.

These examples illustrate that smaller models can excel in specific domains where they are fine-tuned.

Security and Compliance Advantages

Utilising LLMs through APIs raises critical concerns regarding data security and regulatory compliance (e.g., HIPAA, GDPR). Handing over sensitive information to third-party providers increases risks and complicates adherence to stringent regulations. In contrast, SLMs offer several advantages:

Simplified Audits: Smaller models facilitate easier audits and customization for compliance.
Deployment Flexibility: SLMs can run on isolated networks or low-end hardware.
Adaptability: They can be fine-tuned quickly to meet evolving regulatory requirements.
Distributed Security Architecture: SLMs allow for a modular security system that can be independently updated and tested.

The Future: AI Agents and Specialised Models

As we consider the future landscape of AI, Ilya Sutskever’s insights hint at a return to simpler principles—“do one thing and do it well.” The potential for AI agents powered by SLMs could lead to transformative changes across various sectors, creating markets far larger than traditional SaaS solutions. However, it’s essential to acknowledge some limitations of SLMs compared to their larger counterparts:

Limited Task Flexibility: SLMs excel in narrow domains but may struggle with broader applications.
Context Window Limitations: Smaller models typically have shorter context windows than LLMs.
Emergent Capabilities Gap: Certain advanced abilities may only manifest at higher parameter thresholds.

Conclusion

In conclusion, while large language models have their place in AI development, small language models present an increasingly attractive alternative for many businesses seeking cost-effective solutions without sacrificing performance or security. As we navigate this evolving landscape, companies should consider integrating SLMs into their strategies—particularly in regulated fields like healthcare, finance, or law—where efficiency and compliance are paramount. By embracing this shift towards smaller models, organizations can not only enhance their operational efficiency but also contribute positively towards environmental sustainability in AI development.
