LLMs – Should We Train or Fine-Tune Them?
In the last few months, a debate has been brewing in India. Intellectuals and experts from all sides have been chipping in with their definitive arguments.
Everyone was amazed when ChatGPT first introduced the world to the marvel of large language models (LLMs).
When they learned about the enormous effort involved, including the engagement of contract workers from Kenya to fine-tune the model using RLHF (Reinforcement Learning from Human Feedback), the billions of dollars spent on the most expensive GPU systems, and the extensive time required to train the model, their eyes popped out.
During a visit to India in 2023, Sam Altman stated quite bluntly that it didn't make sense for India to compete with OpenAI by developing another LLM.
At the time, sending a rocket into orbit seemed a lot more affordable than training a model like ChatGPT.
However, some leaders, such as CP Gurnani, the CEO of Tech Mahindra, publicly responded to Sam Altman's statement with a vehement "Game On."
In the span of just two years, a lot has changed.
Hold on!! Train What? Fine-Tune What?
First let's get this out of the way.
The AI models that came before ChatGPT were mostly machine learning and deep learning models (of course, one could argue that ChatGPT too is a deep learning model), and they were largely predictive in nature. Some of the most common models used in the industry today are predictive ones: customer churn, marketing campaign response, and so on.
Generative models did exist before that: GANs (Generative Adversarial Networks). In fact, do you remember the fake videos of Barack Obama and other personalities that circulated in the last decade?
Yes - those were all GANs!!
ChatGPT is built on generative AI models such as GPT, which are far more powerful and capable than anything that came before them. Naturally, their training process requires a significant amount of computing power. Hence the cost!! For example, ChatGPT required several months of training on its corpus of data.
So if it took a couple of months to train the model, why would anyone want to do it all over again? This was precisely Sam Altman's point.
That's where fine-tuning comes into play. Fine-tuning is when you take a huge foundation model like GPT and customize it for a particular use case using data that is specific to that task.
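To make this concrete, here is a minimal sketch of what fine-tuning can look like in practice, using the open-source Hugging Face transformers, datasets, and peft libraries to attach a LoRA adapter to a small open model. The base model name and the domain corpus file below are illustrative placeholders, not a recommendation; the point is simply that a fine-tune touches a small set of adapter weights and a task-specific dataset, rather than repeating months of pre-training.

```python
# Minimal fine-tuning sketch (illustrative): adapt a small open LLM to a
# task-specific corpus with a LoRA adapter instead of retraining from scratch.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments, DataCollatorForLanguageModeling)

base_model = "gpt2"  # placeholder: any small causal LM works for this sketch

tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base_model)

# LoRA: train only a few million adapter weights; the base model stays frozen.
model = get_peft_model(model, LoraConfig(r=8, lora_alpha=16, task_type="CAUSAL_LM"))

# Hypothetical task-specific corpus, e.g. domain text in an Indian language.
dataset = load_dataset("text", data_files={"train": "my_domain_corpus.txt"})["train"]
dataset = dataset.map(lambda x: tokenizer(x["text"], truncation=True, max_length=512),
                      batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned-model", num_train_epochs=1,
                           per_device_train_batch_size=4, learning_rate=2e-4),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()  # hours on a single GPU, not months on a cluster
model.save_pretrained("finetuned-model")
```

The contrast in scale is the whole argument: customizing an existing model this way is measured in GPU-hours and gigabytes of domain data, while pre-training a frontier model is measured in GPU-months and most of the public internet.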
Recently, Nandan Nilekani emphasized this point at the Meta AI Summit in Bangalore. He categorically said that training LLMs should be left to the biggies of Silicon Valley while we in India fine-tune and customize them to our use cases for widespread usage.
Nilekani unexpectedly faced criticism for making this statement, even though many of us found it obvious.
There are a few factors to consider before looking at the big picture.
Infrastructure Cost
Although one can argue that GPUs are not getting any cheaper, the cost of training large language models has been steadily decreasing. The graph below, from the ARK group's study, shows that the cost of training LLMs has declined over time.
While training LLMs with state-of-the-art infrastructure remains prohibitively expensive for many companies, it is now within the reach of large firms. That said, processing power, or GPU power, has more than quadrupled over the last few years with NVIDIA's chips and networking.
So if the cost to train LLMs keeps decreasing, won't this be another step towards opening up the field to a wider range of players?
Certainly we all want a larger market to choose from!
As more firms venture into producing AI factories, a greater number of large language models will emerge, thereby expanding our options beyond the few big names we often hear.
That is, however, not the whole story. A few years into the GenAI revolution, most of the data sources these LLMs are trained on have already been locked up, or are in the process of being locked up, through deals with large companies.
That brings us to the next problem: Data
The Data Deluge
Here we are in pole position!
What do I mean here? India boasts a vast market of approximately 1.5 billion individuals. It's safe to say that, even at the lower end of estimates, two-thirds of them are now digital. The data of 1 billion people presents an enormous opportunity for companies to train their models on.
A lot of this data is locked up in telcos (telecommunication companies), social media platforms, and a ton of other mobility, online shopping, and social commerce apps that people use. Very little of it is currently used for training.
In fact, in India we have a data deluge. Across the length and breadth of the country, as the geography changes, so do the languages, cultures, and customs. This is reflected in the buying patterns and spending habits of individuals, which differ in many respects from those of the consumer class in the West.
Indians, for instance, buy a lot of gold before festivals. Westerners buy gold as an investment only when they feel it would give them favorable returns, and certainly not prior to any religious festival.
Models trained on Indian datasets will account for all these regional and local nuances.
When Nandan Nilekani said we could become the AI use case capital of the world, he meant this. The demographic diversity at our disposal will yield dividends in this whole process.
I am of the opinion that we should do both. Given that India's corporate leaders are willing to invest billions in infrastructure to train large language models, it would be foolish to focus solely on fine-tuning existing models.
On the other hand, with a population of 1 billion and multiple languages, the volume of training data is sufficient to develop models that can accommodate much larger use cases than are currently possible.
It looks like things are just getting started.