LLMs – Should We Train or Fine-Tune Them?
In the last few months, a debate has been brewing in India. Intellectuals and experts from all sides have been chipping in with their definitive arguments.
Everyone was amazed when ChatGPT first introduced the world to the marvel of large language models (LLMs).
When they learned about the enormous effort involved, including the engagement of contract workers from Kenya to fine-tune the model using RLHF (Reinforcement Learning from Human Feedback), the billions of dollars spent on the most expensive GPU systems, and the extensive time required to train the model, their eyes popped out.
During a visit to India in 2023, Sam Altman stated quite bluntly that it didn't make sense for India to compete with OpenAI by developing another LLM.
At the time, sending a rocket into orbit seemed a lot more affordable than training a model like ChatGPT.
However, some leaders, such as CP Gurnani, the CEO of Tech Mahindra, publicly responded to Sam Altman's statement with a vehement "Game On."
In the span of just two years, a lot has changed.
Hold on!! Train What? Fine-Tune What?
First let's get this out of the way.
The AI models that came before ChatGPT were mostly machine learning and deep learning models (of course, one could argue that ChatGPT too is a deep learning model), and they were largely predictive in nature. Some of the most common models used in the industry today are predictive ones: customer churn, marketing campaign response, and so on.
Generative models did exist before that: GANs (Generative Adversarial Networks). In fact, do you remember the fake videos of Barack Obama and other personalities that circulated in the last decade?
Yes - those were all GANs!!
ChatGPT is built on generative AI models such as GPT, which are far more powerful and capable than anything that came before them. Naturally, their training process requires a significant amount of computing power. Hence the cost!! For example, ChatGPT required several months of training on its corpus of data.
So if it took a couple of months to train the model, why would anyone want to do it all over again? This was precisely Sam Altman's point.
That's where fine-tuning comes into play. Fine-tuning is when you take a huge foundation model like GPT and customize it for a particular use case using data that is specific to that task.
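To make this concrete, here is a minimal sketch of what fine-tuning can look like in practice, using the open-source Hugging Face transformers, datasets, and peft libraries to attach a LoRA adapter to a small open model. The base model name and the domain corpus file below are illustrative placeholders, not a recommendation; the point is simply that a fine-tune touches a small set of adapter weights and a task-specific dataset, rather than repeating months of pre-training.

```python
# Minimal fine-tuning sketch (illustrative): adapt a small open LLM to a
# task-specific corpus with a LoRA adapter instead of retraining from scratch.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments, DataCollatorForLanguageModeling)

base_model = "gpt2"  # placeholder: any small causal LM works for this sketch

tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base_model)

# LoRA: train only a few million adapter weights; the base model stays frozen.
model = get_peft_model(model, LoraConfig(r=8, lora_alpha=16, task_type="CAUSAL_LM"))

# Hypothetical task-specific corpus, e.g. domain text in an Indian language.
dataset = load_dataset("text", data_files={"train": "my_domain_corpus.txt"})["train"]
dataset = dataset.map(lambda x: tokenizer(x["text"], truncation=True, max_length=512),
                      batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned-model", num_train_epochs=1,
                           per_device_train_batch_size=4, learning_rate=2e-4),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()  # hours on a single GPU, not months on a cluster
model.save_pretrained("finetuned-model")
```

The contrast in scale is the whole argument: customizing an existing model this way is measured in GPU-hours and gigabytes of domain data, while pre-training a frontier model is measured in GPU-months and most of the public internet.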
Recently, Nandan Nilekani emphasized this point at the Meta AI Summit in Bangalore. He categorically said that training LLMs should be left to the biggies of Silicon Valley while we in India fine-tune and customize them to our use cases for widespread usage.
Nilekani unexpectedly faced criticism for making this statement, even though many of us found it obvious.
There are a few factors to consider before looking at the big picture.
Infrastructure Cost
Although one can argue that GPUs are not getting any cheaper, the cost of training large language models has been steadily decreasing. The graph below, from the ARK group's study, shows that the cost of training LLMs has declined over time.
While training LLMs with state-of-the-art infrastructure remains prohibitively expensive for many companies, it is now within the reach of large firms. That said, processing power, or GPU power, has more than quadrupled over the last few years with NVIDIA's chips and networking.
So if the cost to train LLMs keeps decreasing, won't this be another step towards opening up the field to a wider range of players?
Certainly we all want a larger market to choose from!
As more firms venture into producing AI factories, a greater number of large language models will emerge, thereby expanding our options beyond the few big names we often hear.
That is, however, not the whole story. A few years into the GenAI revolution, most of the data sources these LLMs are trained on have already been locked up, or are in the process of being locked up, through deals with large companies.
That brings us to the next problem: Data
The Data Deluge
Here we are in pole position!
What do I mean here? India boasts a vast market of approximately 1.5 billion individuals. It's safe to say that, even at the lower end of estimates, two-thirds of them are now digital. The data of 1 billion people presents an enormous opportunity for companies to train their models on.
A lot of this data is locked up in telcos (telecommunication companies), social media platforms, and a ton of other mobility, online shopping, and social commerce apps that people use. Very little of it is currently used for training.
In fact, in India we have a data deluge. Across the length and breadth of the country, as the geography changes, so do the languages, cultures, and customs. This is reflected in the buying patterns and spending habits of individuals, which differ in many respects from those of the consumer class in the West.
Indians, for instance, buy a lot of gold before festivals. Westerners buy gold as an investment only when they feel it would give them favorable returns, and certainly not prior to any religious festival.
Models trained on Indian datasets will account for all these regional and local nuances.
When Nandan Nilekani said we could become the AI use case capital of the world, he meant this. The demographic diversity at our disposal will yield dividends in this whole process.
I am of the opinion that we should do both. Given that India's corporate leaders are willing to invest billions in infrastructure to train large language models, it would be foolish to focus solely on fine-tuning existing models.
On the other hand, with a population of 1 billion and multiple languages, the volume of training data is sufficient to develop models that can accommodate much larger use cases than are currently possible.
It looks like things are just getting started.