The rise of local AI should democratize this technology
Hyper-local, network-free models are on their way and will deliver the advantages of AI to millions
Groundbreaking announcements on Generative Artificial Intelligence (GenAI) have been making news. For instance, Google and OpenAI introduced new GenAI-powered assistants that can engage in real-time conversations, even adapting when interrupted, mirroring human interaction. These assistants don’t just converse; they can also analyse your surroundings through a live video feed and translate conversations on the spot.
Google announced new tools at its I/O Conference, including enhancements to its bewildering array of products under Gemini AI, to compete with OpenAI’s new GPT-4o, announced the day before. Google also said it is building Gemini into a “do everything” model across almost all its product suites. For its part, OpenAI’s conversational GPT-4o model can reportedly respond with a lag of about 320 milliseconds, roughly the same as a human’s response time in conversation. With humour, sarcasm and more, its responses are remarkably human-like.
However, the immense computational power and energy required to train and deploy these large language model (LLM)-based systems raise concerns about their sustainability. LLMs are designed to understand and generate human language by processing vast amounts of data, virtually every scrap available online. They rely on specific architectures, notably the transformer, to build sophisticated models that can perform a wide range of language-related tasks. The most prominent examples, such as OpenAI’s and Google’s, have demonstrated remarkable proficiency in functions like text and image generation, summarization and now conversational AI.
The primary advantage of LLMs lies in their ability to generate coherent and contextually relevant text, achieved by training on enormous and diverse datasets. They are helpful, but the downside is the massive computational resources required for their training and inference. These models typically run on powerful cloud servers with high-performance silicon, consuming substantial energy and generating a significant carbon footprint. Moreover, since most of them are natively available only in a ‘cloud’ environment, the need for constant internet access can be a limitation in scenarios where privacy, security or connectivity is a concern.
To compete with cloud-hosted LLMs, researchers and companies (notably Microsoft with its Phi family of models) are exploring ways to bring AI capabilities to devices like laptops and smartphones. In contrast to the cloud-dependent nature of LLMs, local AI models aim to deliver AI capabilities directly on smaller devices. These models are lightweight, consuming far less computational power and energy, which makes them more eco-friendly and reduces operational costs.
By processing data locally on a device, these models minimize the need to transmit sensitive information over the internet, enhancing user privacy and data security. Further, since local AI models can operate without constant internet connectivity, they are accessible in remote or underserved areas. They also offer faster response times, as they don’t require a network round trip. Local AI models are particularly well suited to edge-computing scenarios, where real-time processing and low latency are critical. Smart home devices, autonomous vehicles and industrial IoT applications can all benefit.
Developed by a team of researchers that includes Sébastien Bubeck, Microsoft’s Phi model is designed to give small devices generative AI capabilities without compromising performance. The core idea behind Phi is to curate and optimize the data fed to the model, ensuring that it remains compact and efficient while retaining its ability to perform complex tasks (bit.ly/3VxqyMf). Phi leverages advances in model compression, quantization and distillation to achieve its goals. Model compression reduces the size of the model without significantly affecting performance. Quantization reduces the precision of the model’s weights, making it less computationally intensive. Distillation involves training a smaller model to mimic the behaviour of a larger one, effectively ‘transferring’ the large model’s knowledge to the small one.
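To make those three techniques a little more concrete, here is a minimal sketch in Python using the PyTorch library. It uses tiny stand-in networks rather than Phi itself, and the layer sizes, temperature and loss choices are illustrative assumptions, not Microsoft’s actual recipe.

```python
# A minimal sketch (not Microsoft's actual Phi recipe) of two of the
# techniques described above: post-training dynamic quantization and a
# knowledge-distillation loss. All sizes and settings are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

# A "large" teacher and a "small" student, stand-ins for a big cloud
# model and a compact local model.
teacher = nn.Sequential(nn.Linear(512, 2048), nn.ReLU(), nn.Linear(2048, 1000))
student = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 1000))

# Quantization: store the student's weights as 8-bit integers instead of
# 32-bit floats, shrinking memory use and speeding up CPU inference.
quantized_student = torch.quantization.quantize_dynamic(
    student, {nn.Linear}, dtype=torch.qint8
)

# Distillation: train the student to mimic the teacher's output
# distribution (softened by a temperature), not just hard labels.
def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    return F.kl_div(log_probs, soft_targets, reduction="batchmean") * temperature ** 2

x = torch.randn(4, 512)                  # a dummy batch of inputs
with torch.no_grad():
    teacher_logits = teacher(x)          # the large model's predictions
loss = distillation_loss(student(x), teacher_logits)
loss.backward()                          # gradients flow only to the student
```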
Yet, the transition from cloud-dependent LLMs to efficient local AI models is far from assured. Achieving high accuracy with a smaller model requires sophisticated techniques and innovations in model architecture and miniaturization. Additionally, local AI models must be highly adaptable and capable of running on various local hardware configurations without compromising performance. Despite these challenges, the potential benefits of local AI models are immense. By democratizing access to AI capabilities and reducing the environmental impact of AI deployments, models like Phi represent a significant step towards a more sustainable and inclusive future for artificial intelligence. The efforts to bring AI capabilities to small local devices mark a pivotal shift in the evolution of AI. A Wired reporter says he is running one of these models on his laptop and that the model has all the “wit and wisdom” of ChatGPT. (bit.ly/4c4JAi8)
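For readers curious to try this themselves, the snippet below is a rough sketch of how one might load a compact open model on an ordinary laptop with the Hugging Face transformers library; the ‘microsoft/phi-2’ checkpoint and the generation settings are assumptions chosen for illustration, not the exact setup the Wired reporter describes.

```python
# A minimal sketch of running a small language model locally on a laptop
# with the Hugging Face transformers library. The checkpoint name and
# generation settings below are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "microsoft/phi-2"           # a compact, openly available model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)  # runs on CPU by default

prompt = "Explain why on-device AI matters, in one sentence:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=60)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```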
As research in this field continues to advance, I expect we will see more sophisticated and capable local AI models emerge, enabling a new wave of applications and innovations. If AI becomes hyper-local and network-independent, it can transform how we interact with technology and the very role of this technology in society.
Siddharth Pai is co-founder of Siana Capital Management, a venture capital fund management firm. The views expressed are his own and don't reflect any other entity's.
This article first appeared in LiveMint online and in the print edition of Mint.