🔮 Will genAI cause a compute crunch?

Last year, Google passed a milestone: its spending on compute exceeded its spending on people. This is a watershed moment.

For millennia, until the 1600s, human and animal sweat was the dominant form of work. It was human muscle power that was, in Vaclav Smil’s terms, the “prime mover” in our economies. Today machines have largely taken over this role.

Of course, Google’s cross-over point doesn’t mean human mental efforts have been supplanted by machines. But it does signify the shape the future could take: increasingly high spends on software that relies on computing power. The sweat is now on the computers’ brow.

This is a fundamental shift. And some companies understand it far better than others. If you ask Sam Altman or Satya Nadella or Sundar Pichai what they would do with a thousand or a million times more compute, they will know the answer. That scale is a strategic resource for them. But to what extent is that true for the bosses of the large firms that comprise the bulk of the economy? To what extent is it true for you and your organisation?

Several months ago, I took this question to my friend François Candelon, at the time Global Director of the BCG Henderson Institute (now a partner at the private equity firm Seven2). We agreed that we were both hearing concerns from senior leaders about whether compute would remain affordable, especially in the face of growing demands from AI. That led us to launch this piece of research.

So for the past few months, a team from the BCG Henderson Institute (Riccarda Joas and David Zuluaga Martínez) and Exponential View’s Nathan Warren have been working on this question: to what extent will the boom in generative AI impact the availability of affordable computing power?

We built a model with bullish global demand projections and realistic supply constraints. What we found is that the fears are not grounded in reality. Instead of being anxious about a lack of computing power, executives should be gearing up for an abundance of compute. Like Sundar, Sam and Satya, they should all have a clear-sighted view of how their business will change as computation becomes much more widely available.

In today’s Part 1 we will make the argument for why there will be enough compute. Next week, in Part 2, we will propose a framework for how to leverage abundant computing power.

What follows in today’s email is an excerpt of the paper for readers of Exponential View. You can access the full paper here.

Many thanks to François Candelon, Riccarda Joas and David Zuluaga Martínez for collaborating on this research with us!


Introduction

In the race for AI supremacy, computing power has become the decisive resource, and securing it the new arms race. As generative AI models balloon in complexity, demand for specialised hardware has outstripped what Moore’s Law alone can deliver, threatening the digital economy’s foundations. Tech titans are plotting grand schemes: Microsoft and OpenAI’s reported $100bn supercomputer project leads the pack—if realised. Not to be outdone, Elon Musk unveiled Colossus, boasting 100,000 Nvidia processors (with plans to double that), while Oracle flexed its muscle with a zettascale cluster sporting 131,072 Blackwell GPUs. Google’s 2023 pivot to prioritise spending on computing infrastructure over personnel underscores how compute-intensive this AI era has become. The question remains: can supply keep pace with insatiable demand?

We argue that a nuanced understanding of genAI’s computational demands—specifically, the distinctions between model training and model inference—reveals a less dire outlook. Even under aggressive assumptions about genAI’s growth and compute intensity, our quantitative model indicates that genAI workloads will account for only about 34% of global data centre-based AI computing supply by 2028. Thus, the rise of genAI is unlikely to disrupt the long-standing regime of affordable and widely available computing power.

While other factors, such as the energy required to power data centres, could pose significant constraints, genAI’s computational demands alone are unlikely to outpace the world’s capacity to produce the necessary hardware.

The historical development of computing power

For the past five decades, concerns about computing power supply have been minimal, thanks to two synergistic factors: Moore’s Law and large-scale digitisation. Since around 1970, the number of transistors per chip has approximately doubled every two years, allowing for exponentially more computations per chip. Simultaneously, computing hardware has proliferated globally through data centres, personal computers, smartphones, and a myriad of devices, resulting in an estimated 60% compound annual growth rate in total computing supply since the 1970s.
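
A quick back-of-envelope, using only the growth rates quoted above, shows how dramatically these two forces compound (a minimal sketch in Python):

```python
# What the growth rates quoted above compound to (illustrative arithmetic).
# Transistor doubling every two years implies ~41% annual per-chip growth;
# total computing supply is estimated to have grown ~60% per year since ~1970.
moore_annual = 2 ** (1 / 2) - 1          # per-chip growth rate from Moore's Law
total_cagr = 0.60                        # estimated total-supply CAGR
years = 2023 - 1970

per_chip_factor = (1 + moore_annual) ** years
total_factor = (1 + total_cagr) ** years
print(f"Per-chip growth over {years} years:     ~{per_chip_factor:.1e}x")
print(f"Total supply growth over {years} years: ~{total_factor:.1e}x")
```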

This abundance has fostered an environment where computing power is both affordable and readily accessible. The prevalence of inefficient or “bad code,” which a 2018 Stripe report estimated costs companies $85 billion annually, suggests that businesses have historically not faced significant computing supply constraints. Much like oil in the early 20th century, computing power has been valuable yet sufficiently plentiful to permit a degree of inefficiency without dire consequences.

Understanding genAI’s computational demands

The central concern is whether genAI will disrupt this equilibrium of ample, affordable computing power. Addressing this requires dissecting the different computational needs associated with genAI, which can be broadly categorised into three types: model training, fine-tuning, and inference.

  1. Model Training: Developing a foundational genAI model involves large-scale training that is both resource-intensive and costly. This process necessitates specialised hardware and is typically undertaken by a select few—mainly hyperscalers like Google and Microsoft or specialised firms closely allied with tech giants, such as OpenAI or Anthropic. Given the immense resources required, most businesses are unlikely to engage in foundational model training.
  2. Fine-Tuning: This involves adjusting a pre-trained model to perform specific tasks or adapt to particular datasets. Fine-tuning is significantly less computationally demanding than foundational training, often requiring less than 10%—and sometimes as little as 0.1%—of the resources needed for initial training. Additionally, once a model is fine-tuned, it only occasionally requires retraining.
  3. Inference: This refers to the practical application or “use” of the genAI model, such as generating text or making predictions based on input data. Inference typically accounts for the majority of a model’s total computational demand over its lifecycle, especially as a single model serves numerous users over time. Crucially, inference does not require the highly specialised hardware necessary for training large models. It can be performed on less specialised chips and can be distributed across multiple data centres or even edge devices like laptops and smartphones. A rough numerical comparison of one-off training versus ongoing inference compute follows this list.
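
To see why inference dominates a model’s lifecycle compute, consider the standard rule-of-thumb approximations: training costs roughly 6 × parameters × training tokens in FLOPs, while inference costs roughly 2 × parameters in FLOPs per generated token. A minimal sketch, in which every concrete number is an illustrative assumption rather than an input to our model:

```python
# Rule-of-thumb FLOP comparison: one-off training vs ongoing inference.
# Approximations: training ~ 6 * N * D FLOPs; inference ~ 2 * N FLOPs/token.
# Every concrete number below is an illustrative assumption.
params = 1e12                 # assumed model size: 1 trillion parameters
training_tokens = 10e12       # assumed training corpus: 10 trillion tokens
training_flops = 6 * params * training_tokens

tokens_per_query = 1_000      # assumed tokens generated per user query
queries_per_day = 1e9         # assumed daily queries for a widely used model
daily_inference_flops = 2 * params * tokens_per_query * queries_per_day

days_to_match = training_flops / daily_inference_flops
print(f"Training (one-off):  {training_flops:.1e} FLOPs")
print(f"Inference (per day): {daily_inference_flops:.1e} FLOPs")
print(f"Inference overtakes training after ~{days_to_match:.0f} days")
```

Under these assumptions, a heavily used model burns through its entire training budget in about a month of inference, and everything after that is inference-dominated.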

Understanding that inference—the most compute-intensive phase in terms of aggregate usage—relies less on specialised hardware suggests that fears of an impending computing power scarcity may be overstated. The pivotal question becomes: How extensive will genAI model inference demand be, and how computationally intensive will it become?

Modelling a bullish scenario for genAI inference demand

To explore this question, we constructed a quantitative model projecting a bullish scenario for genAI inference demand against a moderate supply forecast up to 2028. Our model aggregates demand from businesses, governments and individual consumers, focusing on workloads that require AI chips in data centres. Supply is estimated from the availability of hardware that can handle AI workloads (e.g. GPUs); we acknowledge that some inference can run on less specialised hardware, but take the conservative approach of excluding it.

Our findings indicate that even under aggressive assumptions about the growth and intensity of genAI demand, the aggregate global demand for genAI model inference will reach only about 34% of the total available data centre computing power for AI by 2028. This suggests that the rise of genAI is unlikely to outstrip the global capacity for producing the necessary computing hardware.

Key demand assumptions

  1. Continuation of Model Scaling: We assume that frontier genAI models will continue to grow in size, with parameter counts increasing at a rate similar to the historical 2.8x per year observed since 2018. By 2028, models could engage up to 15 trillion parameters per prompt—over 50 times more computationally intensive than the average GPT-4 inference today. However, recognising that not all users will require or adopt the latest models immediately, we assume an average of 5 trillion parameters per prompt by 2028 for general use and smaller models for agentic workflows.

  2. Aggressive Global Adoption Rates: We project a “double exponential” adoption pattern, both in the number of entities using genAI and the intensity of its use. By 2028, we assume that 20% of small and medium-sized enterprises (SMEs), 30% of large businesses, and 20% of governmental institutions worldwide will substantially use genAI. These figures are ambitious, especially when compared to the 2018 data showing much lower AI adoption rates among U.S. businesses. We posit that genAI’s natural language interface may facilitate faster adoption than previous AI technologies.

For digital advertising—a significant potential driver of genAI demand—we independently model a scenario where all ads on Meta’s platforms by 2028 incorporate genAI-powered, personalised images and captions.

  3. Exponential Growth in Utilisation: We assume high and rapidly increasing utilisation rates. For employees in businesses and governments, we start with an aggressive estimate of 10,000 tokens per employee per day, equivalent to several interactions with a genAI interface. For agentic workflows, which automate end-to-end tasks, we project initial requirements of about 500,000 tokens per workflow per day in 2025, escalating to around 2 million tokens by 2028.

We further assume that the intensity of genAI use will grow at a rate comparable to historical mobile data traffic, approximately 60% annually. For individual consumers, we estimate their inference demand at about 15% of total business demand, aligning with ratios observed in services like Microsoft Office 365. The sketch below shows how these demand components might aggregate.
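
To make these assumptions concrete, here is a minimal sketch of how the demand components might aggregate in 2028. The intensities (tokens per day, the ~60% growth rate, the 15% consumer ratio, the 5-trillion-parameter average) come from the assumptions above; the entity counts are placeholders chosen purely for illustration, not our model’s inputs:

```python
# Structural sketch of 2028 inference-demand aggregation.
# Intensities come from the assumptions above; counts marked
# "placeholder" are illustrative and not taken from the paper.
FLOPS_PER_TOKEN = 2 * 5e12                 # ~2*N rule of thumb, 5T avg parameters

employees = 4e8                            # placeholder: adopting employees, 2028
tokens_per_employee = 10_000 * 1.6 ** 3    # 10k/day in 2025, ~60%/yr growth

workflows = 2.5e8                          # placeholder: agentic workflows, 2028
tokens_per_workflow = 2_000_000            # ~2M tokens/workflow/day by 2028

business = employees * tokens_per_employee
agentic = workflows * tokens_per_workflow
consumer = 0.15 * business                 # consumer demand ~15% of business

daily_tokens = business + agentic + consumer
annual_flops = daily_tokens * 365 * FLOPS_PER_TOKEN
print(f"Agentic share of daily tokens: {agentic / daily_tokens:.0%}")
print(f"Illustrative 2028 demand: ~{annual_flops:.1e} FLOPs/yr")
```

With these placeholder counts, agentic workflows account for the overwhelming majority of tokens, and total demand lands in the vicinity of the ~2e30 FLOPs figure discussed below.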

Most of the demand is driven by agentic workflows, which are at the upper end of what is feasible with today’s technology. The crucial point is that even with our very aggressive assumptions, there will be sufficient supply, and genAI inference—reaching ~2e30 FLOPs by 2028—would only take up ~34% of likely global supply. 

Supply projections

On the supply side, we adopt moderate growth projections to test the robustness of our conclusions. We estimate the current baseline computing power available for genAI inference based on the quantity of state-of-the-art GPUs in use. Nvidia, a leading GPU manufacturer, shipped approximately 3.8 million data centre GPUs in 2023—a 42% increase from 2022. Accounting for average utilisation rates and technological capabilities, we calculate a baseline supply of roughly 7e28 FLOPs for 2023.

Looking ahead, we reference industry analyses suggesting that AI computing power will increase by about 60 times by the end of 2025 compared to early 2023 levels. We anticipate this growth rate will moderate to around 60% annually through 2028, resulting in a supply of approximately 4e30 FLOPs—about 57 times the 2023 figure.
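
These figures can be sanity-checked with simple arithmetic. In the sketch below, the GPU count and the ~57x multiple come from the text, while per-GPU throughput and utilisation are illustrative assumptions (roughly H100-class hardware), not our model’s inputs:

```python
# Sanity check of the supply figures quoted above.
# GPU count and 57x multiple from the text; throughput and
# utilisation are illustrative assumptions.
gpus_2023 = 3.8e6               # data centre GPUs shipped in 2023
flops_per_gpu = 1e15            # assumed ~1 PFLOP/s, dense FP16, H100-class
utilisation = 0.60              # assumed average utilisation
seconds_per_year = 365 * 24 * 3600

baseline_2023 = gpus_2023 * flops_per_gpu * utilisation * seconds_per_year
supply_2028 = baseline_2023 * 57        # ~57x growth multiple from the text

print(f"2023 baseline: ~{baseline_2023:.0e} FLOPs/yr")   # ~7e28, as quoted
print(f"2028 supply:   ~{supply_2028:.0e} FLOPs/yr")     # ~4e30, as quoted
```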

It’s important to note that supply could exceed these estimates due to intensified competition and new market entrants. Companies like AMD have introduced new chips designed for genAI workloads, and hyperscalers are developing specialised chips optimised for inference, such as Google’s TPUs and Microsoft’s Maia 100 chip. This diversification could enhance the total computing supply beyond our moderate projections.

Breaking points beyond computing hardware

Our analysis leads to the conclusion that even with exponential growth in genAI’s computational demands and aggressive adoption rates, the established regime of affordable and widely available computing power is unlikely to collapse due to genAI alone. However, several factors could potentially disrupt this equilibrium:

  1. Explosion in Consumer Demand for Inference: A scenario where consumers massively adopt compute-intensive applications—such as generating high-resolution videos using advanced genAI models—could strain computing resources. For instance, if the millions of videos uploaded daily on platforms like TikTok were generated using resource-heavy models, it could significantly escalate demand. However, we consider this unlikely, as increased inference costs would likely curb consumer usage due to price sensitivity.
  2. Supply Chain Disruptions: The hardware supply chain for computing is complex and geographically dispersed, with critical dependencies on specific regions and companies. Geopolitical tensions, trade restrictions, or sanctions could disrupt the production and distribution of essential components, impacting the overall supply of computing power.
  3. Energy Constraints: Perhaps the most pressing concern is the energy required to power data centres, especially as computational demands escalate. Our model estimates that powering the GPUs for genAI inference would require about 40 terawatt-hours (TWh) per year initially—between 10% and 13% of current global data centre energy consumption. As agentic workflows become more prevalent, energy requirements could become a significant constraint unless substantial gains in energy efficiency are realised. A rough plausibility check of the 40 TWh figure follows this list.
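
As a rough plausibility check on the ~40 TWh figure, a minimal sketch: the fleet size, per-GPU draw and PUE below are illustrative assumptions, and the ~300-400 TWh global figure is simply back-derived from the 10-13% range above:

```python
# Rough plausibility check of the ~40 TWh/yr inference-energy estimate.
# Fleet size, per-GPU draw and PUE are illustrative assumptions.
gpu_fleet = 5e6           # assumed GPUs serving genAI inference
gpu_power_kw = 0.7        # assumed per-GPU draw (~700 W, H100-class TDP)
pue = 1.3                 # assumed power usage effectiveness (cooling etc.)
hours_per_year = 8760

twh = gpu_fleet * gpu_power_kw * pue * hours_per_year / 1e9
print(f"~{twh:.0f} TWh/yr")
print(f"~{twh / 400:.0%} to ~{twh / 300:.0%} of ~300-400 TWh "
      "global data centre consumption")
```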

Encouragingly, advancements in energy-efficient hardware, such as specialised chips designed for inference with lower power consumption, are emerging. Data centres are also adopting innovative cooling systems and leveraging AI to optimise energy use. Additionally, on-site energy generation, including renewable sources, could alleviate pressure on energy grids.

Conclusion

Our exploration suggests that genAI’s rise will not, by itself, outpace the global capacity to produce the required computing hardware. The longstanding regime of affordable and accessible computing power is poised to continue, even under the most aggressive genAI demand scenarios. Therefore, businesses should not prepare for a scarcity of computing resources but should instead focus on leveraging the anticipated abundance of computing power to gain competitive advantages.

Read the full white paper

