Telco EdgeWash 2.0: AI-RAN inference edition

We're entering a new era of telco #edgecomputing hype. It is essentially a replay of the MEC debacle from 5-6 years ago, updated for the AI era with added GPUs. For reference, I've included in this article some unedited charts I created during 2018-19.

The misguided idea is that deploying #AI compute resources in the radio network (AI-RAN) yields a large opportunity for mobile operators to host inferencing workloads at the edge, for third party applications and content providers.

We've been here before. Telcos have carved out only a minimal role in edge computing services, whether as suppliers of localised low-latency cloud computing, or simply as providers of colocation space in exchanges, mobile towers and aggregation sites.

Nothing about AI workloads changes that in a way that makes in-network servers and compute magically become a huge opportunity. If anything, it's even harder.

You may already have read my debates on Alok Tripathi's numerous threads about this, plus some of my input on James Crawshaw's posts, but it's worth collating my thoughts and criticisms into a single post of my own.

First, some starting points:

  • Yes, AI is going to be very important for mobile (and fixed) operators. There are AI and GenAI use-cases for both centralised and distributed functions, in the core, the access network, IT / OSS / BSS and also end devices and CPE such as home gateways. I'm expecting #6G to be AI-native, probably including some deep integration into the radio as well.
  • Yes, we're going to see AI inferencing workloads in the public (AI-optimised) cloud, on devices, and in private infrastructures like Apple PCC.
  • Telco-delivered edge computing (especially in mobile) has largely been a failure to date, as I expected. The main applications that do run at the network edge are the telcos' own network functions such as #5G UPFs or [fixed] BNGs, plus CDNs and some security functions. There's very little use of these resources for customer / enterprise workloads.
  • Even where operators have partnered with hyperscalers on edge, in-network platforms such as AWS Wavelength haven't had a huge impact on the marketplace. Wavelength has 30 locations across 6 MNO networks around the world, and AFAIK hasn't been adding many recently.
  • Private networks are mostly "edge" by definition, but they are usually non-telco.
  • Delivering low latency for applications needs much more than a server at the nearest cell-site.
  • The majority of overall compute for AI, whether for training or inference, will be concentrated in large datacentres or in devices. The bit "in the middle" is constrained, in particular by power supply. I reckon the two weights at the ends of the "dumb-bell" collectively have maybe 100-150GW of energy supply, while the bar in the middle has maybe 10-20GW across all networks and smaller datacentre sites.

The compute power dumb-bell, from 2018

So what's new?

The central new trend is that telcos are considering deploying AI-optimised compute (think GPUs + NPUs combined with CPUs) both at centralised sites and also deeper in the network.

The central datacentres will be used for in-house AI training and inferencing, perhaps for local-language LLMs, but also for OSS tasks such as network planning and analytics, BSS functions such as customer support and marketing, various maintenance and security capabilities, and so on. These are applications which are mostly not latency-sensitive, and which typically don't need massive amounts of data or energy. Operators don't envisage their own nuclear reactors for power, unlike some of the larger hyperscale facilities being planned.

They may also be used as "AI factories" in markets where the operator is the leading #supercomputing provider, perhaps working with government on #SovereignAI. In other markets, operators are buyers of GPU-aaS, not sellers. They may have some inhouse resource, but will then rely on cloud #hyperscalers, new AI cloud specialists like CoreWeave, or academic or other supercomputing centres.

AI-RAN sounds good, but it's expensive. "Monetising RAN AI" is unlikely

As well as centralised facilities, distributed compute is mostly being aimed at acceleration and AI functions for the radio network. For mobile, that means things like Open RAN beamforming, radio resource management and so on. Potentially it can also be used for spectrum-sensing, as I wrote recently in another LinkedIn post. The same edge servers might also handle UPFs and core / security functions.

All fair enough.

This is where I expect to see a lot of work in making 6G more AI-enabled in future. There's a whole slew of potential cleverness that can be added to the radio, whether that's relating to future MIMO variants and metasurfaces, channel estimation, energy optimisation and so on. Early examples will appear in 5G Releases 18/19/20.

But where I draw the line is the notion that (expensive) spare capacity can be sold off easily by the MNO, allowing the AI-RAN to be monetised for "edge inferencing as a service". Given the costs of the GPUs, I can understand why this appeals in theory, but it has all the same practical limits as MEC, plus some new ones.

The challenges for edge inferencing-aaS

It's worth revisiting the various reasons that existing MEC propositions haven't gained traction for telcos selling edge-cloud services to enterprises or application developers. Most have a direct read-across to AI-RAN.

  • Developers don't really want their workloads run, or sensitive data stored, in unknown locations. Telling an automotive company or a finance firm that their AI applications might be running in different cell-sites and servers, at different times, is unlikely to thrill their security and compliance teams. Would a telco be happy to do its inferencing on a manufacturer's or bank's spare GPUs?
  • No developer wants to have to do separate AI inferencing-aaS deals with multiple operators in one market, or hundreds around the world. There would need to be a unified aggregation provider and API, and a common pricing approach for edge AI on any MNO network - whatever its particular structure of GPU / CPU etc deployment. So far CAMARA et al haven't got very far on commercialising ordinary edge APIs, let alone AI ones.
  • Interconnection is essential for most cloud-based apps, and AI workloads will be no different. They need fast, secure paths to all the other networks, the public Internet, all the hyperscalers and security platforms, relevant peering and transit options and so on. Mobile networks generally lack a dense grid of interconnect points - usually the gateways sit on the other side of the core. Worse, some telcos are playing silly games with interconnect and transit, or even trying to get it regulated in the EU. Nobody is going to want to strand their AI workloads inside a walled garden with rip-off entry / exit fees (and yes, that applies to hyperscalers' cloud fees too).
  • RAN engineers don't really want third party apps running in their infrastructure, no matter how good the orchestration and isolation claims to be. They certainly don't want the marketing department telling them that no, they can't add more RAN resource or deploy new xApps to the RIC, because they've sold the capacity to a games company or a hospital.
  • Various AI inferencing applications are likely to need their own hardware, rather than relying on general-purpose platforms. This is not just about choosing GPUs / CPUs, but also the security architecture. Apple's AI has a full proprietary security stack (and encryption) going all the way down, plus its own hardware and chips.
  • AI running on mobile devices (especially smartphones) will often be connecting via Wi-Fi / fibre, or perhaps some sort of shared network like a neutral host. Does this mean that the MNO network can only handle 20-30% of the offloaded inferencing? Where does the rest of it go?
  • Some of the MEC models implicitly require 5G standalone cores and devices to be widespread. So far, that's uncommon, and it's changing only slowly. Maybe it gets easier in the AI era, but there are still question marks about things like indoor use and whether that changes access to an edge node.
  • As for colocation - if central offices or mobile sites have been found unsuitable for hosting server racks drawing 10kW in the past, it's hard to imagine it will get any easier for racks of GPUs needing 100kW in future. As well as power supply, they also need strong physical security (but still easy access for engineers), diverse fibre paths with multiple backhaul / transport networks, and lots of cooling.
  • Telcos are very keen to reduce their overall energy footprint, especially where they do not have limitless access to clean/renewable power sources. While they could probably lean on detailed calculations of Scope 3 emissions to argue that this was all on behalf of customers, their ESG teams would have to be very clear about this in any reporting. It's hard to see many national authorities being too happy about this, especially in Europe.

A key point I made in the past: network edge services can be of three main types:

  • Single-operator telco edge
  • Federated multi-operator edge (eg via CAMARA / Open Gateway aggregation & APIs)
  • Interconnected edge (with local breakout and peering, and direct Internet / cloud connectivity)

It's entirely unclear what types of application are ideal for the single-network bubble in the chart, except ones specifically created for, or deployed by, an individual operator. Maybe if a specific MNO builds a national-language LLM for its subscribers and partners, that could fly; or perhaps if it has a dedicated vertical team working in a specific sector.

Three edge computing models for connectivity providers (from 2019)

What can be done, realistically?

I'm very sceptical that there is a major market available to MNOs in edge inferencing. In some cases, I can see certain operators as sellers of GPU-aaS from central computing sites, if there's a supply/demand imbalance in a given country, or a desire for "sovereign AI" capability for some applications or verticals.

But edge AI, especially based on AI-RAN? Why is AI inferencing any different from all the other supposed edge workloads that have failed to drive a market for in-network edge computing over the past 5-6 years?

The telecom industry claimed that MEC servers would be used for enterprise apps, XR rendering offload, smart cities, telemedicine, vehicles and so on. Almost none of that has ended up in telco servers. Where those applications have used edge compute, it's been either new mini datacentres (eg shipping-container size) or local metro / regional DCs.

It's not clear that the move to AI-based applications will change the central vs. edge demand pattern for compute. Even where more edge-side processing is needed, there's a lot of work ongoing on AI PCs, smartphones, home gateways, cars and so on.

Maybe there's a higher-level battle ongoing between NVIDIA and Qualcomm here, with the latter pushing for more device-centric compute, while NVIDIA is trying to get telcos to buy into their messaging and ecosystem.

So what could move the needle here? Here are some suggestions that could help operators still interested in edge computing, whether that's for AI or other purposes.

1) As I've also written about recently, telcos need to eat their own dogfood. Let's see the AI-RAN inferencing resource being used by operators for their own new applications - perhaps offloading IoT data processing such as managed security cameras, or being used for local-language telemedicine or retail or smart home GenAI. If their own product teams won't use the spare capacity, then why would anyone else? They could even ask their finance and treasury teams if they'd like to run some Bitcoin miners in the RAN, if there's all that spare compute and power available and unused.

2) The next category of customer should be other service providers. They already have wholesale relationships such as MVNOs or roaming. Surely those CSPs would want to take advantage of outsourced inferencing? They could just ask their account manager and have it added to their wholesale charging and billing, no?

3) MNOs need to sort out local interconnect and peering for all sorts of reasons, but it's an absolute pre-requisite for AI applications. It counts as "necessary but not sufficient", and is something they should have done years ago. This means working more closely with Internet, cloud and CDN providers, as well as exchange operators like DE-CIX and the London Internet Exchange (LINX). They will also need to have long chats with their regulators about this, and tell their internal IP transit subsidiaries to get onboard, or shut up. It will probably need awkward conversations with the GSMA / IPX folk as well.

4) Work closely with the hyperscalers and some of the new AI cloud specialists like CoreWeave. While the initial edge forays with AWS, GCP and Azure haven't amounted to much, maybe they have more ideas now about managing a distributed compute marketplace. However, MNOs should expect to share any in-network AI-edge-aaS marketplaces [I wish that had a funny acronym] with their fixed-broadband peers, who I expect have considerably more opportunity, as well as various dedicated colocation and edge datacentre owners.

5) Show willingness to be a buyer, as well as seller. If edge AI inferencing is so important, I'd expect operators to be customers as well as putative suppliers. If you're not checking out who else has spare "edge AI" capable GPUs in your HQ city, maybe you should? After all, you're supposedly in competition with them.

But above all, drop the #EdgeWash hype. We've heard it before.

As I said years ago, 5G needs edge, much more than edge needs 5G. (And the slide below also mentioned GPUs, back in 2019).

Edge AI mostly takes us back to the same edge applications expected in 2019


Jim Carey

Software Networking and Automation Leader

Interesting article. Am I being too cheeky in summarizing it as "if-you-build-it-they-will-come is not a business model"? I'm a believer that edge applications will come (as a set of app-dev requirements are satisfied), but like MEC, I don't see them happening if we just put some compute power in the corner and start bragging. We need clear applications driving actual requirements, with business intent up front instead of at the back.

Saurabh Verma

Technologist | Consultant | Industry 4.0/5.0 Beyond Connectivity with AI/ML | Cloud Native/Infra | ICT Wireless Solutions (4G/5G/WiFi / VNF/CNF/PNF) -- change is eventual, always strive for best

Informative as your other articles, Dean Bubley, but it's intriguing to me that you compare edge at the RAN with the 6-year-old MEC (or MAEC, I guess) air balloon. I agree edge computing couldn't take shape earlier, but no doubt its time has come, for the sake of application demand for low latency and real-time inference in the AI context too. But I'm in a bit of a conundrum: when you mix this with AI-RAN and talk of edge GPUs there for RAN-specific inference, are we really talking about the same thing?

I appreciate the insights, Dean! Edge computing is not hype; there are clear use cases to justify wide deployment.

Fully agree. Edge infra is probably a hyperscaler play… surely not at the cell site, but at metro / regional data centres. The network can only help with QoD, routing optimisation etc.
