Serverless AI infrastructure
[Post image: screenshot of cloud-gpus.com]

AI is eating the world, but most cloud providers charge for AI infrastructure the same way the cloud has charged for 15 years: per hour or second of access to a server. If you have ever run AI training on a multi-GPU server, you may have noticed that some workloads are slower spread across multiple GPUs than running on a single one. This happens especially when training steps are short, e.g. while the data scientists are still building the model. Also, unless you know how to configure TensorFlow or PyTorch to use all GPUs, your training might run on just one of them. Add to that the fact that many data scientists are not DevOps experts, and setting up and running software on a multi-GPU server can take some time to get right.

So if you charge customers by the hour or second and hand them a 4-GPU server with TensorFlow, Pandas, PyTorch, CUDA and so on pre-installed, they are unlikely to fully utilise that server most of the time.

Let’s take an imaginary GenAI startup that has trained a model to transcribe videos, e.g. into meeting minutes. They spend several hours setting up a server, then start fine-tuning an existing 10GB model. The job finishes in the middle of the night, but nobody notices until the team arrives the next morning. They then need to transfer this 10GB model to their production inference server before it can start servicing customers. Across this whole process, the GPUs have probably sat idle 20-50% of the time, and unless everything was tuned perfectly, the rest of the time not all GPUs were in use. This is a massive waste of very expensive multi-GPU servers: high-end GPUs cost over $30K each.

GPU hosters face another set of major challenges. GPU deliveries are neither streamlined nor abundant: today you might receive a batch, yet for the next few weeks you might not be able to get any, even if you ordered them. That means oversupply on some days and undersupply on others. The GenAI startup that paid for a server but left it idle 30% of the time, and underutilised the rest, is a big lost opportunity: that spare GPU compute could have been sold to others. And while sites like Cloud-GPUs.com [see the post's main image] advertise pricing continuously, in the current model customers cannot easily move their workloads between GPU hosters.

For one of my customers, I am designing a new type of solution, and I want to invite both AI users and GPU hosters to provide feedback. First of all, by packaging AI workloads inside Docker containers, or better still WASM containers, they become portable: you can move them from one hoster to the next. By using IPFS, models, training data and requests can easily be brought to the workload. If the Docker images, AI models or data are large, e.g. multiple gigabytes, and will be needed over multiple weeks, you could pay hosters for upfront caching, i.e. IPFS pinning. This allows for fast startup times.
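
To make the caching step concrete, here is a minimal sketch, assuming a local Kubo (go-ipfs) node exposing its standard HTTP API on port 5001. The hoster-side `/pins` endpoint, the bearer-token auth and the hoster URLs are purely illustrative assumptions, loosely modelled on the IPFS Pinning Service API spec rather than any existing product.

```python
# Sketch: publish a model to IPFS and ask hosters to pre-pin (cache) it.
# Assumes a local Kubo node on port 5001; the hoster /pins endpoint,
# auth scheme and URLs are assumptions, not a published API.
import os
import requests

IPFS_API = "http://127.0.0.1:5001/api/v0"

def publish_model(path: str) -> str:
    """Add the model file to the local IPFS node and return its CID."""
    with open(path, "rb") as f:
        resp = requests.post(f"{IPFS_API}/add", files={"file": f})
    resp.raise_for_status()
    return resp.json()["Hash"]

def request_pin(hoster_url: str, cid: str, token: str) -> None:
    """Ask a hoster to pin the CID so the model is already cached
    before any job runs (this is the paid upfront-caching step)."""
    resp = requests.post(
        f"{hoster_url}/pins",
        json={"cid": cid, "name": "transcription-model-v1"},
        headers={"Authorization": f"Bearer {token}"},
    )
    resp.raise_for_status()

if __name__ == "__main__":
    cid = publish_model("transcription-model.bin")
    for hoster in ("https://gpu-hoster-a.example", "https://gpu-hoster-b.example"):
        request_pin(hoster, cid, token=os.environ["HOSTER_TOKEN"])
        print(f"requested pin of {cid} at {hoster}")
```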

So that same imaginary GenAI startup can have its transcription model and Docker/WASM containers cached in multiple clouds. When a video needs transcribing, it asks the GPU hosters that already have the model and container cached for their current price, based on whether one or multiple GPUs are needed. The cheapest offer, or the most reputable one below a certain price, wins and gets to do the transcription. The startup pays for storing the model and containers at several hosters [storage is cheap] and only pays for the GPU work actually done. Like any serverless system, the startup no longer has to worry about capacity planning because it is no longer looking after an individual set of boxes. It pays for what it uses and nothing more.
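
The buyer side of that spot auction could look roughly like the sketch below: query every hoster that already caches the model, then pick the cheapest quote that clears a reputation floor. The `/quote` endpoint, its response fields and both thresholds are assumptions for illustration.

```python
# Sketch of the buyer-side auction described above. The /quote endpoint,
# its response fields and both thresholds are assumptions, not a real API.
import requests

HOSTERS = ["https://gpu-hoster-a.example", "https://gpu-hoster-b.example"]
MAX_PRICE = 0.40       # assumed ceiling in $ per transcription job
MIN_REPUTATION = 4.0   # assumed reputation score on a 0-5 scale

def collect_quotes(model_cid: str, gpus_needed: int) -> list[dict]:
    """Ask every hoster that caches the model for its current price."""
    quotes = []
    for hoster in HOSTERS:
        try:
            r = requests.post(f"{hoster}/quote",
                              json={"model_cid": model_cid, "gpus": gpus_needed},
                              timeout=2)
            r.raise_for_status()
            quote = r.json()  # e.g. {"price": 0.21, "reputation": 4.6}
            quote["hoster"] = hoster
            quotes.append(quote)
        except requests.RequestException:
            continue  # hoster unreachable right now: simply skip it
    return quotes

def pick_winner(quotes: list[dict]) -> dict | None:
    """Lowest price wins among hosters under the price ceiling
    and above the reputation floor."""
    eligible = [q for q in quotes
                if q["price"] <= MAX_PRICE and q["reputation"] >= MIN_REPUTATION]
    return min(eligible, key=lambda q: q["price"]) if eligible else None
```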

From the hoster's perspective, multiple single-GPU tasks can run on one multi-GPU server, maximising sales. Pricing can change dynamically depending on whether spare capacity is available. The cost of sales is far lower because you no longer have to convince users to switch to your cloud: one integration with a globally distributed serverless AI market and you can start earning.
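
Dynamic pricing on the hoster side could be as simple as scaling a base rate with current utilisation. The formula and the surge factor below are illustrative assumptions, not a recommendation.

```python
# Illustrative spare-capacity pricing for a multi-GPU server: the quoted
# per-GPU price rises as fewer GPUs remain free. The 2x surge multiplier
# at full utilisation is an arbitrary assumption.
def quote_price(free_gpus: int, total_gpus: int, base_price: float) -> float | None:
    """Return a per-GPU price, or None if the server has no spare capacity."""
    if free_gpus <= 0:
        return None
    utilisation = 1 - free_gpus / total_gpus
    return base_price * (1 + utilisation)  # approaches 2x base when nearly full

# Example: 1 of 4 GPUs free on a server with a $1.00/GPU-hour base rate
print(quote_price(free_gpus=1, total_gpus=4, base_price=1.00))  # 1.75
```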

If this sounds interesting, we are offering a limited number of hosters the option to collaborate on launching with us. Everybody who comes on board now will have a three-month period of exclusivity before new hosters are allowed to join. Reach out quickly if you don’t want to wait while a potential competitor outruns you…
