Serverless AI infrastructure
[Post image: screenshot of cloud-gpus.com]

AI is eating the world, but most cloud providers charge for AI infrastructure the same way the cloud has charged for 15 years: per hour or second of access to a server. If you have ever run AI training on a multi-GPU server, you may have noticed that some workloads are slower spread across multiple GPUs than running on a single one. This happens especially when training steps are short, e.g. while the data scientists are still building the model. Also, unless you know how to configure TensorFlow or PyTorch to use all GPUs, your training might run on just one of them. Add to that the fact that many data scientists are not DevOps experts, and setting up and running software on a multi-GPU server can take some time to get right.

So if you charge customers by the hour or second and hand them a 4-GPU server with TensorFlow, Pandas, PyTorch, CUDA and so on pre-installed, they are unlikely to fully utilise that server most of the time.

Let’s take an imaginary GenAI startup that has trained a model to transcribe videos, e.g. into meeting minutes. They spend several hours setting up a server, then start fine-tuning an existing 10GB model. The job finishes in the middle of the night, but nobody notices until the team arrives the next morning. They then need to transfer this 10GB model to their production inference server before it can start servicing customers. Across this whole process, the GPUs have probably sat idle 20-50% of the time, and unless everything was tuned perfectly, the rest of the time not all GPUs were in use. This is a massive waste of very expensive multi-GPU servers: high-end GPUs cost over $30K each.

GPU hosters face another set of major challenges. GPU deliveries are neither streamlined nor abundant: today you might receive a batch, yet for the next few weeks you might not be able to get any, even if you ordered them. That means oversupply on some days and undersupply on others. The GenAI startup that paid for a server but left it idle 30% of the time, and underutilised the rest, is a big lost opportunity: that spare GPU compute could have been sold to others. And while sites like Cloud-GPUs.com [see the post's main image] advertise pricing continuously, in the current model customers cannot easily move their workloads between GPU hosters.

For one of my customers, I am designing a new type of solution, and I want to invite both AI users and GPU hosters to provide feedback. First of all, by packaging AI workloads inside Docker containers, or better still WASM containers, they become portable: you can move them from one hoster to the next. By using IPFS, models, training data and requests can easily be brought to the workload. If the Docker images, AI models or data are large, e.g. multiple gigabytes, and will be needed over multiple weeks, you could pay hosters for upfront caching, i.e. IPFS pinning. This allows for fast startup times.
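
To make the caching step concrete, here is a minimal sketch, assuming a local Kubo (go-ipfs) node exposing its standard HTTP API on port 5001. The hoster-side `/pins` endpoint, the bearer-token auth and the hoster URLs are purely illustrative assumptions, loosely modelled on the IPFS Pinning Service API spec rather than any existing product.

```python
# Sketch: publish a model to IPFS and ask hosters to pre-pin (cache) it.
# Assumes a local Kubo node on port 5001; the hoster /pins endpoint,
# auth scheme and URLs are assumptions, not a published API.
import os
import requests

IPFS_API = "http://127.0.0.1:5001/api/v0"

def publish_model(path: str) -> str:
    """Add the model file to the local IPFS node and return its CID."""
    with open(path, "rb") as f:
        resp = requests.post(f"{IPFS_API}/add", files={"file": f})
    resp.raise_for_status()
    return resp.json()["Hash"]

def request_pin(hoster_url: str, cid: str, token: str) -> None:
    """Ask a hoster to pin the CID so the model is already cached
    before any job runs (this is the paid upfront-caching step)."""
    resp = requests.post(
        f"{hoster_url}/pins",
        json={"cid": cid, "name": "transcription-model-v1"},
        headers={"Authorization": f"Bearer {token}"},
    )
    resp.raise_for_status()

if __name__ == "__main__":
    cid = publish_model("transcription-model.bin")
    for hoster in ("https://gpu-hoster-a.example", "https://gpu-hoster-b.example"):
        request_pin(hoster, cid, token=os.environ["HOSTER_TOKEN"])
        print(f"requested pin of {cid} at {hoster}")
```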

So that same imaginary GenAI startup can have its transcription model and Docker/WASM containers cached in multiple clouds. When a video needs transcribing, it asks the GPU hosters that already have the model and container cached for their current price, based on whether one or multiple GPUs are needed. The cheapest offer, or the most reputable one below a certain price, wins and gets to do the transcription. The startup pays for storing the model and containers at several hosters [storage is cheap] and only pays for the GPU work actually done. Like any serverless system, the startup no longer has to worry about capacity planning because it is no longer looking after an individual set of boxes. It pays for what it uses and nothing more.
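
The buyer side of that spot auction could look roughly like the sketch below: query every hoster that already caches the model, then pick the cheapest quote that clears a reputation floor. The `/quote` endpoint, its response fields and both thresholds are assumptions for illustration.

```python
# Sketch of the buyer-side auction described above. The /quote endpoint,
# its response fields and both thresholds are assumptions, not a real API.
import requests

HOSTERS = ["https://gpu-hoster-a.example", "https://gpu-hoster-b.example"]
MAX_PRICE = 0.40       # assumed ceiling in $ per transcription job
MIN_REPUTATION = 4.0   # assumed reputation score on a 0-5 scale

def collect_quotes(model_cid: str, gpus_needed: int) -> list[dict]:
    """Ask every hoster that caches the model for its current price."""
    quotes = []
    for hoster in HOSTERS:
        try:
            r = requests.post(f"{hoster}/quote",
                              json={"model_cid": model_cid, "gpus": gpus_needed},
                              timeout=2)
            r.raise_for_status()
            quote = r.json()  # e.g. {"price": 0.21, "reputation": 4.6}
            quote["hoster"] = hoster
            quotes.append(quote)
        except requests.RequestException:
            continue  # hoster unreachable right now: simply skip it
    return quotes

def pick_winner(quotes: list[dict]) -> dict | None:
    """Lowest price wins among hosters under the price ceiling
    and above the reputation floor."""
    eligible = [q for q in quotes
                if q["price"] <= MAX_PRICE and q["reputation"] >= MIN_REPUTATION]
    return min(eligible, key=lambda q: q["price"]) if eligible else None
```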

From the hoster's perspective, multiple single-GPU tasks can run on one multi-GPU server, maximising sales. Pricing can change dynamically depending on whether spare capacity is available. The cost of sales is far lower because you no longer have to convince users to switch to your cloud: one integration with a globally distributed serverless AI market and you can start earning.
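
Dynamic pricing on the hoster side could be as simple as scaling a base rate with current utilisation. The formula and the surge factor below are illustrative assumptions, not a recommendation.

```python
# Illustrative spare-capacity pricing for a multi-GPU server: the quoted
# per-GPU price rises as fewer GPUs remain free. The 2x surge multiplier
# at full utilisation is an arbitrary assumption.
def quote_price(free_gpus: int, total_gpus: int, base_price: float) -> float | None:
    """Return a per-GPU price, or None if the server has no spare capacity."""
    if free_gpus <= 0:
        return None
    utilisation = 1 - free_gpus / total_gpus
    return base_price * (1 + utilisation)  # approaches 2x base when nearly full

# Example: 1 of 4 GPUs free on a server with a $1.00/GPU-hour base rate
print(quote_price(free_gpus=1, total_gpus=4, base_price=1.00))  # 1.75
```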

If this sounds interesting, we are offering a limited number of hosters the option to collaborate on launching with us. Everybody who comes on board now will have a three-month period of exclusivity before new hosters are allowed to join. Reach out quickly if you don’t want to wait while a potential competitor outruns you…
