Streamline GPU usage with Advanced Slicing
With AI/ML use cases increasing, GPUs are in huge demand. Everyone is talking about AI/ML; I recently read a joke that if you did a single push-up every time you heard the term AI/ML, you could build a nice body without ever going to the gym. Jokes aside, the reality is that everyone is looking at AI/ML, especially GenAI use cases, and many organisations are adding GPU-based servers to their environments. Typically these GPUs are hosted in compute servers such as Cisco UCS, with hypervisors and virtual machines installed on top of the bare-metal servers. Different VM workloads have different needs and may not require an entire GPU each. That is where the technique of GPU slicing comes into the picture: it allows you to partition a single physical GPU into multiple virtual GPUs, so that multiple VMs can share it. Let's dig deeper into this concept.
GPU slicing is a technique that allows multiple virtual machines (VMs) or containers to share a single physical GPU (Graphics Processing Unit). In simple words, GPU slicing means dividing the GPU resources into smaller, isolated units that can be allocated to different tasks or users at the same time.
This is very useful in cloud computing environments, where multiple users or applications need to use GPU power but cannot each have a dedicated GPU. It works by using software to create virtual GPUs (vGPUs) from a single physical GPU. Each vGPU acts like a separate GPU but is actually a part of the physical GPU. This is done using special software like NVIDIA vGPU or AMD MxGPU. The software divides the GPU memory and processing power into smaller chunks, and each chunk is assigned to a different VM or container. This way, many users can run their applications on the same GPU without interfering with each other.
Let's look at the GPU slicing modes offered by NVIDIA.
NVIDIA offers several GPU slicing modes designed to optimise resource allocation and performance for various use cases. These modes are mainly provided through NVIDIA's vGPU (virtual GPU) technology, and come in two broad flavours: time-sliced vGPU profiles, where workloads take turns running on the whole GPU, and MIG (Multi-Instance GPU) on supported cards, where the GPU is partitioned into isolated hardware instances with dedicated memory and compute.
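The difference between these two flavours can be sketched as follows. This is a toy illustration, not NVIDIA's scheduler: `time_slice` models workloads taking turns on the full GPU, while `mig_partition` models splitting the card's compute units (SMs) into fixed, isolated instances. The SM counts are illustrative:

```python
from itertools import cycle, islice

# Two toy sharing strategies, illustrating the ideas behind NVIDIA's
# time-sliced vGPU and MIG modes. Neither function is a real NVIDIA API.

def time_slice(workloads: list[str], quanta: int) -> list[str]:
    """Round-robin: each workload gets the whole GPU for one time quantum."""
    return list(islice(cycle(workloads), quanta))

def mig_partition(total_sms: int, slices: int) -> list[int]:
    """Spatial partitioning: compute units are split into equal, isolated instances."""
    return [total_sms // slices] * slices

# Time slicing: three workloads share the full GPU in turns.
schedule = time_slice(["train", "infer", "render"], quanta=6)

# MIG-style partitioning: e.g. 98 SMs split into 7 isolated instances.
instances = mig_partition(total_sms=98, slices=7)
```

Time slicing gives every workload access to the full GPU but no isolation in time, whereas MIG-style partitioning trades peak capacity for strict isolation between tenants.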
If you are looking to use vGPU for your workloads, make sure that your GPU model and hypervisor are on the vendor's compatibility list.
You will then need to install the vGPU manager software from the GPU vendor on the hypervisor host, and the vGPU drivers inside the guest VMs.
In summary, GPUs are pretty expensive resources, and you will want to utilise them as efficiently as possible. Plan the vGPU method based on your use case and needs, and remember that you can always fine-tune the assignment using vGPU profiles.