Streamline GPU usage with Advanced Slicing
With AI/ML use cases increasing, GPUs are in huge demand. Everyone is talking about AI/ML; I recently read a joke that if you did a single push-up every time you heard the term AI/ML, you could build a nice body without ever going to the gym. Jokes aside, the reality is that everyone is looking at AI/ML, especially GenAI use cases, and many organisations are adding GPU-based servers to their environments. Typically these GPUs are hosted in compute servers such as Cisco UCS, with hypervisors and virtual machines installed on top of the bare-metal servers. Different VM workloads have different needs and may not require an entire GPU each. That is where the technique of GPU slicing comes into the picture: it allows you to partition a single physical GPU into multiple virtual GPUs, so that multiple VMs can share it. Let's dig deeper into this concept.
GPU slicing is a technique that allows multiple virtual machines (VMs) or containers to share a single physical GPU (Graphics Processing Unit). In simple words, GPU slicing means dividing the GPU resources into smaller, isolated units that can be allocated to different tasks or users at the same time.
This is very useful in cloud computing environments, where multiple users or applications need to use GPU power but cannot each have a dedicated GPU. It works by using software to create virtual GPUs (vGPUs) from a single physical GPU. Each vGPU acts like a separate GPU but is actually a part of the physical GPU. This is done using special software like NVIDIA vGPU or AMD MxGPU. The software divides the GPU memory and processing power into smaller chunks, and each chunk is assigned to a different VM or container. This way, many users can run their applications on the same GPU without interfering with each other.
Let's look at the GPU slicing modes offered by NVIDIA.
NVIDIA offers several GPU slicing modes designed to optimise resource allocation and performance for various use cases. These modes are mainly provided through NVIDIA's vGPU (virtual GPU) technology, and come in two broad flavours: time-sliced vGPU profiles, where workloads take turns running on the whole GPU, and MIG (Multi-Instance GPU) on supported cards, where the GPU is partitioned into isolated hardware instances with dedicated memory and compute.
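The difference between these two flavours can be sketched as follows. This is a toy illustration, not NVIDIA's scheduler: `time_slice` models workloads taking turns on the full GPU, while `mig_partition` models splitting the card's compute units (SMs) into fixed, isolated instances. The SM counts are illustrative:

```python
from itertools import cycle, islice

# Two toy sharing strategies, illustrating the ideas behind NVIDIA's
# time-sliced vGPU and MIG modes. Neither function is a real NVIDIA API.

def time_slice(workloads: list[str], quanta: int) -> list[str]:
    """Round-robin: each workload gets the whole GPU for one time quantum."""
    return list(islice(cycle(workloads), quanta))

def mig_partition(total_sms: int, slices: int) -> list[int]:
    """Spatial partitioning: compute units are split into equal, isolated instances."""
    return [total_sms // slices] * slices

# Time slicing: three workloads share the full GPU in turns.
schedule = time_slice(["train", "infer", "render"], quanta=6)

# MIG-style partitioning: e.g. 98 SMs split into 7 isolated instances.
instances = mig_partition(total_sms=98, slices=7)
```

Time slicing gives every workload access to the full GPU but no isolation in time, whereas MIG-style partitioning trades peak capacity for strict isolation between tenants.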
If you are looking to use vGPU for your workloads, make sure that your GPU model and hypervisor are on the vendor's compatibility list.
You will then need to install the vGPU manager software from the GPU vendor on the hypervisor host, and the vGPU drivers inside the guest VMs.
In summary, GPUs are pretty expensive resources, and you will want to utilise them as efficiently as possible. Plan the vGPU method based on your use case and needs, and remember that you can always fine-tune the assignment using vGPU profiles.