Raman SHRIVASTAVA’s Post

Raman SHRIVASTAVA

AI Expert & Leader | LLMs, RAGs, AI Agents | 40 under 40 Data Scientist, AIM 2019

Training on multiple GPUs using NCCL and PyTorch

Aniket Mishrikotkar

Machine Learning Engineer @ MathCo | MLOps | LLMs

Distributed training on multiple GPUs using NCCL and PyTorch

NCCL is the standard communication backend for NVIDIA GPUs. We use NCCL to execute collective operations like all-reduce. It works on a single machine or across multiple machines and can also take advantage of high-performance networks.

1️⃣ Just as with multi-CPU training, we first initialize the process group, this time with `nccl` as the backend: `dist.init_process_group(backend="nccl")`

2️⃣ We need to make sure each process is assigned exactly one GPU. To do this we can read the process's local rank (the `LOCAL_RANK` environment variable set by torchrun) and use it to pick the `device`.

3️⃣ We can then use `torchrun` to launch the distributed training job. A minimal sketch putting these steps together follows below.

This way we can easily make our CUDA-based programs run on multiple GPUs.

#pytorch #deeplearning #distributedsystems
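A minimal sketch of the three steps above, assuming a single node, a toy `nn.Linear` model wrapped in DistributedDataParallel (so gradient all-reduce runs over NCCL during `backward()`), and that the script is launched with torchrun, which sets the `RANK`, `LOCAL_RANK`, and `WORLD_SIZE` environment variables:

```python
import os
import torch
import torch.nn.functional as F
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # step 1: initialize the process group with NCCL as the backend
    dist.init_process_group(backend="nccl")

    # step 2: pin this process to one GPU using its local rank
    local_rank = int(os.environ["LOCAL_RANK"])
    device = torch.device(f"cuda:{local_rank}")
    torch.cuda.set_device(device)

    # toy model and data; DDP synchronizes gradients with NCCL
    # all-reduce during backward()
    model = DDP(torch.nn.Linear(10, 1).to(device), device_ids=[local_rank])
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    for _ in range(10):
        x = torch.randn(32, 10, device=device)
        y = torch.randn(32, 1, device=device)
        loss = F.mse_loss(model(x), y)
        optimizer.zero_grad()
        loss.backward()  # gradient all-reduce over NCCL happens here
        optimizer.step()

    # an explicit collective: average the final loss across all GPUs
    avg_loss = loss.detach().clone()
    dist.all_reduce(avg_loss, op=dist.ReduceOp.SUM)
    avg_loss /= dist.get_world_size()
    if dist.get_rank() == 0:
        print(f"average loss across GPUs: {avg_loss.item():.4f}")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Step 3: if the sketch were saved as `train_nccl.py` (the filename is just for illustration), it could be launched on, say, 4 GPUs of one machine with `torchrun --standalone --nproc_per_node=4 train_nccl.py`.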


