RuntimeError: Distributed package doesn't have NCCL built in (Step by Step)


RuntimeError: Distributed package doesn't have NCCL built in occurs mainly when the installed PyTorch build is not compatible with NCCL (NVIDIA Collective Communication Library). In many cases, it happens because we install the CPU version of PyTorch in place of the GPU-enabled version. When we then try to run GPU-oriented distributed code on that CPU-based PyTorch library, we get this error. Apart from that, there are a few more possible causes. We will discuss all the root causes in detail, along with their solutions, in this article. So let's start.
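Before applying any fix, it is worth confirming whether your current PyTorch build actually ships with NCCL support. Below is a quick diagnostic sketch that uses only standard PyTorch calls; if is_nccl_available() prints False, one of the steps below should resolve it.

import torch
import torch.distributed as dist

print("PyTorch:", torch.__version__)
print("CUDA build:", torch.version.cuda)  # None means a CPU-only build
if dist.is_available():
    print("NCCL available:", dist.is_nccl_available())  # False explains this RuntimeError
else:
    print("torch.distributed is not available in this build")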

 

RuntimeError: Distributed package doesn't have NCCL built in (Steps) –

Please follow the steps in order; it will save you time while fixing the error.

Step 1: Install GPU-Compatible PyTorch –

As mentioned above, many developers and data scientists install the CPU-based PyTorch library and then try to run distributed code on it. Here are the commands you can use to install the GPU build of PyTorch.

For CUDA 10.2 –

Make sure your NVIDIA driver version is greater than or equal to 441.22.

pip3 install torch==1.8.1+cu102 torchvision==0.9.1+cu102 torchaudio===0.8.1 -f https://download.pytorch.org/whl/torch_stable.html

For CUDA 11.1 –

Make sure your NVIDIA driver version is greater than or equal to 456.38.

pip3 install torch==1.8.1+cu111 torchvision==0.9.1+cu111 torchaudio===0.8.1 -f https://download.pytorch.org/whl/torch_stable.html

After installation, you can verify the installed PyTorch version using the command below.

pip3 show torch
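The Version field printed by pip3 show torch should carry a CUDA suffix such as 1.8.1+cu102; a bare version number means you still have the CPU-only build. As an additional sanity check (a small sketch, assuming the install above succeeded), you can ask PyTorch directly:

import torch

print(torch.__version__)          # e.g. "1.8.1+cu102"; no "+cu..." suffix means CPU-only
print(torch.cuda.is_available())  # should be True on a working GPU installation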

Step 2: Reinstall NCCL –

If you installed NCCL earlier but it has become incompatible or has stopped working properly, the best solution is to reinstall the NCCL package. You can download it from NVIDIA's NCCL page (https://developer.nvidia.com/nccl). NCCL greatly accelerates communication between GPUs, and distributed computing across multiple GPUs or multiple nodes requires it.

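You can also check which NCCL version your current PyTorch build was compiled against, before and after reinstalling. This is a small sketch using torch.cuda.nccl.version(), guarded because the call is only meaningful on a CUDA-enabled build with NCCL support:

import torch
import torch.distributed as dist

# Print the NCCL version bundled with this PyTorch build.
if torch.cuda.is_available() and dist.is_nccl_available():
    print(torch.cuda.nccl.version())  # an int such as 2708 or a tuple such as (2, 10, 3), depending on the PyTorch version
else:
    print("This PyTorch build has no usable NCCL")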

Step 3: Update the environment variables –

Please set the following environment variables:

export NCCL_P2P_DISABLE=1  # Disable peer-to-peer communication if necessary
export NCCL_DEBUG=INFO     # Print NCCL diagnostics to help debug failures
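If you launch training from a Python script instead of a shell, you can apply the same settings with os.environ. A minimal sketch; the variables must be set before the process group is initialized:

import os

# Equivalent to the export commands above, applied from Python
# before torch.distributed.init_process_group() is called.
os.environ["NCCL_P2P_DISABLE"] = "1"  # disable peer-to-peer transport if it misbehaves
os.environ["NCCL_DEBUG"] = "INFO"     # print NCCL diagnostics for troubleshooting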

Step 4: Check GPU Availability –

We are setting up all these parameters to ensure that we can leverage multiple GPUs for processing. But sometimes, because of hardware failure, OS incompatibility, or some other reason, we end up with one GPU or none at all. In that case, we should fix that first. The easiest way to check GPU availability is with the code below.

import torch
print(torch.cuda.device_count())  # number of GPUs PyTorch can detect
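Finally, if you simply cannot get an NCCL-capable build (for example, on Windows, where PyTorch does not ship NCCL), one common workaround is to fall back to the gloo backend instead of nccl. This is a minimal sketch for a single-node, single-process test; the address, port, rank, and world size are placeholder values for your own setup:

import os
import torch
import torch.distributed as dist

def init_distributed():
    # Placeholder rendezvous settings for a single-node, single-process test.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")

    # Use NCCL only when the build supports it and a GPU is present;
    # otherwise fall back to gloo instead of raising this RuntimeError.
    if torch.cuda.is_available() and dist.is_nccl_available():
        backend = "nccl"
    else:
        backend = "gloo"

    dist.init_process_group(backend=backend, rank=0, world_size=1)
    return backend

print("Initialized with backend:", init_distributed())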

 

Thanks
Data Science Learner Team


 