The error RuntimeError: CUDA error: device-side assert triggered occurs mainly when the size of the output layer does not match the labels used in the dataset, or when the values passed to the loss function fall outside the range it expects (typically because the wrong activation function precedes it). A few prechecks also help rule out other causes, such as CUDA version compatibility with PyTorch and the installed hardware. In this article, we will explore this error in more detail from a solution point of view.
RuntimeError: CUDA error: device-side assert triggered ( Solution ) –
This is an assertion error raised inside a GPU kernel when it hits a condition that normal execution does not allow, such as an out-of-range index. We recommend following the solutions in order.
Solution 1: Check CUDA Compatibility –
CUDA incompatibility is a broad area. Here we need to verify the following points:
- Verify the CUDA version and check its compatibility with the PyTorch version.
- Check the GPU status (hardware) and the installed driver. If the driver does not support your CUDA version, upgrade the driver.
- Verify the memory allocation configuration for CUDA.
If you find a gap in any of the above points, correct it first. If you still get the same error, move on to the next solution.
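The version checks above can be sketched in a few lines of Python. This is a minimal sketch, not a complete diagnostic: the helper name describe_cuda_environment is made up for illustration, and it only reports what PyTorch itself can see. Driver problems may still need a tool such as nvidia-smi.

```python
import torch


def describe_cuda_environment():
    """Collect the version and device facts worth checking first (hypothetical helper)."""
    info = {
        "torch_version": torch.__version__,
        # CUDA version this PyTorch build was compiled against; None on CPU-only builds
        "built_with_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
        "device_count": torch.cuda.device_count(),
    }
    if info["cuda_available"]:
        # Only query the device when PyTorch can actually reach a GPU
        info["device_name"] = torch.cuda.get_device_name(0)
        info["compute_capability"] = torch.cuda.get_device_capability(0)
    return info


info = describe_cuda_environment()
for key, value in info.items():
    print(f"{key}: {value}")
```

If cuda_available is False on a machine that has a GPU, the mismatch is most likely between the driver, the CUDA version, and the installed PyTorch build rather than in your model code.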
Solution 2: Set CUDA_LAUNCH_BLOCKING=1 –
If the points above check out, set the environment variable CUDA_LAUNCH_BLOCKING=1 and run the code again. CUDA kernels normally run asynchronously, so the Python stack trace can point to a line well after the one that actually failed. With this variable set, each kernel launch runs synchronously, and the stack trace points to the operation that triggered the assert.
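One way to set the variable is from inside the script itself, as sketched below. The assumption here is that it is set before any CUDA work starts, so placing it above the torch import is the safest choice. From a shell, the equivalent would be prefixing the command, for example CUDA_LAUNCH_BLOCKING=1 python train.py, where train.py is a hypothetical script name.

```python
import os

# Set this before the first CUDA call; placing it above the torch import is the safest option.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# import torch  # import torch and build the model only after this point
```

Synchronous launches slow training down, so remove the setting once you have located the failing line.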
Solution 3: Upgrade PyTorch –
You should also upgrade PyTorch. The simplest way to upgrade the torch package is with the pip package manager. You can run the command below.
pip install --upgrade torch
Solution 4: Fix the Code (Shapes and Activation Function) –
Mostly this error occurs when either:
- The final layer shape does not match the labels.
- An incorrect activation function is used.
To help with debugging, you can also try the torch.autograd.profiler module. Running the failing step on the CPU usually produces a more readable error message as well. Now let's explore the most vulnerable areas of the code in general.
Final Layer Shape Mismatch –
import torch.nn as tnn

clf = tnn.Sequential(
    tnn.Linear(2048, 800),
    tnn.ReLU(),
    tnn.Dropout(p=0.2),
    tnn.Linear(800, 300),
    tnn.ReLU(),
    tnn.Dropout(p=0.2),
    tnn.Linear(300, 204),  # final layer: 204 output classes
    tnn.LogSoftmax(dim=1),
)
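In this code, the final layer has shape (300, 204), so the model produces 204 outputs and the loss function accepts only labels from 0 to 203. If any label in the dataset falls outside that range, for example because the dataset has more than 204 classes or its labels start at 1 instead of 0, it will throw RuntimeError: CUDA error: device-side assert triggered. To fix this, make the output size of the final layer match the number of classes and make sure the labels start at 0.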
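A quick way to catch this before the GPU does is to validate the labels against the output size on the CPU. The sketch below is illustrative only: check_labels is a hypothetical helper, the feature and label tensors are made up, and it assumes the loss is NLLLoss (to pair with LogSoftmax) with no ignore_index labels in the data.

```python
import torch
import torch.nn as tnn


def check_labels(labels, num_classes):
    """Raise a readable error if any label falls outside [0, num_classes - 1]."""
    lo, hi = int(labels.min()), int(labels.max())
    if lo < 0 or hi >= num_classes:
        raise ValueError(
            f"labels span [{lo}, {hi}] but the model has {num_classes} outputs "
            f"(valid range is [0, {num_classes - 1}])"
        )


num_classes = 204  # out_features of the final Linear layer
head = tnn.Sequential(tnn.Linear(300, num_classes), tnn.LogSoftmax(dim=1))
loss_fn = tnn.NLLLoss()

features = torch.randn(4, 300)               # made-up batch of 4 feature vectors
good_labels = torch.tensor([0, 17, 203, 5])  # all within 0..203
bad_labels = torch.tensor([1, 18, 204, 6])   # e.g. labels that start at 1

check_labels(good_labels, num_classes)
loss = loss_fn(head(features), good_labels)  # works: every label is in range

try:
    check_labels(bad_labels, num_classes)
except ValueError as err:
    print(err)  # readable message instead of a device-side assert
```

Running a check like this once over the whole dataset is cheap, and it tells you whether to change the final layer or remap the labels.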
Incorrect Activation Function Used –
In many cases, the loss function accepts input only in a particular range, but the model output does not lie in that range, which triggers the same error. For example, BCELoss expects probabilities between 0 and 1. Apply a Sigmoid activation to the output before BCELoss, or use BCEWithLogitsLoss, which takes the raw outputs directly.
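The two options can be compared side by side in a small sketch. The logits and targets below are made-up values for a binary task, and the code runs on the CPU so it does not need a GPU.

```python
import torch
import torch.nn as tnn

logits = torch.tensor([[2.5], [-1.2], [0.3]])  # raw model outputs, not in [0, 1]
targets = torch.tensor([[1.0], [0.0], [1.0]])  # binary labels as floats

# Option A: squash the outputs with Sigmoid first, then use BCELoss
loss_a = tnn.BCELoss()(torch.sigmoid(logits), targets)

# Option B: pass the raw outputs to BCEWithLogitsLoss, which applies the sigmoid internally
loss_b = tnn.BCEWithLogitsLoss()(logits, targets)

print(loss_a.item(), loss_b.item())
```

Both options compute the same quantity. BCEWithLogitsLoss is generally the more numerically stable choice, so it is a reasonable default when the model ends in a plain Linear layer.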
Data Science Learner Team