RuntimeError: CUDA error: device-side assert triggered occurs mainly because the output shape of the final layer does not match the labels in the dataset, or because the input we feed to an activation or loss function falls outside the range it accepts. There are also important prechecks that help avoid this error, such as CUDA version compatibility with PyTorch and the installed hardware. In this article, we will explore this error in more detail from a solution point of view.
This is an assertion error that is triggered when an operation running on the GPU hits a condition that is not allowed, which blocks the normal flow of execution. We request you to follow the solutions in order.
When I say incompatibility of CUDA, it is a very broad area. Here we need to verify the following points:
- The CUDA version your PyTorch build was compiled against matches the CUDA toolkit installed on the system.
- The installed NVIDIA driver supports that CUDA version.
- The installed GPU is supported by that driver and CUDA toolkit.
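A quick way to check the first points from Python is the minimal sketch below, using PyTorch's built-in version attributes (nvidia-smi on the command line additionally shows the installed driver version).

import torch

print(torch.__version__)           # installed PyTorch version
print(torch.version.cuda)          # CUDA version this PyTorch build was compiled for
print(torch.cuda.is_available())   # False usually means a driver/toolkit mismatch
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # the GPU that PyTorch actually sees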
If you find a gap in any of the above points, correct it first. If you still get the same error, move on to the next solution.
Apart from the above points, if the error message does not point to the real culprit, set CUDA_LAUNCH_BLOCKING=1 as an environment variable and run the code again. CUDA operations execute asynchronously, so by default the Python stack trace can point to the wrong line; this variable forces every kernel launch to run synchronously, so the error surfaces at the exact operation that triggered the assert.
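Here is a minimal sketch of setting it from Python. It must happen before the first CUDA call, so place it at the very top of the script; alternatively, run CUDA_LAUNCH_BLOCKING=1 python train.py from the shell.

import os

# Must be set before any CUDA work starts; forces kernel launches to run
# synchronously so the traceback points at the operation that really failed.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch  # import only after setting the variable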
You should also upgrade your PyTorch version; the easiest way to upgrade the torch module is with the pip package manager. You can run the command below.
pip install --upgrade torch
Mostly, this error occurs when either:
- the output shape of the final layer does not match the number of classes in the dataset, or
- the input passed to an activation or loss function falls outside the range it accepts.
For deeper debugging, you can profile the run with the torch.autograd.profiler module to see exactly which operations execute on the GPU. Now let's explore the areas of the code that are most vulnerable in general.
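As a minimal sketch, the profiler can be wrapped around the forward pass; the model and input here are made up purely for illustration.

import torch
import torch.autograd.profiler as profiler

model = torch.nn.Linear(128, 10)   # stand-in model for illustration
x = torch.randn(32, 128)

# Record each operation in the forward pass; with use_cuda=True the GPU
# kernel times are captured as well.
with profiler.profile(use_cuda=torch.cuda.is_available()) as prof:
    model(x)

# The table of ops helps narrow down which operation was running when
# the device-side assert fired.
print(prof.key_averages().table(sort_by="self_cpu_time_total", row_limit=10))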
Final Layer Output Shape Mismatch –
import torch.nn as tnn

clf = tnn.Sequential(
    tnn.Linear(2048, 800),
    tnn.ReLU(),
    tnn.Dropout(p=0.2),
    tnn.Linear(800, 300),
    tnn.ReLU(),
    tnn.Dropout(p=0.2),
    tnn.Linear(300, 204),   # final layer: one output unit per class
    tnn.LogSoftmax(dim=1),
)
Here the final layer outputs 204 units. The device-side assert fires when a label falls outside that range: if the dataset actually contains more classes than that, any label with value 204 or greater (or any negative label) fails the index check inside the loss computation on the GPU and throws RuntimeError: CUDA error: device-side assert triggered. To fix this error, set the final layer's output size exactly equal to the number of classes, with labels running from 0 to num_classes - 1.
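To make the failure concrete, here is a minimal sketch (the label values are invented for illustration) showing how an out-of-range label trips the assert.

import torch
import torch.nn as nn

num_outputs = 204
log_probs = torch.randn(4, num_outputs).log_softmax(dim=1)
labels = torch.tensor([3, 17, 203, 250])   # 250 >= 204: out of range

# On the CPU this raises a plain IndexError; on the GPU the same bad
# index surfaces as "CUDA error: device-side assert triggered".
# loss = nn.NLLLoss()(log_probs.cuda(), labels.cuda())

# Fix: size the final layer so that every label satisfies
# 0 <= label < num_outputs, i.e. num_outputs == number of classes.
num_classes = int(labels.max()) + 1        # 251 in this made-up example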
Incorrect Activation Function Used –
In most cases, an activation or loss function accepts input only within a particular range, and values outside that range trigger the same error. A common example is nn.BCELoss, which asserts that its input lies in [0, 1]: feeding it raw logits fires the device-side assert on the GPU. Either apply a Sigmoid activation before BCELoss, or use nn.BCEWithLogitsLoss, which applies the sigmoid internally.
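Here is a minimal sketch (random logits and targets, purely for illustration) contrasting the failing and working setups.

import torch
import torch.nn as nn

logits = torch.randn(8, 1) * 5                  # raw scores, outside [0, 1]
targets = torch.randint(0, 2, (8, 1)).float()

# nn.BCELoss asserts its input lies in [0, 1]; raw logits on the GPU
# trigger "CUDA error: device-side assert triggered".
# loss = nn.BCELoss()(logits.cuda(), targets.cuda())

# Fix 1: squash the logits with a sigmoid before BCELoss.
loss_a = nn.BCELoss()(torch.sigmoid(logits), targets)

# Fix 2 (usually preferred): BCEWithLogitsLoss applies the sigmoid
# internally and is more numerically stable.
loss_b = nn.BCEWithLogitsLoss()(logits, targets)
print(loss_a.item(), loss_b.item())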
Thanks,
Data Science Learner Team