Vanishing Gradient Problem in RNN: Brief Overview


The vanishing gradient problem in RNN occurs when the derivative of the loss function with respect to the weight parameters becomes very small. This leads to a weight change of almost zero in the initial layers of the network. Once the weights of those layers stop updating, the loss function stops improving and the optimizer fails to converge to a global minimum. This overall situation is the vanishing gradient problem in RNN.
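To see this concretely in an RNN setting, here is a minimal NumPy sketch (the sequence length, hidden size, and weight scale are illustrative assumptions) that backpropagates through a toy sigmoid recurrence and prints how the gradient norm shrinks as we move back toward the earlier time steps:

```python
import numpy as np

rng = np.random.default_rng(0)

T, hidden = 30, 8                           # sequence length and hidden size (illustrative)
W = rng.normal(0, 0.5, (hidden, hidden))    # recurrent weights (illustrative scale)
x = rng.normal(size=(T, hidden))            # inputs already projected to hidden size

# forward pass with a sigmoid recurrence: h[t+1] = sigmoid(W h[t] + x[t])
h = np.zeros((T + 1, hidden))
for t in range(T):
    h[t + 1] = 1.0 / (1.0 + np.exp(-(W @ h[t] + x[t])))

# backpropagation through time: the gradient of the last hidden state
# with respect to earlier hidden states shrinks as we go further back
grad = np.ones(hidden)
for t in reversed(range(T)):
    s = h[t + 1]
    grad = W.T @ (grad * s * (1.0 - s))     # chain rule through sigmoid and W
    if t % 5 == 0:
        print(f"step {t:2d}  ||grad|| = {np.linalg.norm(grad):.2e}")
```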

 

Why is the Activation Function the Main Factor in the Vanishing Gradient Problem in RNN?

As we all know, the derivative of the sigmoid function ranges between (0, 0.25). Backpropagation follows the chain rule, where we multiply the derivatives of successive layers with each other. The deeper this multiplication goes toward the initial layers, the more the product tends to zero. This causes the vanishing gradient problem in a Recurrent Neural Network.

The derivative of the sigmoid function: σ′(x) = σ(x)(1 − σ(x)), which reaches its maximum value of 0.25 at x = 0.
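As a quick check, here is a minimal NumPy sketch (the function names are just illustrative) confirming that this derivative never exceeds 0.25:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_prime(x):
    # derivative of sigmoid: sigma(x) * (1 - sigma(x))
    s = sigmoid(x)
    return s * (1.0 - s)

x = np.linspace(-10, 10, 1001)
print(sigmoid_prime(x).max())   # ~0.25, reached at x = 0
```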

 

Simplification (Numerical Approach) –

Suppose we are at the 5th layer of backpropagation. As per the chain rule, we multiply the derivative 5 times (each any value between 0 and 0.25):

0.21 × 0.11 × 0.06 × 0.11 × 0.15 = 0.000022869 (assuming arbitrary values between 0 and 0.25 for each derivative)

This is already almost negligible at just 5 layers. If we have many more deep layers, the product gets even closer to zero.
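The same idea as a tiny sketch: repeatedly multiplying derivatives drawn from (0, 0.25) drives the product toward zero as the number of layers grows (the layer counts below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

for n_layers in (5, 10, 20, 50):
    # sample one derivative per layer from (0, 0.25), as in the example above
    derivatives = rng.uniform(0.0, 0.25, size=n_layers)
    print(n_layers, np.prod(derivatives))
# the product shrinks rapidly; by ~50 layers it is effectively zero
```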

 

Is ReLU a Solution for the Vanishing Gradient Problem?

Technically, ReLU solves the vanishing gradient problem because its derivative is 1 for positive inputs, so the weight changes in the initial layers no longer shrink to something negligible. However, it can create the exploding gradient problem.

The derivative of the ReLU activation function: f′(x) = 1 for x > 0 and 0 for x ≤ 0.

Actually, the exploding gradient occurs because when the derivative is always 1, the backpropagation step can produce large weight changes in every layer at each epoch. Hence the optimizer will not converge easily to the global minimum.
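A minimal sketch of the exploding side, under the assumption that the ReLU derivative is 1 on the active path, so the backpropagated gradient is essentially a product of recurrent weights; with weights larger than 1 that product blows up:

```python
import numpy as np

def relu_prime(x):
    # derivative of ReLU: 1 for x > 0, else 0
    return (x > 0).astype(float)

n_layers = 50
weight = 1.5                        # illustrative recurrent weight > 1
activation = np.ones(n_layers)      # assume all units are active, so relu_prime = 1

gradient = np.prod(relu_prime(activation) * weight)
print(gradient)                     # ~ 1.5 ** 50, an exploding gradient
```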

 

Vanishing Gradient Problem in RNN: Solutions –

  1. Weight initialization done in such a way that it reduces the chances of a vanishing gradient.
  2. Using the LSTM variant of RNN (Recurrent Neural Network). A short sketch of both ideas follows this list.
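A minimal sketch of both ideas, assuming PyTorch is available (the layer sizes and the Xavier initializer choice are illustrative): Glorot/Xavier initialization for a plain RNN's weights, and an nn.LSTM layer as the drop-in variant:

```python
import torch
import torch.nn as nn

# 1. Careful weight initialization (Xavier/Glorot) for a vanilla RNN
rnn = nn.RNN(input_size=16, hidden_size=32, batch_first=True)
for name, param in rnn.named_parameters():
    if "weight" in name:
        nn.init.xavier_uniform_(param)

# 2. LSTM variant of RNN: gating lets gradients flow over long sequences
lstm = nn.LSTM(input_size=16, hidden_size=32, batch_first=True)

x = torch.randn(8, 20, 16)          # (batch, sequence length, features)
out, (h, c) = lstm(x)
print(out.shape)                    # torch.Size([8, 20, 32])
```

The LSTM's gates allow gradients to flow through the cell state across many time steps, which is why it is the standard fix in practice.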

 

Conclusion –

Well, LSTM exists to overcome the vanishing gradient problem. I hope this article has cleared up the topic for you. Please comment below!

 

Thanks 

Data Science Learner Team


 