SGD with momentum: How is it different from SGD?


SGD with momentum is an optimizer that reduces the impact of noise on convergence to the optimal weights. The momentum term in Stochastic Gradient Descent helps to reduce the convergence time. Adding momentum to SGD overcomes the major shortcoming of SGD compared with Batch Gradient Descent without losing its advantages.


SGD without momentum –

To understand SGD with momentum, we will first look at SGD without momentum.


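In ordinary SGD, the weights are updated with the formula below.

$$w_{t+1} = w_t - \eta \frac{\partial L}{\partial w_t}$$

Here $w_t$ is the weight at step $t$, $\eta$ is the learning rate, and $L$ is the loss computed on a small batch or a single data point.

Because the gradient is computed from small batches or single data points, the weight updates contain noise and random fluctuations. This increases the convergence time.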
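As a rough illustration of where this noise comes from, here is a minimal Python sketch of plain SGD that uses one data point per step. The toy data (y = 3x plus Gaussian noise), the function names `sgd_step` and `run_sgd`, and the hyperparameter values are all assumptions made for this example.

```python
import random


def sgd_step(w, grad, lr):
    """One plain SGD update: move the weight against the gradient."""
    return w - lr * grad


def run_sgd(steps=500, lr=0.05, seed=0):
    """Fit y = w * x using one randomly chosen data point per step."""
    rng = random.Random(seed)
    # Hypothetical toy data: true weight 3.0 plus Gaussian noise.
    data = []
    for _ in range(100):
        x = rng.uniform(-1.0, 1.0)
        data.append((x, 3.0 * x + rng.gauss(0.0, 0.5)))

    w = 0.0
    history = [w]
    for _ in range(steps):
        x, y = rng.choice(data)        # a single data point, so the gradient is noisy
        grad = 2.0 * (w * x - y) * x   # derivative of (w*x - y)**2 with respect to w
        w = sgd_step(w, grad, lr)
        history.append(w)
    return history


history = run_sgd()
print("final weight:", round(history[-1], 3))
```

Plotting `history` should show the weight drifting toward the true value while jittering from step to step, because every gradient is estimated from a single sample.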


SGD with momentum –

The objective of momentum is to give the optimizer a more stable direction toward convergence. Hence we add an exponential moving average to the SGD weight update formula.


$$w_{t+1} = w_t - \eta V_t$$

Weight update with momentum, where $V_t$ is the momentum term and $\eta$ is the learning rate.


Here we have added the momentum factor. Now let's see how this momentum component is calculated.

$$V_t = \beta V_{t-1} + (1 - \beta) \frac{\partial L}{\partial w_t}$$

Here $\beta$ is the momentum coefficient, a value between 0 and 1 (0.9 is a common choice), and $V_{t-1}$ is the momentum term from the previous step.

This is just an exponential moving average of the loss derivative with respect to the weights, and it stabilizes convergence. Any moving average carries a component of the previous data points into the current one, so a single noisy data point has much less influence on the update.
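To make the smoothing effect of the exponential moving average concrete, the sketch below runs it over a short gradient stream that contains one noisy spike. The function name `ema_of_gradients`, the coefficient name `beta` with a value of 0.9, and the sample gradients are assumptions chosen for illustration.

```python
def ema_of_gradients(grads, beta=0.9):
    """Return the running momentum term V_t = beta * V_{t-1} + (1 - beta) * g_t."""
    v = 0.0
    smoothed = []
    for g in grads:
        v = beta * v + (1.0 - beta) * g
        smoothed.append(v)
    return smoothed


# Hypothetical gradient stream with one noisy spike at index 3.
grads = [1.0, 1.0, 1.0, 10.0, 1.0, 1.0]
smoothed = ema_of_gradients(grads)
for g, v in zip(grads, smoothed):
    print(f"gradient={g:5.1f}  momentum term={v:.4f}")
```

The spike of 10 raises the momentum term to only about 1.24, so the update that follows it is far smaller than it would be with the raw gradient. Because the average starts at zero, the first few values are also pulled toward zero. Some libraries use a variant of this rule without the (1 - beta) factor, so check the documentation of the optimizer you use.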


In the convergence diagram for SGD with momentum, we can see that the momentum factor reduces the fluctuations in the weight updates.
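The reduction in fluctuations can also be checked numerically. The sketch below is a small experiment under stated assumptions: a toy quadratic loss 0.5 * (w - 3)^2, Gaussian noise added to every gradient, and a hypothetical helper `mean_update_size` that measures the average size of the weight updates after the run has settled. Setting `beta` to 0.0 reduces the rule to plain SGD.

```python
import random


def mean_update_size(beta, steps=1000, lr=0.1, seed=0):
    """Average |weight change| over the second half of training.

    beta = 0.0 gives plain SGD; beta > 0 adds momentum.
    """
    rng = random.Random(seed)
    w, v = 0.0, 0.0
    sizes = []
    for t in range(steps):
        # Hypothetical noisy gradient of the loss 0.5 * (w - 3)**2.
        grad = (w - 3.0) + rng.gauss(0.0, 1.0)
        v = beta * v + (1.0 - beta) * grad   # exponential moving average
        step = lr * v
        w -= step
        if t >= steps // 2:
            sizes.append(abs(step))
    return sum(sizes) / len(sizes)


plain = mean_update_size(beta=0.0)
with_momentum = mean_update_size(beta=0.9)
print("plain SGD mean update size:", round(plain, 4))
print("momentum mean update size: ", round(with_momentum, 4))
```

Both runs see the same noise sequence because they share a seed, so the difference in update size comes from the averaging alone. This measures only how much the individual updates jump around; it does not by itself say which run ends closer to the optimum.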

Conclusion –

Stochastic Gradient Descent and mini-batch gradient descent are more suitable than Batch Gradient Descent in real scenarios. However, because of noise and the local minima problem, they take more time to converge. With momentum, these optimizers become more efficient. I have tried to keep this article lean and informative. If you have any doubt about any of the points we discussed, please comment below and we will get back to you.


Data Science Learner Team


Meet Abhishek (Chief Editor), a data scientist with major expertise in NLP and Text Analytics. He has worked on various projects involving text data and has been able to achieve great results. He and his team share knowledge and help others learn more about data science.