Gradient descent is an optimization algorithm for minimizing a differentiable function. It iteratively searches for the values of the function's coefficients (parameters) that make the function as small as possible. In machine learning and deep learning, training a model means finding the neuron weights that minimize the cost function. In general, the lower the cost function, the better the model fits the training data.
Suppose we have a neural network with a learning rate η. The weight update rule for a gradient descent optimizer, where w is a weight and L is the loss function, is:

    w_new = w_old - η * ∂L/∂w
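As a minimal sketch of this weight update (new weight = old weight minus η times the gradient of L with respect to the weight), here is one way to write it in plain Python. The loss L(w) = (w - 3)^2 and the names `gradient_descent_step`, `loss` and `loss_gradient` are assumptions chosen only for illustration; they are not part of any library.

```python
def gradient_descent_step(w, grad, learning_rate):
    """Return the weight after one gradient descent update."""
    return w - learning_rate * grad


def loss(w):
    # Hypothetical loss for illustration: L(w) = (w - 3)^2, minimized at w = 3.
    return (w - 3.0) ** 2


def loss_gradient(w):
    # Derivative dL/dw of the hypothetical loss above.
    return 2.0 * (w - 3.0)


w = 0.0    # initial weight
eta = 0.1  # learning rate
for _ in range(100):
    w = gradient_descent_step(w, loss_gradient(w), eta)

print(round(w, 4), loss(w))
```

With this learning rate, the weight moves steadily toward 3, where the illustrative loss is smallest. A learning rate that is too large for this loss (above 1.0 here) would make the updates overshoot and diverge instead.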
In general, when performing a single weight update (one forward pass followed by backpropagation), we can consider either a single training observation or multiple observations at a time.
There are several variations of the gradient descent algorithm, including:
1. Batch gradient descent
2. Stochastic gradient descent
In batch gradient descent, we consider the losses of the complete training set at every iteration (weight update). Taken together, these losses form the cost function for the neural network. This is standard gradient descent. It has some advantages and disadvantages.
Advantages:
1. It is computationally efficient because the whole training set is processed in one go, so a full pass over the data needs only a single weight update.
2. It produces fewer oscillations and converges smoothly. For a convex cost function and a suitable learning rate, it converges to the global minimum.
Disadvantages:
1. If it heads toward a local minimum, it cannot escape from it, because its updates contain no noisy steps to push it out.
2. Although it is computationally efficient, it is not fast. It needs enough memory to load the complete dataset at once, and every weight update requires a full pass over the data.
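The batch variation described above can be sketched as follows. This is only an illustration under stated assumptions: a one-feature linear model y = w * x + b stands in for a neural network, the cost is mean squared error, and the function name `batch_gradient_descent` and the toy data are hypothetical.

```python
def batch_gradient_descent(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y = w * x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Accumulate the gradient over the COMPLETE training set.
        grad_w = 0.0
        grad_b = 0.0
        for x, y in zip(xs, ys):
            error = (w * x + b) - y
            grad_w += 2.0 * error * x / n
            grad_b += 2.0 * error / n
        # Exactly one weight update per pass over the data.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy data generated from y = 2x + 1 (hypothetical, for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = batch_gradient_descent(xs, ys)
print(w, b)
```

On this noise-free toy data the fitted values should approach w = 2 and b = 1. Note that every update touches all observations, which is why the memory and time cost per update grows with the size of the dataset.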
Stochastic gradient descent is the same as batch gradient descent, with the difference that it considers a single training observation at each iteration (weight update). Hence each individual update is much cheaper to compute.
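Advantages: Each update is computationally very fast. It is also memory efficient because it considers one observation at a time from the complete dataset.
Disadvantages: It does not converge along a straight path, because each update is based on a single observation and is therefore noisy.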
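For comparison, here is a minimal sketch of the stochastic variation, again assuming a one-feature linear model y = w * x + b with squared error instead of a real neural network. The function name `stochastic_gradient_descent`, the `seed` parameter and the toy data are hypothetical and exist only for this example.

```python
import random


def stochastic_gradient_descent(xs, ys, learning_rate=0.01, epochs=1000, seed=0):
    """Fit y = w * x + b by updating on one observation at a time."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit the observations in a random order
        for i in indices:
            error = (w * xs[i] + b) - ys[i]
            # Gradient of the squared error of this single observation.
            grad_w = 2.0 * error * xs[i]
            grad_b = 2.0 * error
            # One weight update per observation, not per pass.
            w -= learning_rate * grad_w
            b -= learning_rate * grad_b
    return w, b


# Toy data generated from y = 2x + 1 (hypothetical, for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = stochastic_gradient_descent(xs, ys)
print(w, b)
```

Only one observation is needed in memory for each update. Because this toy data contains no noise, the estimates settle near w = 2 and b = 1; on real, noisy data the weights would typically keep fluctuating around the minimum rather than approaching it in a straight line.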
Thanks
Data Science Learner Team